Methodology | Orange Dev Tracker

The Data Lifecycle

Extract: Git Numstat
Resolve: Identity Fusion
Classify: Functional Areas
Serialize: JSON Artifacts

The Orange Dev Tracker is powered by a forensic Git analysis pipeline. We reconstruct the 15-year history of the project to understand the evolution of the software and the people behind it.

Contributors & Aliases

We track individuals who have authored at least one commit. Because developers often use multiple aliases, we apply identity resolution to unify them into unique human identities.

Unique Contributors (Engine: clean.py)

Total count of distinct individuals with aliases unified via a graph clustering algorithm.

Methodology

We build a graph in which every name and every email is a node. An edge connects a name and an email whenever the two appear in the same commit. Each connected component of the graph then represents a single identity.
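The clustering step can be illustrated with a small union-find sketch. This is not the code in clean.py; the function name `resolve_identities`, the `(name, email)` input shape, and the lowercase normalization are assumptions made for the example.

```python
from collections import defaultdict


def resolve_identities(commits):
    """Group names and emails into identities via connected components.

    `commits` is an iterable of (name, email) pairs, one per commit.
    (Hypothetical input shape, chosen for illustration.)
    """
    parent = {}

    def find(node):
        # Walk up to the root, halving the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, email in commits:
        # Tag each node by kind so a name can never collide with an email.
        # Lowercasing is an assumed normalization step.
        union(("name", name.strip().lower()), ("email", email.strip().lower()))

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return list(clusters.values())


commits = [
    ("Alice", "alice@example.com"),
    ("alice", "alice@work.example"),
    ("A. Liddell", "alice@work.example"),
    ("Bob", "bob@example.com"),
]
identities = resolve_identities(commits)
print(len(identities))
```

In this toy input, the shared email `alice@work.example` links "A. Liddell" to "alice", so the four commits collapse into two identities.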

Resolution

Known pseudonyms (e.g., sipa → Pieter Wuille) are manually fused using our aliases_lookup.json index.
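A manual fusion step like this could look as follows. The schema of aliases_lookup.json is not described here, so the flat alias-to-canonical-name mapping, the `ALIASES_JSON` sample, and the `canonical_name` helper are all hypothetical.

```python
import json

# Hypothetical schema: alias -> canonical name. The real file may differ.
ALIASES_JSON = '{"sipa": "Pieter Wuille"}'


def canonical_name(name, aliases):
    """Return the canonical name for a known alias, else the name unchanged."""
    cleaned = name.strip()
    return aliases.get(cleaned.lower(), cleaned)


# Lowercase the keys so lookups are case-insensitive (an assumed convention).
aliases = {key.lower(): value for key, value in json.loads(ALIASES_JSON).items()}

print(canonical_name("sipa", aliases))
print(canonical_name("Gavin Andresen", aliases))
```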

Maintainer Tracking

Maintainers are the technical gatekeepers of the project. We track both historical lineage and current authority.

The Trusted Circle

Historical (Author Era)

Verified via community records (StackExchange) for early committers like Satoshi Nakamoto, Laszlo Hanyecz, and Gavin Andresen.

Modern (Merge Era)

Identified by the committer_email of merge commits (is_merge = True), cross-referenced with the trusted-keys whitelist from the repository.
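A minimal sketch of the Merge Era check, assuming each commit is a dict with the `committer_email` and `is_merge` fields named above. It also assumes the trusted-keys whitelist has already been resolved to a set of email addresses; how that resolution is done is not covered here, and the function name is hypothetical.

```python
def merge_era_maintainers(commits, trusted_emails):
    """Return committer emails that merged at least once and are trusted."""
    # Assumption: the whitelist is available as plain email addresses.
    trusted = {email.lower() for email in trusted_emails}
    maintainers = set()
    for commit in commits:
        email = commit["committer_email"].lower()
        if commit["is_merge"] and email in trusted:
            maintainers.add(email)
    return maintainers


commits = [
    {"committer_email": "maintainer@example.com", "is_merge": True},
    {"committer_email": "contributor@example.com", "is_merge": False},
    {"committer_email": "stranger@example.com", "is_merge": True},
]
found = merge_era_maintainers(commits, ["maintainer@example.com"])
print(sorted(found))
```

Requiring both conditions means a merge commit from an address outside the whitelist does not, on its own, promote that committer to maintainer.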

Codebase & LOC

We measure project size using a combination of static analysis (current state) and historical churn replays (evolution).
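The churn side of this presumably begins with `git log --numstat` output, as the Extract stage suggests. That output lists added lines, deleted lines, and the path, separated by tabs, with `-` in place of the counts for binary files. The parser below is a sketch under that assumption; the function name is hypothetical and rename entries are not handled.

```python
def parse_numstat(lines):
    """Yield (added, deleted, path) for each text-file numstat entry."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) != 3:
            continue  # blank lines, commit headers, other non-entry lines
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-" instead of counts
        yield int(added), int(deleted), path


sample = [
    "12\t3\tsrc/net.cpp",
    "-\t-\tsrc/qt/res/icons/bitcoin.png",
    "",
    "0\t7\tdoc/README.md",
]
entries = list(parse_numstat(sample))
churn = sum(added + deleted for added, deleted, _ in entries)
print(entries, churn)
```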

Logical Lines of Code (LLOC)

Included

C++, Python (Tests), C, and Shell scripts.

Excluded

Qt translations (.ts), UI forms (.ui), documentation (.md), and generated assets.
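The include and exclude lists can be read as a filter on file extensions. The sketch below assumes a particular set of extensions for each language, which is a guess on our part, and it does not attempt to detect generated assets.

```python
from pathlib import PurePosixPath

# Assumed extension sets; the tracker's actual lists may differ.
INCLUDED_SUFFIXES = {".cpp", ".cc", ".h", ".hpp", ".c", ".py", ".sh"}
EXCLUDED_SUFFIXES = {".ts", ".ui", ".md"}


def counts_toward_lloc(path):
    """Return True if a repository path should count toward LLOC."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in EXCLUDED_SUFFIXES:
        return False
    return suffix in INCLUDED_SUFFIXES


for path in ["src/net.cpp", "src/qt/locale/bitcoin_de.ts", "doc/README.md"]:
    print(path, counts_toward_lloc(path))
```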

Categories & Mapping

Every file in the Bitcoin Core repository is assigned a functional category. This allows us to track development focus and calculate risk-weighted impact scores.

Priority Rules

To ensure cross-cutting work like Testing and Documentation is correctly captured, we apply rules in a specific priority order. Cross-subsystem tests are identified first. For example, a file residing in src/wallet/test/ is categorized as Tests (QA), not Wallet, because its primary function is validation rather than feature logic.

Full Category Index

Category	Primary Paths / Patterns	Core Role
Consensus	`src/consensus/`, `src/validation`	Protocol invariant rules
Cryptography	`src/crypto/`, `src/secp256k1/`	Mathematical primitives
Core Libs	`src/kernel/`, `src/script/`	Architectural scaffolding
P2P Network	`src/net/`, `src/protocol`	Communication fabric
Database	`src/leveldb/`, `src/dbwrapper`	Persistence & Indexing
Utilities	`src/util/`, `src/support/`	Shared helpers
Node & RPC	`src/node/`, `src/rpc/`	Application & Interface
Wallet	`src/wallet/`, `src/interfaces`	Key & Coin management
GUI	`src/qt/`	Visual presentation
Tests (QA)	`/test/`, `/fuzz/`, `/test/.cpp`	Global & subsystem validation
Build & CI	`ci/`, `Makefile`, `depends/`	Compilation & DevOps
Documentation	`doc/`, `*.md`	Education & Guides
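The priority ordering can be sketched as a first-match-wins classifier built from the table above. Only the "tests first" rule is stated explicitly; the relative order of the remaining rules, the `Uncategorized` fallback, and the function name are assumptions made for this example.

```python
# Prefix rules taken from the category index; their order is assumed.
PREFIX_RULES = [
    ("Consensus", ("src/consensus/", "src/validation")),
    ("Cryptography", ("src/crypto/", "src/secp256k1/")),
    ("Core Libs", ("src/kernel/", "src/script/")),
    ("P2P Network", ("src/net/", "src/protocol")),
    ("Database", ("src/leveldb/", "src/dbwrapper")),
    ("Utilities", ("src/util/", "src/support/")),
    ("Node & RPC", ("src/node/", "src/rpc/")),
    ("Wallet", ("src/wallet/", "src/interfaces")),
    ("GUI", ("src/qt/",)),
    ("Build & CI", ("ci/", "depends/", "Makefile")),
]


def classify(path):
    """Assign a repository-relative path to the first matching category."""
    anchored = "/" + path
    # Cross-cutting categories are checked before any subsystem prefix.
    if "/test/" in anchored or "/fuzz/" in anchored:
        return "Tests (QA)"
    if path.startswith("doc/") or path.endswith(".md"):
        return "Documentation"
    for category, prefixes in PREFIX_RULES:
        if path.startswith(prefixes):
            return category
    return "Uncategorized"  # hypothetical fallback, not part of the index


print(classify("src/wallet/test/coinselector_tests.cpp"))
print(classify("src/wallet/wallet.cpp"))
```

Because the test check runs first, the file under `src/wallet/test/` lands in Tests (QA), while `src/wallet/wallet.cpp` still resolves to Wallet.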

Impact & Criticality

We use a Risk-Weighted Impact Model to visualize development focus. Not all code is equal; a bug in Consensus is existential, while a typo in Documentation is minor.

Weighting Groups

Security Group (50x)

50x: Consensus, Cryptography, Core Libs.

Resilience Group (30x - 40x)

40x: P2P Network. 30x: Database, Utilities.

Usability Group (10x - 20x)

20x: Wallet. 10x: Node & RPC, GUI.

Quality & Education (1x - 5x)

5x: Tests, Build & CI. 1x: Documentation.

The Formula

Impact scores are calculated using fractional attribution to prevent over-counting multi-category work.

Score = Σ (Commit × Weight × 1/N)
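One way to read the formula is sketched below. It assumes N is the number of distinct categories a commit touches, so each touched category contributes its weight divided by N; that reading, the input shape, and the function name are our interpretation rather than a documented definition. The weights are the ones listed under Weighting Groups.

```python
WEIGHTS = {
    "Consensus": 50, "Cryptography": 50, "Core Libs": 50,
    "P2P Network": 40, "Database": 30, "Utilities": 30,
    "Wallet": 20, "Node & RPC": 10, "GUI": 10,
    "Tests (QA)": 5, "Build & CI": 5, "Documentation": 1,
}


def impact_score(commits):
    """Sum weight / N over every category touched by every commit.

    `commits` is an iterable of category collections, one per commit.
    N is assumed to be the number of distinct categories in that commit.
    """
    total = 0.0
    for categories in commits:
        distinct = set(categories)
        for category in distinct:
            total += WEIGHTS[category] / len(distinct)
    return total


commits = [
    {"Consensus", "Tests (QA)"},  # 50/2 + 5/2 = 27.5
    {"Documentation"},            # 1/1 = 1.0
]
print(impact_score(commits))  # 28.5
```

Under this reading, a commit that spans Consensus and Tests counts as half a commit in each category instead of a full commit in both.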

Data Sources

Transparency is our priority. You can verify all static classification data directly via our repository.

Verified Data Assets

These JSON files provide the ground truth for our forensics. We invite the community to review and suggest improvements via PRs.

Identity & Aliases

Unifies multiple developer identities (names/emails) into single human entities.
aliases_lookup.json

Maintainer Trust

The whitelist of historical and active maintainers authorized to merge code.
maintainers_lookup.json

Corporate Sponsorship

Heuristics for mapping emails and companies to sponsorship status.
sponsors_lookup.json

Regional Identification

Verified geographical locations for contributors (where public).
identified_locations.json