The Data Lifecycle
Extract: Git Numstat
Resolve: Identity Fusion
Classify: Functional Areas
Serialize: JSON Artifacts

The Orange Dev Tracker is powered by a forensic Git analysis pipeline. We reconstruct the 15-year history of the project to understand the evolution of the software and the people behind it.
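
As an illustration of the Extract stage, here is a minimal sketch of parsing `git log --numstat` output, which lists one tab-separated `added`, `deleted`, `path` triple per changed file and uses `-` for binary files. The `FileChange` type and `parse_numstat` function are hypothetical names introduced for this example, not the tracker's actual code, and rename entries (`old => new`) are kept as raw path strings.

```python
from typing import NamedTuple, Optional


class FileChange(NamedTuple):
    added: Optional[int]    # None when git reports "-" (binary file)
    deleted: Optional[int]  # None when git reports "-" (binary file)
    path: str


def parse_numstat(text: str) -> list[FileChange]:
    """Parse the per-file lines of `git log --numstat` output."""
    changes = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank lines, commit headers, anything not a numstat row
        added, deleted, path = parts
        if added == "-" and deleted == "-":
            changes.append(FileChange(None, None, path))
        elif added.isdigit() and deleted.isdigit():
            changes.append(FileChange(int(added), int(deleted), path))
    return changes
```

Binary files carry no line counts, so keeping them as `None` rather than `0` lets later stages decide whether to ignore them.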

Contributors & Aliases

We track individuals who have authored at least one commit. Because developers often use multiple aliases, we apply identity resolution to unify them into unique human identities.

Unique Contributors (Engine: clean.py)

Total count of distinct individuals with aliases unified via a graph clustering algorithm.

Methodology
We build a relational graph in which every name and every email is a node. An edge connects a name and an email whenever they appear together in the same commit. Each connected component of the graph then represents a single identity.
Resolution
Known pseudonyms (e.g., sipa → Pieter Wuille) are manually fused using our aliases_lookup.json index.
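
The clustering described above can be sketched with a union-find structure, which yields the same connected components as a graph traversal. This is an illustrative sketch, not the code in clean.py: the `UnionFind` class, the `resolve_identities` function, and the `(alias, canonical_name)` pair format for manual aliases are all assumptions, since the actual schema of aliases_lookup.json is not described here.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure over hashable nodes."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_identities(commits, manual_aliases=()):
    """Cluster names and emails into identities.

    commits: iterable of (author_name, author_email) pairs.
    manual_aliases: iterable of (alias, canonical_name) pairs (assumed format).
    Returns a list of sets of ("name", value) / ("email", value) nodes.
    """
    uf = UnionFind()
    for name, email in commits:
        # A name and an email in the same commit share an edge.
        uf.union(("name", name), ("email", email.lower()))
    for alias, canonical in manual_aliases:
        # Manual fusion adds an edge the commit data alone cannot provide.
        uf.union(("name", alias), ("name", canonical))
    clusters = defaultdict(set)
    for node in list(uf.parent):
        clusters[uf.find(node)].add(node)
    return list(clusters.values())
```

The number of returned clusters corresponds to the unique contributor count. Tagging nodes as `"name"` or `"email"` keeps a name that happens to equal an email string from being merged by accident.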

Maintainer Tracking

Maintainers are the technical gatekeepers of the project. We track both historical lineage and current authority.

The Trusted Circle

Historical (Author Era)
Verified via community records (StackExchange) for early committers like Satoshi Nakamoto, Laszlo Hanyecz, and Gavin Andresen.
Modern (Merge Era)
Identified by the committer_email of merge commits (is_merge = True), cross-referenced with the trusted-keys whitelist from the repository.
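
A minimal sketch of the Merge Era check follows. It assumes each commit is a dict with `committer_email` and `is_merge` fields (the field names used above) and that the trusted-keys whitelist has already been mapped to a set of email addresses; that mapping step and the `modern_maintainers` function name are hypothetical.

```python
def modern_maintainers(commits, trusted_emails):
    """Return committer emails that made merge commits and are on the trusted list.

    commits: iterable of dicts with "committer_email" and "is_merge" keys.
    trusted_emails: emails derived from the whitelist (assumed preprocessing).
    """
    trusted = {email.lower() for email in trusted_emails}
    found = set()
    for commit in commits:
        email = commit["committer_email"].lower()
        # Both conditions must hold: a merge commit AND a whitelisted committer.
        if commit["is_merge"] and email in trusted:
            found.add(email)
    return found
```

Requiring both conditions filters out trusted people who never merged as well as merge commits made by committers outside the whitelist.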

Codebase & LOC

We measure project size using a combination of static analysis (current state) and historical churn replays (evolution).

Logical Lines of Code (LLOC)

Included
C++, Python (Tests), C, and Shell scripts.
Excluded
Qt translations (.ts), UI forms (.ui), documentation (.md), and generated assets.
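
The include and exclude lists can be sketched as a file filter. The mapping from languages to file suffixes below is an assumption, as is the `counts_toward_lloc` name; the sketch does not detect generated assets and does not restrict Python to test files, both of which the real pipeline would need to handle.

```python
from pathlib import PurePosixPath

# Assumed suffixes for C++, C, Python, and Shell.
INCLUDED_SUFFIXES = {".cpp", ".cc", ".h", ".hpp", ".c", ".py", ".sh"}
# Qt translations, UI forms, and documentation, as listed above.
EXCLUDED_SUFFIXES = {".ts", ".ui", ".md"}


def counts_toward_lloc(path: str) -> bool:
    """Return True if a repository path should be counted toward LLOC."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in EXCLUDED_SUFFIXES:
        return False
    return suffix in INCLUDED_SUFFIXES
```

The explicit exclusion check is redundant with the include list today, but it keeps an excluded type out even if the include list is later widened.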

Categories & Mapping

Every file in the Bitcoin Core repository is assigned a functional category. This allows us to track development focus and calculate risk-weighted impact scores.

Priority Rules

To ensure cross-cutting work like Testing and Documentation is correctly captured, we apply rules in a specific priority order. Cross-subsystem tests are identified first. For example, a file residing in src/wallet/test/ is categorized as Tests (QA), not Wallet, because its primary function is validation rather than feature logic.

Full Category Index

Category | Primary Paths / Patterns | Core Role
Consensus | src/consensus/, src/validation | Protocol invariant rules
Cryptography | src/crypto/, src/secp256k1/ | Mathematical primitives
Core Libs | src/kernel/, src/script/ | Architectural scaffolding
P2P Network | src/net/, src/protocol | Communication fabric
Database | src/leveldb/, src/dbwrapper | Persistence & Indexing
Utilities | src/util/, src/support/ | Shared helpers
Node & RPC | src/node/, src/rpc/ | Application & Interface
Wallet | src/wallet/, src/interfaces | Key & Coin management
GUI | src/qt/ | Visual presentation
Tests (QA) | /test/, /fuzz/, */test/*.cpp | Global & subsystem validation
Build & CI | ci/, Makefile, depends/ | Compilation & DevOps
Documentation | doc/, *.md | Education & Guides
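
The priority ordering can be sketched as a first-match-wins rule list. Only the placement of Tests (QA) at the top comes from the rules described above; the order of the remaining rules, the glob translations of the listed paths, the `classify` function, and the "Uncategorized" fallback are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Rules are checked top to bottom; the first matching category wins.
RULES = [
    ("Tests (QA)", ["*/test/*", "test/*", "*/fuzz/*"]),
    ("Documentation", ["doc/*", "*.md"]),
    ("Build & CI", ["ci/*", "depends/*", "Makefile*", "*/Makefile*"]),
    ("Consensus", ["src/consensus/*", "src/validation*"]),
    ("Cryptography", ["src/crypto/*", "src/secp256k1/*"]),
    ("Core Libs", ["src/kernel/*", "src/script/*"]),
    ("P2P Network", ["src/net/*", "src/protocol*"]),
    ("Database", ["src/leveldb/*", "src/dbwrapper*"]),
    ("Utilities", ["src/util/*", "src/support/*"]),
    ("Node & RPC", ["src/node/*", "src/rpc/*"]),
    ("Wallet", ["src/wallet/*", "src/interfaces*"]),
    ("GUI", ["src/qt/*"]),
]


def classify(path: str) -> str:
    """Return the first category whose pattern matches the path."""
    for category, patterns in RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return "Uncategorized"  # hypothetical fallback for unmatched paths
```

Because the test rule sits first, `classify("src/wallet/test/wallet_tests.cpp")` yields Tests (QA) rather than Wallet, matching the example given under Priority Rules.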

Impact & Criticality

We use a Risk-Weighted Impact Model to visualize development focus. Not all code is equal; a bug in Consensus is existential, while a typo in Documentation is minor.

Weighting Groups

Security Group (50x)
50x: Consensus, Cryptography, Core Libs.
Resilience Group (30x - 40x)
40x: P2P Network. 30x: Database, Utilities.
Usability Group (10x - 20x)
20x: Wallet. 10x: Node & RPC, GUI.
Quality & Education (1x - 5x)
5x: Tests, Build & CI. 1x: Documentation.

The Formula

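Impact scores are calculated using fractional attribution to prevent over-counting multi-category work: a commit that touches N categories contributes 1/N of each category's weight, so every commit counts as exactly one unit of work before weighting.

Score = Σ (Commit × Weight × 1/N)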
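
One reading of the formula can be sketched as follows, using the weights from the Weighting Groups section and taking N as the number of distinct categories a commit touches. Representing a commit as a set of category names and the `impact_scores` function itself are assumptions for illustration.

```python
from collections import defaultdict

WEIGHTS = {
    "Consensus": 50, "Cryptography": 50, "Core Libs": 50,
    "P2P Network": 40, "Database": 30, "Utilities": 30,
    "Wallet": 20, "Node & RPC": 10, "GUI": 10,
    "Tests (QA)": 5, "Build & CI": 5, "Documentation": 1,
}


def impact_scores(commits):
    """Compute per-category impact with fractional attribution.

    commits: iterable of sets, each holding the categories one commit touches.
    """
    scores = defaultdict(float)
    for categories in commits:
        n = len(categories)
        if n == 0:
            continue  # a commit with no classified files contributes nothing
        for category in categories:
            # Each touched category receives 1/N of the commit, times its weight.
            scores[category] += WEIGHTS[category] / n
    return dict(scores)


commits = [{"Consensus"}, {"Consensus", "Tests (QA)"}, {"Documentation"}]
print(impact_scores(commits))
```

In this example the second commit is split evenly, adding 25 to Consensus and 2.5 to Tests (QA); summing the per-category values gives a single total score.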

Data Sources

Transparency is our priority. You can verify all static classification data directly via our repository.

Verified Data Assets

These JSON files provide the ground truth for our forensics. We invite the community to review and suggest improvements via PRs.

Identity & Aliases: aliases_lookup.json
Unifies multiple developer identities (names/emails) into single human entities.
Maintainer Trust: maintainers_lookup.json
The whitelist of historical and active maintainers authorized to merge code.
Corporate Sponsorship: sponsors_lookup.json
Heuristics for mapping emails and companies to sponsorship status.
Regional Identification: identified_locations.json
Verified geographical locations for contributors (where public).