Contributors

We track individuals who have authored at least one commit to the Bitcoin Core repository. Because developers often use multiple names or email addresses, we apply identity resolution to unify aliases.

Unique Contributors KPI

The total count of distinct individuals who have authored commits to Bitcoin Core, with aliases unified into single identities.

How it's calculated
We build a graph in which names and emails are nodes, with an edge between a name and an email whenever they appear in the same commit. Each connected component represents one person. Known pseudonyms (e.g., "sipa" = Pieter Wuille) are linked manually.
What it includes
Any person who authored a commit, including documentation, tests, and typo fixes.
What it excludes
Reviewers who only ACK/NACK without authoring commits. Bot accounts are filtered out.
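
The identity graph described above can be sketched with a union-find structure. This is a minimal illustration, not the pipeline's actual code: the function name, the lowercase normalization, and the example.org addresses are assumptions made for the example.

```python
from collections import defaultdict


def resolve_identities(commits, manual_links=()):
    """Group name/email aliases into identities (connected components).

    commits: iterable of (author_name, author_email) pairs.
    manual_links: iterable of (name_a, name_b) pairs for known pseudonyms.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, email in commits:
        # Tag nodes so a name can never collide with an email string.
        # Lowercasing is an assumed normalization step.
        union(("name", name.strip().lower()), ("email", email.strip().lower()))
    for name_a, name_b in manual_links:
        union(("name", name_a.strip().lower()), ("name", name_b.strip().lower()))

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


# Placeholder addresses, for illustration only.
commits = [
    ("Pieter Wuille", "pieter@example.org"),
    ("Pieter Wuille", "pw@example.com"),
    ("sipa", "sipa@example.net"),
    ("Alice", "alice@example.org"),
]
identities = resolve_identities(commits, manual_links=[("sipa", "Pieter Wuille")])
print(len(identities))  # 2
```

The Unique Contributors value is then the number of components left after bot accounts are removed.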

Cohort Year

The year of a contributor's first commit to the repository.

Use case
Enables analysis of contributor retention and generational patterns, for example by comparing "Veterans" (first commit several years ago) with "Newcomers" (first commit this year).

Tenure (Active Years)

The count of distinct calendar years in which the contributor made at least one commit.

Example
A developer active only in 2012 and 2024 has a tenure of 2 years, not the 12-year span between them.
Why this matters
Distinguishes sustained contributors from one-time contributors, regardless of calendar span.
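
As a rough sketch, both Cohort Year and Tenure reduce to set operations on the years of a contributor's commits. The function name is hypothetical, and reading the year from a UTC timestamp is an assumption; the text does not say which time zone is used.

```python
from datetime import datetime, timezone


def cohort_and_tenure(commit_times):
    """Return (cohort_year, tenure) for one contributor's commit timestamps."""
    years = {ts.year for ts in commit_times}
    # Cohort year = earliest year seen; tenure = number of distinct years.
    return min(years), len(years)


times = [
    datetime(2012, 3, 1, tzinfo=timezone.utc),
    datetime(2012, 11, 20, tzinfo=timezone.utc),
    datetime(2024, 6, 5, tzinfo=timezone.utc),
]
cohort_year, tenure = cohort_and_tenure(times)
print(cohort_year, tenure)  # 2012 2
```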

Contribution Tiers

Percentile-based segmentation of contributors by commit volume.

| Tier | Percentile | Description |
| --- | --- | --- |
| 👑 The Core | Top 1% | ~12 people responsible for ~80% of commits |
| ⭐ The Regulars | Top 10% | Consistent, active contributors |
| ⚒️ The Sustainers | Top 25% | Periodic contributors with steady output |
| 🔭 The Explorers | Top 50% | Occasional contributors |
| 🧱 The Scouts | Bottom 50% | One-time or rare contributors |

Note: Commit count does not equal impact. A single consensus-critical change can be more significant than hundreds of documentation updates. The tier system reflects activity volume, not importance.
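
One way to assign tiers is to rank contributors by commit count and compare each rank with the percentile bounds. The sketch below assumes each contributor gets the highest tier they qualify for and that ties are broken by sort order; neither rule is stated in the text, and the function name is hypothetical.

```python
# (upper bound on rank percentile, tier name), checked from the top tier down.
TIERS = [
    (1, "The Core"),
    (10, "The Regulars"),
    (25, "The Sustainers"),
    (50, "The Explorers"),
    (100, "The Scouts"),
]


def assign_tiers(commit_counts):
    """Map contributor -> tier from a {contributor: commit_count} dict."""
    ranked = sorted(commit_counts, key=commit_counts.get, reverse=True)
    total = len(ranked)
    tiers = {}
    for position, contributor in enumerate(ranked, start=1):
        # Integer comparison avoids floating-point edge cases at the bounds.
        tiers[contributor] = next(
            name for bound, name in TIERS if position * 100 <= bound * total
        )
    return tiers


# 200 synthetic contributors with strictly decreasing commit counts.
counts = {f"dev{i:03d}": 201 - i for i in range(1, 201)}
tiers = assign_tiers(counts)
print(tiers["dev001"], "/", tiers["dev003"], "/", tiers["dev101"])
# The Core / The Regulars / The Scouts
```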

Bitcoin Core Components

Bitcoin Core consists of several key components that work together to run the network and provide user interfaces.

bitcoind (Daemon)

The core engine that validates transactions, maintains the blockchain, and communicates with peers.

What it does
Runs the Bitcoin protocol, manages the UTXO set, and provides RPC interfaces for automation.
Categories involved
Consensus, Node & RPC, P2P Network, Database, Cryptography, Utilities.

bitcoin-qt (GUI)

The graphical user interface for wallet management and node operation.

What it does
Provides a desktop app for sending/receiving Bitcoin, viewing transactions, and configuring the node.
Categories involved
GUI, Wallet.

These components share underlying libraries for consensus rules, cryptography, and utilities, ensuring consistency across implementations.

Maintainers

Maintainers are individuals who merge code into the repository. In Bitcoin Core's governance model, maintainers have commit access but decisions are made through rough consensus.

Total Maintainers

Count of distinct individuals who have ever merged commits into the repository.

Technical basis
Identified by the committer_email field on merge commits (is_merge = True).
Caveat
This is broader than current write access. It includes historical maintainers and may include some self-rebases.

Active Maintainers

Maintainers with merge activity in the last 12 months.

Current value
Approximately 5 active maintainers, reflecting Bitcoin Core's small trusted group.

Context: Bitcoin Core has historically had a small group of maintainers (rarely more than 5-6 at a time). This is intentional: maintainer access is granted conservatively and requires significant trust from the community.
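
The two maintainer counts can be sketched as a single pass over merge commits. The `committer_email` and `is_merge` fields come from the description above; the `committed_at` field, the function name, and the 365-day approximation of "12 months" are assumptions. A real pipeline would presumably resolve emails to identities first so that one person with two addresses is counted once.

```python
from datetime import datetime, timedelta, timezone


def maintainer_counts(commits, now, window_days=365):
    """Return (total, active) maintainer counts from commit records."""
    cutoff = now - timedelta(days=window_days)
    total, active = set(), set()
    for commit in commits:
        if not commit["is_merge"]:
            continue  # only merge commits identify a maintainer
        email = commit["committer_email"].lower()
        total.add(email)
        if commit["committed_at"] >= cutoff:
            active.add(email)
    return len(total), len(active)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
commits = [
    {"committer_email": "a@example.org", "is_merge": True,
     "committed_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"committer_email": "b@example.org", "is_merge": True,
     "committed_at": datetime(2015, 3, 9, tzinfo=timezone.utc)},
    {"committer_email": "c@example.org", "is_merge": False,
     "committed_at": datetime(2024, 9, 2, tzinfo=timezone.utc)},
]
total, active = maintainer_counts(commits, now)
print(total, active)  # 2 1
```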

Codebase & Lines of Code

We measure codebase size using static analysis of the current repository, supplemented by historical churn data.

Current Codebase Size (LOC) KPI

Total lines of code in source files, measured from the current repository state.

What's included
C++, Python, Shell, C, and other logic files.
What's excluded
Qt translations (.ts, .xlf), UI files, images, config files, build system files, and generated data.
Current value
~480,000 lines of logic code.
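
A static scan of this kind can be approximated by walking the working tree and counting lines in files whose extension is on an allow-list. The extension set below is an assumption derived from the languages listed above, and the sketch counts every physical line, including blanks and comments; the text does not say how those are treated.

```python
import tempfile
from pathlib import Path

# Assumed allow-list for "logic" files; the real list may differ.
LOGIC_EXTENSIONS = {".cpp", ".h", ".c", ".py", ".sh"}


def count_loc(root):
    """Sum physical lines over files with an allow-listed extension."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in LOGIC_EXTENSIONS:
            with path.open(encoding="utf-8", errors="replace") as handle:
                total += sum(1 for _ in handle)
    return total


# Tiny throwaway tree: one C++ file (counted), one Qt translation (skipped).
with tempfile.TemporaryDirectory() as repo:
    Path(repo, "src").mkdir()
    Path(repo, "src", "main.cpp").write_text("int main() {\n    return 0;\n}\n")
    Path(repo, "src", "bitcoin_de.ts").write_text("<TS>\n</TS>\n")
    loc = count_loc(repo)
print(loc)  # 3
```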

Historical LOC Evolution

Cumulative lines of code over time, calculated by replaying commit additions and deletions.

Methodology
Commits are sorted chronologically. For each month, we sum (additions - deletions) to get net change, then apply cumulatively.
Calibration
A scaling factor is applied so the final historical value matches the static scan. This corrects for file moves, reformatting, and branch merge artifacts.

Why churn ≠ static scan: Git records file moves as delete + add (inflating both). Reformatting commits change lines without adding logic. The static scan gives the "true" current size; historical evolution shows the trend.
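
The replay-and-calibrate steps can be sketched as follows. The sketch assumes calibration is a single multiplicative factor applied to the whole series, which is one reading of "a scaling factor"; the function name and input shape are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def loc_history(commits, static_loc):
    """Build a calibrated cumulative LOC series.

    commits: iterable of (month as 'YYYY-MM', additions, deletions).
    static_loc: current size from the static scan.
    """
    net_by_month = defaultdict(int)
    for month, additions, deletions in commits:
        net_by_month[month] += additions - deletions
    months = sorted(net_by_month)
    cumulative = list(accumulate(net_by_month[m] for m in months))
    # Scale so the last point equals the static scan (assumes it is non-zero).
    scale = static_loc / cumulative[-1]
    return [(m, round(value * scale)) for m, value in zip(months, cumulative)]


commits = [
    ("2009-01", 1000, 0),
    ("2009-01", 200, 100),
    ("2009-02", 500, 400),
    ("2009-03", 1000, 200),
]
history = loc_history(commits, static_loc=1500)
print(history)  # [('2009-01', 825), ('2009-02', 900), ('2009-03', 1500)]
```

Here the raw cumulative series is 1100, 1200, 2000, and the factor 1500 / 2000 = 0.75 rescales it while preserving the shape of the trend.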

Categories (Architecture)

Every file is assigned to a functional category based on its path. This segmentation reflects Bitcoin Core's architecture, grouping related functionality to track development focus. Categories with consensus-critical code require higher scrutiny than user-facing features.

| Category | Paths Matched | Description |
| --- | --- | --- |
| Consensus (Domain Logic) | src/consensus/, src/script/, src/validation | The core rules of Bitcoin. Changes here are critical and require extreme scrutiny. |
| Node & RPC (App/Interface) | src/node/, src/rpc/ | Application layer: the bitcoind daemon and command-line interface. |
| P2P Network (Infrastructure) | src/net, src/addrman | Peer discovery, message handling, network protocol. |
| Wallet (Client App) | src/wallet/ | Internal wallet for key management and transaction creation. |
| GUI (Presentation Layer) | src/qt/ | The graphical interface (Bitcoin-Qt). |
| Cryptography (Primitives) | src/crypto/, src/secp256k1/ | Cryptographic primitives and the secp256k1 elliptic curve library. |
| Database (Persistence) | src/leveldb/, src/dbwrapper | Persistence layer for blockchain and wallet data. |
| Utilities (Shared Libs) | src/util/, src/support/ | Shared libraries, logging, and low-level support code. |
| Tests (QA) | src/test/, test/ | Unit tests, functional tests, and benchmarks. |
| Build & CI (DevOps) | ci/, Makefile, CMakeLists | Build system configuration and continuous integration. |
| Documentation | doc/, *.md | Developer documentation and markdown files. |
| Core Libs | src/ (misc) | Core libraries and dependencies not fitting other categories. |

When a commit touches multiple categories, the category with the most changed lines is assigned as the primary category for that commit.
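
Path-based assignment and the primary-category rule might look like the sketch below. The prefixes mirror the table; the rule order, the "Other" fallback, and the use of additions + deletions as "changed lines" are assumptions, and the function names are hypothetical.

```python
from collections import Counter

# (category, path prefixes), checked in order; the first match wins (assumed).
PREFIX_RULES = [
    ("Consensus", ("src/consensus/", "src/script/", "src/validation")),
    ("Node & RPC", ("src/node/", "src/rpc/")),
    ("P2P Network", ("src/net", "src/addrman")),
    ("Wallet", ("src/wallet/",)),
    ("GUI", ("src/qt/",)),
    ("Cryptography", ("src/crypto/", "src/secp256k1/")),
    ("Database", ("src/leveldb/", "src/dbwrapper")),
    ("Utilities", ("src/util/", "src/support/")),
    ("Tests", ("src/test/", "test/")),
    ("Build & CI", ("ci/",)),
    ("Documentation", ("doc/",)),
]


def categorize(path):
    """Assign a repository-relative path to a functional category."""
    for category, prefixes in PREFIX_RULES:
        if path.startswith(prefixes):
            return category
    filename = path.rsplit("/", 1)[-1]
    if filename.endswith(".md"):
        return "Documentation"
    if filename.startswith(("Makefile", "CMakeLists")):
        return "Build & CI"
    # "Other" is a hypothetical fallback that the table does not name.
    return "Core Libs" if path.startswith("src/") else "Other"


def primary_category(file_changes):
    """file_changes: iterable of (path, additions, deletions) for one commit."""
    changed = Counter()
    for path, additions, deletions in file_changes:
        changed[categorize(path)] += additions + deletions
    return changed.most_common(1)[0][0]


commit = [
    ("src/validation.cpp", 40, 10),
    ("src/test/validation_tests.cpp", 20, 5),
    ("doc/release-notes.md", 3, 0),
]
primary = primary_category(commit)
print(primary)  # Consensus
```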

The 5 Dimensions of Criticality

Not all code is equal. A one-line change to consensus rules carries significantly more risk (and value) than a 500-line documentation update. To visualize this, we use a Risk-Weighted Impact Model that groups categories into 5 qualitative dimensions.

1. Security (The Guard Rails)

The "Nuclear Core" of Bitcoin. Changes here are existential.

Risk Weight
100x - 50x
Categories
Consensus, Cryptography, Core Libs.
Rationale
Bugs in these areas can cause chain splits, inflation events, or key leakage. The highest standard of review is required.

2. Resilience (The Fabric)

The networking and validation infrastructure that keeps the system alive.

Risk Weight
50x - 30x
Categories
P2P Network, Utilities, Database.
Rationale
Failure here can lead to eclipse attacks, peer bans, or data corruption. Critical for network health.

3. Usability (The Interface)

How humans and software interact with the node.

Risk Weight
20x - 10x
Categories
Wallet, Node & RPC, GUI.
Rationale
Wallet bugs can lose user funds (local risk). RPC/GUI bugs break integration but don't threaten the network globally.

4. Quality (The Safety Net)

Ensuring correctness and stability.

Risk Weight
5x
Categories
Tests (QA), Build & CI.
Rationale
Vital for long-term velocity and safety, but do not affect the mainnet runtime directly.

5. Education (The Knowledge Base)

Onboarding and explaining the system.

Risk Weight
1x
Categories
Documentation.
Rationale
Important for the ecosystem, but carries near-zero risk to the protocol's operation.

Calculation Logic

How we compute the "Impact Score" for the Radar Chart.

Impact Score = Σ (Commit × Weight × Fractional_Factor), summed over every commit and every category that commit touches
Variables
  • Commit: 1 for each commit authored.
  • Weight: The Risk Weight of each category the commit touches (e.g., 50 for Consensus).
  • Fractional_Factor: 1 / N, where N is the number of categories touched by that commit.
Example
A commit that touches Consensus (50x) and Tests (5x) has N = 2, so each category contributes half of its weight:
Score = (1 × 50 × 0.5) + (1 × 5 × 0.5) = 25 + 2.5 = 27.5
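
The worked example can be reproduced with a few lines of code. This sketch follows the example's arithmetic, in which every touched category contributes its weight times 1/N. Only weights that the text states as single values are included; the other categories are given as ranges, so they are left out rather than guessed. Function names are hypothetical.

```python
# Weights stated as single values in the text; the rest are ranges.
RISK_WEIGHTS = {"Consensus": 50, "Tests": 5, "Documentation": 1}


def commit_score(categories, weights=RISK_WEIGHTS):
    """Score one commit: each touched category contributes weight * (1 / N)."""
    touched = set(categories)
    return sum(weights[category] for category in touched) / len(touched)


def impact_score(commits):
    """Sum commit scores over a contributor's commits."""
    return sum(commit_score(categories) for categories in commits)


example = commit_score({"Consensus", "Tests"})
print(example)  # 27.5
total = impact_score([{"Consensus", "Tests"}, {"Documentation"}])
print(total)  # 28.5
```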

Model Analysis: Interpreting the Data

How to read the "Impact Score" and understand its trade-offs.

How to Interpret
  • Shape > Size: Focus on the direction of the chart. A skew toward "Security" indicates a Protocol Engineer. A skew toward "Usability" indicates a Product Engineer.
  • No "Best" Shape: A "Generalist" (circular shape) is not necessarily better than a "Specialist" (spiky shape). Bitcoin needs both.
Trade-offs (Pros & Cons)
✅ Pros (Why we use it)
  • Risk Awareness: Correctly identifies that a 1-line Consensus fix is more critical than a 100-line documentation update.
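  • Refactor Dampening: "Fractional Attribution" prevents score inflation from scripts or refactors that touch hundreds of files: such a commit still counts once, and each category's weight is diluted by 1/N, so its score is the average weight of the categories touched rather than their sum.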
⚠️ Cons (Limitations)
  • Subjective Weights: The choice of "50x" vs "20x" is a heuristic judgment, not a law of physics.
  • Atomic Commit Sensitivity: Theoretically, score inflation is possible by making many tiny, separate commits to critical files. (However, this is rare and usually caught by maintainers).

Activity Patterns

Heatmap (Hour × Year)

Commit count binned by UTC hour and year, showing when development happens globally.

Insight
Early development was concentrated in US timezones. Over time, activity spread across European and Asian hours as the project globalized.
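
The binning step can be sketched as a counter keyed by (year, UTC hour). The function name is hypothetical, and the sketch assumes timezone-aware timestamps; naive datetimes would be interpreted as local time by `astimezone`.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def hour_year_heatmap(timestamps):
    """Count commits per (year, UTC hour) cell."""
    cells = Counter()
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)  # normalize before binning
        cells[(utc.year, utc.hour)] += 1
    return cells


us_east = timezone(timedelta(hours=-5))
heatmap = hour_year_heatmap([
    datetime(2010, 5, 1, 20, 30, tzinfo=us_east),     # 01:30 UTC, 2010
    datetime(2010, 7, 4, 1, 5, tzinfo=timezone.utc),  # 01:05 UTC, 2010
    datetime(2023, 12, 31, 23, 0, tzinfo=us_east),    # 04:00 UTC, 2024
])
print(heatmap[(2010, 1)], heatmap[(2024, 4)])  # 2 1
```

The last sample shows why normalization matters: a commit made late on 31 December in a US time zone lands in the following year's row once converted to UTC.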

Weekend Ratio

Percentage of commits made on Saturday or Sunday (UTC) per year.

What it indicates
A higher ratio suggests hobbyist or passion-driven development. A lower ratio suggests professionalization and sponsored work during business hours.
Caveat
UTC "weekend" may be Friday night or Monday morning in some timezones. This metric uses a Western work-week assumption.
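
A minimal sketch of the weekend ratio, assuming timezone-aware timestamps and a hypothetical function name:

```python
from collections import defaultdict
from datetime import datetime, timezone


def weekend_ratio_by_year(timestamps):
    """Percentage of commits per year that fall on Saturday or Sunday (UTC)."""
    totals, weekend = defaultdict(int), defaultdict(int)
    for ts in timestamps:
        utc = ts.astimezone(timezone.utc)
        totals[utc.year] += 1
        if utc.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
            weekend[utc.year] += 1
    return {year: 100 * weekend[year] / totals[year] for year in totals}


def utc_day(year, month, day):
    return datetime(year, month, day, 12, 0, tzinfo=timezone.utc)


ratios = weekend_ratio_by_year([
    utc_day(2023, 1, 2), utc_day(2023, 1, 3),   # Monday, Tuesday
    utc_day(2023, 1, 4), utc_day(2023, 1, 7),   # Wednesday, Saturday
    utc_day(2024, 1, 6), utc_day(2024, 1, 7),   # Saturday, Sunday
    utc_day(2024, 1, 8), utc_day(2024, 1, 10),  # Monday, Wednesday
])
print(ratios)  # {2023: 25.0, 2024: 50.0}
```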

Social & Sponsorship

Stars & Forks

GitHub stars (interest) and forks (derivative work or contribution intent) over time.

Data source
GitHub API with pagination. Early data is complete; recent data is extrapolated linearly to current totals.
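
One simple reading of "extrapolated linearly to current totals" is to bridge the gap between the last observed cumulative count and the live total with evenly spaced points. The sketch below illustrates that reading only; the function name, the step count, and the sample numbers are made up for the example.

```python
def extend_linearly(history, current_total, steps):
    """Append `steps` evenly spaced points from the last observed count
    up to `current_total`."""
    last = history[-1]
    increment = (current_total - last) / steps
    return history + [round(last + increment * i) for i in range(1, steps + 1)]


observed = [100, 250, 400]  # cumulative stars from paginated data (made up)
series = extend_linearly(observed, current_total=1000, steps=3)
print(series)  # [100, 250, 400, 600, 800, 1000]
```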

Corporate Era (Sponsorship %)

Approximate percentage of commits from developers with corporate/sponsored affiliations.

Classification method
Corporate: Developer has a company in GitHub profile, OR uses a corporate email domain (e.g., @chaincode.com, @blockstream.io).
Personal: Developer uses a personal email (gmail, hotmail, etc.) with no company affiliation.
Accuracy
This is a heuristic. Some sponsored developers use personal emails; some .edu emails are not sponsorship-related. The trend direction is reliable; exact percentages are approximate.
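
The classification rule can be sketched as below. The domains are the examples given above, not a complete list, and treating any unlisted domain as personal is an assumption; function names are hypothetical.

```python
# Example domains from the text; a real list would be longer.
CORPORATE_DOMAINS = {"chaincode.com", "blockstream.io"}


def classify(email, company=None):
    """Label one commit author as 'corporate' or 'personal'."""
    domain = email.rsplit("@", 1)[-1].lower()
    if company and company.strip():
        return "corporate"  # company listed on the GitHub profile
    if domain in CORPORATE_DOMAINS:
        return "corporate"  # corporate email domain
    return "personal"       # assumed default for everything else


def sponsorship_pct(commits):
    """commits: iterable of (author_email, profile_company), one per commit."""
    labels = [classify(email, company) for email, company in commits]
    return 100 * labels.count("corporate") / len(labels)


pct = sponsorship_pct([
    ("dev@chaincode.com", None),
    ("dev@gmail.com", "Blockstream"),
    ("dev@gmail.com", None),
    ("dev@hotmail.com", ""),
])
print(pct)  # 50.0
```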

Known Sponsors: Chaincode Labs, Spiral (Block/Square), Blockstream, MIT Digital Currency Initiative, Brink, Human Rights Foundation, and individual patrons via GitHub Sponsors or grants.

Known Limitations

What We Don't Capture

  • Mailing List Contributions: Design discussions, BIP authorship, and concept development happen off-GitHub.
  • Testing Without Commits: Many contributors test releases without leaving a commit trail.
  • Pseudonymous Linking: Our alias list is incomplete. Some pseudonyms may be counted as separate people.

Data Freshness

Commit Data
Refreshed monthly from the Bitcoin Core git repository.
Enrichment Data
GitHub profile information (company, location) is from a 2024 snapshot and may be stale.
Social Data
Stars/forks totals are live; historical curve is partially extrapolated.

Data Sources

| Source | What We Extract | Method |
| --- | --- | --- |
| Bitcoin Core Git Repository | Commits, authors, timestamps, file changes | git log --all --numstat |
| Repository HEAD | Current file count, lines of code | Static file system scan |
| GitHub API | Stars, forks, watchers, user profiles | REST API with pagination |
| Legacy Snapshot (2024) | Contributor GitHub profiles, locations, companies | Historical data fusion |

Technical Pipeline

1. ingest.py   → Parse git log → commits.parquet
2. social.py   → GitHub API   → social_history.parquet  
3. clean.py    → Identity graph resolution
4. enrich.py   → Profile fusion → contributors_enriched.parquet
5. process.py  → Aggregation  → JSON artifacts for dashboard
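
The first pipeline step might be sketched as follows. The `--all` and `--numstat` flags come from the Data Sources table; the `--pretty=format:` string, the record layout, and the function name are assumptions made for this example, not the contents of ingest.py. The parser works on captured text so it can be tried without a repository.

```python
# Assumed invocation: one header line per commit, fields split by 0x1f.
GIT_LOG_COMMAND = [
    "git", "log", "--all", "--numstat",
    "--pretty=format:commit%x1f%H%x1f%an%x1f%ae%x1f%ce%x1f%aI%x1f%P",
]


def parse_git_log(text):
    """Parse the output of GIT_LOG_COMMAND into a list of commit dicts."""
    commits = []
    # Split on "\n" only: str.splitlines() would also break on other
    # control characters.
    for line in text.split("\n"):
        if line.startswith("commit\x1f"):
            _, sha, name, email, committer, date, parents = line.split("\x1f")
            commits.append({
                "sha": sha, "author_name": name, "author_email": email,
                "committer_email": committer, "authored_at": date,
                "is_merge": len(parents.split()) > 1,  # two or more parents
                "files": [],
            })
        elif line.strip():
            added, deleted, path = line.split("\t", 2)
            # Binary files report "-" for both counts; record them as 0.
            commits[-1]["files"].append((
                path,
                0 if added == "-" else int(added),
                0 if deleted == "-" else int(deleted),
            ))
    return commits


sample = (
    "commit\x1faaa111\x1fAlice\x1falice@example.org\x1falice@example.org"
    "\x1f2024-01-06T10:00:00+00:00\x1fp1\n"
    "10\t2\tsrc/validation.cpp\n"
    "-\t-\tshare/pixmaps/icon.png\n"
    "\n"
    "commit\x1fbbb222\x1fBob\x1fbob@example.org\x1fbob@example.org"
    "\x1f2024-01-07T11:00:00+00:00\x1fp1 p2\n"
)
parsed = parse_git_log(sample)
print(len(parsed), parsed[1]["is_merge"])  # 2 True
```

In a real run the text would come from something like `subprocess.run(GIT_LOG_COMMAND, capture_output=True, text=True).stdout`, and the resulting records would be written to commits.parquet.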

For technical details, see the source repository and docs/project_context/metric_logic_review.md.