These metrics come from the graph literature. Link1. The LN graph has its own set of constraints and challenges, so the metrics need to be defined for it.
In this post, I will lay out the metric categories. I'll come up with and share the precise metric definitions after diving deep into the data, along with the metric dimensions below.
Here is a list of metrics:
Here is what I’ll try achieving in this post:
Degree Centrality for a node (N) is calculated as:
$\text{Degree Centrality}(N) = \frac{\text{Number of channels for } N}{\text{Total number of channels in LN graph}}$
For LN, it is the '# of channels', and only the '# of channels', that drives degree centrality. A node like LQWD-Canada with thousands of channels has a 5 times higher degree centrality than River, even though River has committed 3 times more bitcoin as liquidity. Refer: Plebdashboard
How to think about 'Degree centrality' for node selection? If you find a node with a high D that is not directly connected to you, and its average channel size is not too low, it is likely a good peer to get started with. We should note that a node may have a good D but still not give us good coverage, if all of its channels are concentrated in one part of the LN graph. Make a note that capacity/liquidity plays no role in this metric.
Calculation:

Node | Degree (# of nodes connected to) | Calculation | Degree Centrality |
---|---|---|---|
Sia | 2 | $\frac{2}{5-1} = \frac{2}{4}$ | 0.5 |
Ria | 4 | $\frac{4}{5-1} = \frac{4}{4}$ | 1.0 |
Xi | 2 | $\frac{2}{5-1} = \frac{2}{4}$ | 0.5 |
Ivy | 4 | $\frac{4}{5-1} = \frac{4}{4}$ | 1.0 |
Eva | 2 | $\frac{2}{5-1} = \frac{2}{4}$ | 0.5 |
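The table above can be reproduced in a few lines of plain Python. This is a sketch using the toy graph's adjacency list; no external libraries are assumed:

```python
# Toy 5-node graph from the post, as an adjacency list.
graph = {
    "Sia": ["Ria", "Ivy"],
    "Ria": ["Sia", "Xi", "Ivy", "Eva"],
    "Xi":  ["Ria", "Ivy"],
    "Ivy": ["Sia", "Ria", "Xi", "Eva"],
    "Eva": ["Ria", "Ivy"],
}

n = len(graph)
# Degree centrality: number of neighbors divided by (n - 1).
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}
for node, dc in degree_centrality.items():
    print(f"{node}: {dc}")
```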
$\text{Betweenness Centrality}(N) = \frac{\text{Number of shortest paths passing through } N}{\text{Total number of shortest paths}}$
The table below shows the shortest-path count for each pair, and it should give you an idea of what a shortest path is; I believe formally defining it would be redundant.
Node Pair | Shortest Path Count | Path Details | Intermediary Nodes |
---|---|---|---|
Sia - Ria | 1 | Direct path | None |
Sia - Xi | 2 | 1 path through Ria, 1 path through Ivy | Ria, Ivy |
Sia - Ivy | 1 | Direct path | None |
Sia - Eva | 2 | 1 path through Ivy, 1 path through Ria | Ivy, Ria |
Ria - Xi | 1 | Direct path | None |
Ria - Ivy | 1 | Direct path | None |
Ria - Eva | 1 | Direct path | None |
Xi - Ivy | 1 | Direct path | None |
Xi - Eva | 2 | 1 path through Ria, 1 path through Ivy | Ria, Ivy |
Ivy - Eva | 1 | Direct path | None |
Total | 13 |
Once we have the shortest-path count through each node, and the total shortest-path count, we can easily calculate big B.
Node | Shortest Paths Through Node | Betweenness Centrality Calculation | Betweenness Centrality Value |
---|---|---|---|
Sia | 0 | $0/13 = 0$ | 0.0 |
Ria | 3 | $3/13 \approx 0.231$ | 0.231 |
Xi | 0 | $0/13 = 0$ | 0.0 |
Ivy | 3 | $3/13 \approx 0.231$ | 0.231 |
Eva | 0 | $0/13 = 0$ | 0.0 |
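As a sanity check, the post's simplified betweenness (shortest paths through a node divided by total shortest paths) can be computed for the toy graph by enumerating all shortest paths with BFS. This is an illustrative sketch only, not how LN implementations compute centrality:

```python
from collections import deque
from itertools import combinations

# Toy 5-node graph from the post, as an adjacency list.
graph = {
    "Sia": ["Ria", "Ivy"],
    "Ria": ["Sia", "Xi", "Ivy", "Eva"],
    "Xi":  ["Ria", "Ivy"],
    "Ivy": ["Sia", "Ria", "Xi", "Eva"],
    "Eva": ["Ria", "Ivy"],
}

def all_shortest_paths(graph, src, dst):
    """Return every shortest path from src to dst via BFS predecessors."""
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:               # first visit: shortest distance
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:    # another equally short route
                preds[v].append(u)

    def build(node):
        if node == src:
            return [[src]]
        return [p + [node] for u in preds[node] for p in build(u)]

    return build(dst)

total_paths = 0
through = {v: 0 for v in graph}
for a, b in combinations(graph, 2):         # each unordered node pair once
    for path in all_shortest_paths(graph, a, b):
        total_paths += 1
        for v in path[1:-1]:                # count intermediaries only
            through[v] += 1

betweenness = {v: through[v] / total_paths for v in graph}
print(total_paths, betweenness)
```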
It is not just the channel count that matters for a high B, but also the node's location in the graph. A node with a low channel count (low D) may have a high B if it acts as a bridge.
For example, have a look at the graph below. Kim has a high B, even though nodes like Alice and Dave have a higher D.
How to think about 'Betweenness centrality' for node selection? In general, it is great to connect to a bridge node, as it gives you very good coverage. However, note again that capacity/liquidity plays no role in this metric, so among bridge nodes we can choose one with high capacity.
Closeness Centrality for a node (N) is calculated as:
$\text{Closeness Centrality}(N) = \frac{\text{Total number of nodes} - 1}{\text{Sum of the shortest path distances from } N \text{ to all other nodes}}$
This metric evaluates how quickly a node can reach all other nodes in the network, providing a measure of how ‘central’ a node is in terms of network navigation.
Calculation of the sum of shortest-path distances
Node | Paths and Distances | Sum of Distances |
---|---|---|
Sia | To Ria: 1, To Xi: 2, To Ivy: 1, To Eva: 2 | 6 |
Ria | To Sia: 1, To Xi: 1, To Ivy: 1, To Eva: 1 | 4 |
Xi | To Sia: 2, To Ria: 1, To Ivy: 1, To Eva: 2 | 6 |
Ivy | To Sia: 1, To Ria: 1, To Xi: 1, To Eva: 1 | 4 |
Eva | To Sia: 2, To Ria: 1, To Xi: 2, To Ivy: 1 | 6 |
Calculation for Closeness centrality
Node | Sum of Distances to Other Nodes | Calculation | Closeness Centrality |
---|---|---|---|
Sia | 6 | $\frac{5-1}{6}$ | 0.67 |
Ria | 4 | $\frac{5-1}{4}$ | 1.0 |
Xi | 6 | $\frac{5-1}{6}$ | 0.67 |
Ivy | 4 | $\frac{5-1}{4}$ | 1.0 |
Eva | 6 | $\frac{5-1}{6}$ | 0.67 |
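The closeness numbers can be verified with BFS hop distances on the same toy graph; here is a plain-Python sketch:

```python
from collections import deque

# Toy 5-node graph from the post, as an adjacency list.
graph = {
    "Sia": ["Ria", "Ivy"],
    "Ria": ["Sia", "Xi", "Ivy", "Eva"],
    "Xi":  ["Ria", "Ivy"],
    "Ivy": ["Sia", "Ria", "Xi", "Eva"],
    "Eva": ["Ria", "Ivy"],
}

def bfs_distances(graph, src):
    """Shortest-path (hop) distance from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

n = len(graph)
# Closeness: (n - 1) divided by the sum of distances to all other nodes.
closeness = {
    node: (n - 1) / sum(bfs_distances(graph, node).values())
    for node in graph
}
print(closeness)
```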
Study the graph below to internalize how big C (closeness centrality) compares with big D or big B.
Lightning and Closeness centrality: Looking at the graph, you may guess that 'HighbetweennessNode' does not have a super low closeness centrality. There is an overlap with other measures of centrality, so in the context of LN, we need to ask whether we are getting value from an additional metric. However, if someone is doing micro mass payments, this would be the node to connect to. 'Micro' because small payments mean we don't have to worry much about liquidity; 'mass' because, through this node, you can reach everyone in the graph with the fewest hops.
To calculate the eigenvector centrality of a node $N$ in a network, we use the following formula:
$\text{Eigenvector Centrality}(N) = \frac{1}{\lambda_1} \times \text{Sum of the centralities of the nodes connected to } N$
Solving this has to be iterative: the centrality of a node $N$ depends on its neighbors, and each neighbor's centrality depends on all of its neighbors, which include $N$. We'll present the above equation in matrix form, as that is very effective for solving this iterative problem.
Mathematically, we can say the centrality $x_{N}$ of node $N$: $x_N = \frac{1}{\lambda} \sum_{M \in \text{Neighbors}(N)} x_M$
Incorporate the adjacency matrix $A$, where $n$ is the total number of nodes and $a_{NM}$ is the element of the adjacency matrix indicating the presence or absence of a link between $N$ and $M$. Expressing the centrality for all nodes using matrix notation: $x_N = \frac{1}{\lambda} \sum_{M=1}^{n} a_{NM} x_M$
In vector form, this reads $x = \frac{1}{\lambda} Ax$. Multiplying through by $\lambda$ gives the final matrix equation: $Ax = \lambda x$
The 5-node graph we are working on can be represented as the table below. When represented as a matrix, it is called the adjacency matrix $A$. You may notice that there is a row and a column for each node; for $n$ nodes, it is an $n \times n$ table. If two nodes are connected, we assign 1 to that cell; if they are not connected, we assign 0. Simple.
| | Sia | Ria | Xi | Ivy | Eva |
|---|---|---|---|---|---|
Sia | 0 | 1 | 0 | 1 | 0 |
Ria | 1 | 0 | 1 | 1 | 1 |
Xi | 0 | 1 | 0 | 1 | 0 |
Ivy | 1 | 1 | 1 | 0 | 1 |
Eva | 0 | 1 | 0 | 1 | 0 |
$\text{Adjacency Matrix } A = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}$
Now that we know what $A$ is, we can solve for the centrality vector $x$ in the equation $Ax = \lambda x$, starting with an initial guess of $x^{(0)} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$
Iteration | Vector $x$ | Norm of $x$ | Normalized $x$ | Approx. $\lambda$ | $\lambda$ Formula |
---|---|---|---|---|---|
Initial | $[1, 1, 1, 1, 1]^T$ | $\sqrt{5}$ | $\frac{1}{\sqrt{5}}[1, 1, 1, 1, 1]^T$ | - | - |
1 | $[2, 4, 2, 4, 2]^T$ | $2\sqrt{11}$ | $\left[\frac{1}{\sqrt{11}}, \frac{2}{\sqrt{11}}, \frac{1}{\sqrt{11}}, \frac{2}{\sqrt{11}}, \frac{1}{\sqrt{11}}\right]^T$ | $\approx 2.966$ | $\lambda \approx \frac{\lVert x^{(1)} \rVert}{\lVert x^{(0)} \rVert} = \frac{2\sqrt{11}}{\sqrt{5}} \approx 2.966$ |
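Continuing the iteration by hand gets tedious; a plain-Python power-iteration sketch for the toy adjacency matrix converges quickly (for this particular graph the dominant eigenvalue works out to 3):

```python
import math

# Adjacency matrix of the toy graph (rows/cols: Sia, Ria, Xi, Ivy, Eva).
A = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 1, 0],
]

n = len(A)
x = [1.0] * n                      # initial guess x^(0)
for _ in range(100):               # power iteration: x <- Ax / ||Ax||
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(v * v for v in y))
    x = [v / norm for v in y]

# Rayleigh quotient estimates the dominant eigenvalue lambda.
Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
lam = sum(x[i] * Ax[i] for i in range(n))
print("lambda ~", round(lam, 4))
print("eigenvector centrality ~", [round(v, 3) for v in x])
```

As expected, Ria and Ivy (the two well-connected nodes) end up with the highest eigenvector centrality.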
Now that we know how to calculate big E (eigenvector centrality), have a look at the graph below to get a feel for how it compares with big B, big C, or big D.
Lightning and Eigenvector centrality: Big E is super sensitive to network changes. Depending on how we choose nodes (no zombie nodes, no lurkers, no nodes that come and go, and only nodes with liquidity above a certain threshold to route payments reliably), big E will change a lot.
These questions come to mind when we start thinking about the growth of the Lightning Network.
The goal of this post is to share data, provide some interpretation, and get answers to the above basic questions. In the next iteration of cohort analysis, I'll present a more coherent and holistic story.
Data: We have LN graph output from an LND node since May-2023. We are analysing graph data pulled on one day in the first week of each of the following months. Here is how the data looks.
'# of Nodes' is the count of unique pub keys from the graph output. 'Active nodes' are nodes with at least one active public channel that has sent at least one update to its peers.
What you need to note, looking at the table above:
Methodology:
What to read from the heatmap above:
We’ll look at Amboss, 1ml, Mempool, hashXP data, and then at the graph output from an LND node. The graph output has info on nodes and channels.
Data on 04/03/2024
HashXP talks about zombie nodes. What exactly is a zombie node? Are we okay with their definition? Or would the better approach be to understand how the number is derived, and talk about it based on context?
The output json file has two data components. One is for nodes and another one is for channels. A basic data pull shows:
Description | Count |
---|---|
# of Nodes | 15246 |
# of Channels | 51861 |
The output from my node is closest to 1ml and Mempool. We'll continue slicing and dicing our data to understand it more, and hopefully it will give us some understanding to make sense of the data in other explorers. Before we do that, a couple of things to keep in mind:
The node data has below columns
Code:
first_node = graph_data['nodes'][0] # Get the first item (node)
node_keys = first_node.keys() # Get the keys, which represent the columns
print("Columns in 'nodes' dictionary:", node_keys)
Output:
Columns in ‘nodes’ dictionary: dict_keys([‘last_update’, ‘pub_key’, ‘alias’, ‘addresses’, ‘color’, ‘features’, ‘custom_records’])
Are all pub keys (node IDs) unique here?
Code
print("Count of unique pub_keys:", len({node['pub_key'] for node in graph_data['nodes']}))
Output:
Count of unique pub_keys: 15246
This matches the node count I mentioned earlier, which we got from the 'describegraph' output of my node. Now, can we trim out zombie nodes, i.e., nodes that are 'dead'?
Let's look at how many nodes have never sent a presence update to their peers for broadcasting.
Code:
count_last_update_greater_than_0 = 0
count_last_update_less_or_equal_0 = 0
for node in graph_data['nodes']:
    if node['last_update'] > 0:
        count_last_update_greater_than_0 += 1
    else:
        count_last_update_less_or_equal_0 += 1
print(f"Count of pub_keys with last_update > 0: {count_last_update_greater_than_0}")
print(f"Count of pub_keys with last_update <= 0: {count_last_update_less_or_equal_0}")
Output:
Count of pub_keys with last_update > 0: 9321
Count of pub_keys with last_update <= 0: 5925
~6K nodes have never broadcast their presence. They could be zombie nodes. Based on this work, we have ~9k nodes to work with. However, hashXP thinks there are still some zombies hiding in the 9k. We'll need to get to the edges (channels) data to find them.
The channel data has below columns
Code:
first_edges = graph_data['edges'][0]
edges_keys = first_edges.keys()
print("Columns in 'edges' dictionary:", edges_keys)
Output:
Columns in ‘edges’ dictionary: dict_keys([‘channel_id’, ‘chan_point’, ‘last_update’, ‘node1_pub’, ‘node2_pub’, ‘capacity’, ‘node1_policy’, ‘node2_policy’, ‘custom_records’])
Edges are unique on channel_id. Remember, two nodes may have multiple channels between them. Let’s find out how many channels have never been updated.
Code:
count_last_update_greater_than_0 = 0
count_last_update_less_or_equal_0 = 0
for edges in graph_data['edges']:
    if edges['last_update'] > 0:
        count_last_update_greater_than_0 += 1
    else:
        count_last_update_less_or_equal_0 += 1
print(f"Count of channels with last_update > 0: {count_last_update_greater_than_0}")
print(f"Count of channels with last_update <= 0: {count_last_update_less_or_equal_0}")
Output:
Count of channels with last_update > 0: 37166
Count of channels with last_update <= 0: 14695
~15k channels out of ~52K channels that we got querying our node have never been updated. Most likely, they are not usable and are zombies, but let’s continue looking at more variables. We’ll look at capacity/channel size now.
Code:
count_capacity_greater_than_0 = 0
count_capacity_less_or_equal_0 = 0
for edges in graph_data['edges']:
    if int(edges['capacity']) > 0:
        count_capacity_greater_than_0 += 1
    else:
        count_capacity_less_or_equal_0 += 1
print(f"Count of channels with capacity > 0: {count_capacity_greater_than_0}")
print(f"Count of channels with capacity <= 0: {count_capacity_less_or_equal_0}")
Output:
Count of channels with capacity > 0: 37915
Count of channels with capacity <= 0: 13946
~14k channels have no capacity. These channels are not usable at all without any doubt. But how do 14k channels with no capacity overlap with 15k channels that have not sent any updates?
Group # | Group based on Last update, Capacity, and node policy data | # of channels |
---|---|---|
1 | last update > 0, capacity > 0, Both policies are good - non-null | 37067 |
2 | last update = 0, capacity = 0, Both are null | 13915 |
3 | last update = 0, capacity > 0, Both are null | 780 |
4 | last update > 0, capacity > 0, One policy is null | 68 |
5 | last update > 0, capacity = 0, Both policies are good - non-null | 28 |
6 | last update > 0, capacity = 0, One policy is null | 3 |
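The grouping above can be reproduced with a simple key function over the edges. Below is a sketch on hypothetical minimal edge records: the field names match the 'edges' columns listed earlier, but the values here are made up for illustration.

```python
from collections import Counter

# Hypothetical minimal edge records; real records come from the
# 'describegraph' output and carry more fields.
edges = [
    {"last_update": 1700000000, "capacity": "500000",
     "node1_policy": {"fee_base_msat": "1000"}, "node2_policy": {"fee_base_msat": "0"}},
    {"last_update": 0, "capacity": "0",
     "node1_policy": None, "node2_policy": None},
    {"last_update": 1700000100, "capacity": "250000",
     "node1_policy": None, "node2_policy": {"fee_base_msat": "0"}},
]

def group_key(edge):
    """(has update?, has capacity?, number of null policies)."""
    updated = edge["last_update"] > 0
    funded = int(edge["capacity"]) > 0
    null_policies = sum(edge[k] is None for k in ("node1_policy", "node2_policy"))
    return (updated, funded, null_policies)

groups = Counter(group_key(e) for e in edges)
print(groups)
```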
From the above work, it seems not more than 37k to 38k channels are usable. But how many of these usable channels are from so-called ‘zombie’ nodes of node data? Or, is it possible that out of ~9k nodes that have sent an update a good percentage don’t have a usable channel?
'Source node last update' or 'target node last update' is the last update from the node data for the source or target node. The 'last update' comes from the channel data, and it is the last update for the channel. Capacity and policy fields are, of course, channel-specific data.
We also compute a unique pub key count; it is unique only within a group, since some fields of the grouping come from channel data. 'Unique pub key count' counts unique pub keys across both source and target pub keys.
Group Number | Group | Count | Unique Pub Key Count |
---|---|---|---|
1 | source node last update > 0, target node last update > 0, last update > 0, capacity > 0, Both policies are non-null | 36999 | 7899 |
2 | source node last update <= 0, target node last update <= 0, last update <= 0, capacity <= 0, Both policies are null | 4380 | 3244 |
3 | source node last update <= 0, target node last update > 0, last update <= 0, capacity <= 0, Both policies are null | 3795 | 2953 |
4 | source node last update > 0, target node last update <= 0, last update <= 0, capacity <= 0, Both policies are null | 3357 | 2668 |
5 | source node last update > 0, target node last update > 0, last update <= 0, capacity <= 0, Both policies are null | 2383 | 1340 |
6 | source node last update <= 0, target node last update > 0, last update <= 0, capacity > 0, Both policies are null | 271 | 343 |
7 | source node last update > 0, target node last update > 0, last update <= 0, capacity > 0, Both policies are null | 199 | 262 |
8 | source node last update <= 0, target node last update <= 0, last update <= 0, capacity > 0, Both policies are null | 163 | 266 |
9 | source node last update > 0, target node last update <= 0, last update <= 0, capacity > 0, Both policies are null | 147 | 218 |
10 | source node last update > 0, target node last update > 0, last update > 0, capacity > 0, One policy is null | 49 | 86 |
11 | source node last update > 0, target node last update <= 0, last update > 0, capacity > 0, Both policies are non-null | 46 | 61 |
12 | source node last update > 0, target node last update > 0, last update > 0, capacity <= 0, Both policies are non-null | 28 | 28 |
13 | source node last update <= 0, target node last update > 0, last update > 0, capacity > 0, Both policies are non-null | 22 | 31 |
14 | source node last update > 0, target node last update <= 0, last update > 0, capacity > 0, One policy is null | 13 | 22 |
15 | source node last update <= 0, target node last update > 0, last update > 0, capacity > 0, One policy is null | 6 | 11 |
16 | source node last update > 0, target node last update > 0, last update > 0, capacity <= 0, One policy is null | 2 | 4 |
17 | source node last update > 0, target node last update <= 0, last update > 0, capacity <= 0, One policy is null | 1 | 2 |
From the above table, the 7.8k nodes in group 1 look good to go. We can consider them active nodes. Make a note of group 5: these nodes have node-level updates, but we have 2.3k channels from them that have no update.
Based on our work, we can say 7.8k nodes are active. One of the explorers, hashXP, says only 6k nodes are active; we are still 1.8k ahead. Remember, our filtering criterion is last update > 0, so a node or channel with no update in the last year is still counted as active. Does it make sense to do that? We will discuss it in the next post.
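For illustration, tightening the criterion from 'ever updated' to 'updated within the last year' is a one-line filter. Here is a sketch on hypothetical node records (epoch-second timestamps, as in the 'describegraph' output):

```python
import time

ONE_YEAR = 365 * 24 * 3600
now = int(time.time())

# Hypothetical node records; real ones come from 'describegraph'.
nodes = [
    {"pub_key": "node_a", "last_update": now - 10 * 24 * 3600},   # 10 days ago
    {"pub_key": "node_b", "last_update": now - 400 * 24 * 3600},  # ~13 months ago
    {"pub_key": "node_c", "last_update": 0},                      # never updated
]

cutoff = now - ONE_YEAR
recently_active = [n["pub_key"] for n in nodes if n["last_update"] > cutoff]
print(recently_active)
```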
Target Customers: Is the business serving Bitcoin-native folks or no-coiners? Even with similar product offerings, businesses behave differently depending on whom they serve. Think of Coinbase and River.
Market Focus: An important dimension, as serving retail and institutions requires a different strategy and mindset. Here, we need to think not only of product-market fit, but of founder-market fit too.
Usage of Bitcoin: Transactional or Investment. Sadly, most of the companies making money in the space are still focused on Bitcoin as an 'investment'/asset. Think of all the big names. Companies building infrastructure to bring Bitcoin technology on par with, and beyond, current infrastructure for transactional usage are not making money and are reliant on funding.
Existing use case or Novel use case: For companies building for the transactional use case of Bitcoin, a good novel use case would be one where digital money has an inherent advantage: AI agent payments, remittance, micropayments. Even for companies working on the 'investment' use case, it is possible to be novel. Think of decentralized lending.
User experience: Complex to Simple. Most Bitcoin businesses are 'for the developers, by the developers'. The current user experience is just an artifact of the fact that the transactional use case has not gone mainstream. Companies like Lightspark are working towards bringing the UX on par with current infra. Companies working on the 'investment' use case have, in some cases, arguably gone past the current status quo in terms of experience.
Ecosystem Integration: Isolated to Integrated. This dimension is important because it forces us to reconcile the company's product roadmap with the development of the Bitcoin ecosystem. A product roadmap that is not in line with ecosystem growth is less likely to survive. This is particularly challenging because of the decentralized nature of the ecosystem's development.
Regulatory alignment: Compliance-focused to innovation-focused.
First, I’ll lay out the framework, and then we’ll go deeper into why it is useful, even though it is simple. This lens to look at companies works for startup founders, employees & VCs.
It is an important dimension because there is no other lens to look at a Bitcoin business that tells us more about its users, and their needs and behavior. A Bitcoiner is a different species from a no-coiner, even though they are similar in thousands of ways.
Why does segmenting based on who they are targeting make sense?
Target customers define how the company communicates, behaves, and builds. Just look at the Twitter feed of a Bitcoin company, and you will know who they are talking to. The language and memes that work for Bitcoiners are alien to no-coiners. Companies that are targeting 'no-coiner' customers, consumers, or businesses talk, to the chagrin of Bitcoiners, in a 'new' language. For example, compare the communication of LightSpark with others in the same category.
Now, this lens works for VCs and employees too. It gives a pulse on the clarity of thought of the founders and company. If a company's product is more in line with no-coiners, but they talk in Bitcoin memes, more likely than not they won't be able to run a successful business. Likewise, if a company's product is geared towards hard-core Bitcoiners, and they talk in normie language, it is not a good sign.
For startup founders, this lens provides useful insight into what they should build. For someone with a decade of experience in the Web 2 world, where they have hardly interacted with Bitcoiners, it is not wise to build a ‘Bitcoiner’ only product. The wisdom is you build for yourself and a user base that you understand.
Lastly, it is even better to consider this dimension a continuous variable, rather than a categorical one. As we all know, not all Bitcoiners are the same, and not all no-coiners are the same. And, we all are at different stages in terms of understanding Bitcoin.
As the tech wisdom goes, to beat an incumbent you need to come up with a 10x better product. Considering Bitcoin is a novel technology, we can argue that any company using Bitcoin brings novelty to its product.
But, for the sake of this dimension, merely using a new technology is not a novelty. The tech has to manifest into a more usable & valuable product.
For the remittance use case, that would mean faster, cheaper, and easier to use. The sovereign-money feature would be considered a 10x addition only for products and user bases who value it. For a product category (e.g. online learning marketplaces like Udemy) where censorship resistance and decentralization are not core user needs (top-10 needs), launching a product and adding an LN payment layer is not a novelty. However, in some cases it is still okay, when the company is targeting hard-core Bitcoin-native users. With every cycle, the base is growing, so there is opportunity for such companies to grow, if not exponentially.