Ethereum is a technology that lets you send cryptocurrency to anyone for a small fee. It also powers applications that everyone can use and no one can take down. It's the world's programmable blockchain and a marketplace of financial services, games and apps that can't steal your data or censor you.
Ethereum also has its own cryptocurrency, ether, or 'ETH' for short. You can read more about Ethereum and ETH at https://ethereum.org/.
This dataset was obtained from the Beacon Scan block explorer; it captures and characterizes a validator's "journey" as it joins the Ethereum 2.0 Medalla testnet.
To participate as a validator, Ethereum 1.0 holders will transfer 32 ETH into a deposit contract that creates an equivalent 32 bETH credit on Ethereum 2.0’s Beacon Chain. This places the validator into an activation queue. Before blockchain activation there is an eligibility period where the queued validator must wait until the first epoch it is eligible to be activated. At any point after the eligibility epoch has passed, the validator may complete the setup of the beacon chain client and join the network. Once online, the validator’s activation epoch is logged and it may begin being assigned to propose blocks or participate in block attestations. For the validators who can no longer commit to their responsibilities, after a set duration of time, it is possible to exit the network. Beacon clients that exit have a time stamp logged of the epoch their client was disabled and when their funds are withdrawn.
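The lifecycle above (deposit, eligibility, activation, optional exit, withdrawal) can be sketched as a tiny state machine. This is purely illustrative: the `Validator` class and the epoch numbers below are hypothetical, not part of any client implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Validator:
    """Epoch milestones in a validator's lifecycle (None = not reached yet)."""
    eligibility_epoch: int
    activation_epoch: Optional[int] = None
    exit_epoch: Optional[int] = None
    withdrawal_epoch: Optional[int] = None

    def status(self, current_epoch: int) -> str:
        # Check milestones from latest to earliest: the last one reached wins.
        if self.withdrawal_epoch is not None and current_epoch >= self.withdrawal_epoch:
            return "withdrawn"
        if self.exit_epoch is not None and current_epoch >= self.exit_epoch:
            return "exited"
        if self.activation_epoch is not None and current_epoch >= self.activation_epoch:
            return "active"
        if current_epoch >= self.eligibility_epoch:
            return "pending activation"
        return "in queue"

# Hypothetical validator: eligible at epoch 10, activated at 50, exited at 900.
v = Validator(eligibility_epoch=10, activation_epoch=50, exit_epoch=900)
print(v.status(5))    # in queue
print(v.status(100))  # active
print(v.status(950))  # exited
```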
Let's load the data and look at the first few rows. We will use the pandas, matplotlib, and seaborn libraries. Also, we can use the head() function to show the first five rows.
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
valid_raw = pd.read_csv("validator_data.csv",encoding = "ISO-8859-1")
#Note that we need the encoding parameter here as the csv file contains some non-UTF-8 characters
valid_raw.head()
We can use dtypes to see the data types of each column and shape to see the size of the data.
valid_raw.dtypes
valid_raw.shape
We can see that we have 80392 rows and 11 columns.
The 11 variables are:
Unnamed: 0
- The row index of the validator.
publickey
- The public key identifying the validator.
index
- The index number of the validator.
currentBalance
- The current balance, in ETH, of the validator.
effectiveBalance
- The effective balance, in ETH, of the validator.
proposed
- The number of blocks assigned, executed, and skipped by the validator.
eligibilityEpoch
- The epoch number that the validator became eligible.
activationEpoch
- The epoch number that the validator activated.
exitEpoch
- The epoch number that the validator exited.
withEpoch
- The epoch when the validator is eligible to withdraw their funds. This field is not applicable if the validator has not exited.
slashed
- Whether the given validator has been slashed.
To be suitable for analysis, this data required some minor data manipulation of its fields. Here is what we do:
#1 Split the 'proposed' column into three columns on the '|' delimiter
proposed = valid_raw['proposed'].str.split("|", n=2, expand=True)
#2 Append the three new columns to the original data
valid = pd.concat([valid_raw, proposed], axis=1)
#3 Give the new and unnamed columns meaningful names
valid.rename(columns={'Unnamed: 0': 'rowindex', 0:'assigned', 1:'executed', 2:'skipped'}, inplace=True)
#4 Drop the columns we no longer need
valid = valid.drop(columns=['proposed','eligibilityEpoch','withEpoch'])
#5 Replace the placeholder values 'genesis' and '--' with 0
valid = valid.replace(['genesis'],0)
valid = valid.replace(['--'],0)
#6 Strip the ' ETH' suffix and convert the balances to numeric types
valid['currentBalance'] = valid['currentBalance'].str.split(" ").str[0].astype(float)
valid['effectiveBalance'] = valid['effectiveBalance'].str.split(" ").str[0].astype(int)
#7 Convert the remaining epoch and block-count columns to integers
valid.iloc[:,[5,6,8,9,10]] = valid.iloc[:,[5,6,8,9,10]].astype(int)
Let's take a look at the data after all the manipulation we did. sample(n) is a great way to see n random rows of the data.
valid.sample(5)
We can use describe() to create a simple summary statistics table.
valid.describe()
valid.dtypes
After the manipulation, the data has 11 columns (1 string, 1 boolean, 9 numeric).
Activation is the first step towards compliance for any node attempting to join the active validator set. Let's find out how many validators are activated over time. We can simply count the number of validators in each 'activationEpoch' group using groupby() and count().
num_activated = valid.groupby('activationEpoch')['activationEpoch'].count().reset_index(name="count")
# Note that reset_index() is used to make 'activationEpoch' a column and for renaming purposes
num_activated
We can plot a simple line plot using plot() to see how the number changes over time. We will exclude activationEpoch 0 for the purpose of getting a better graph.
num_activated[1:].set_index('activationEpoch')['count'].plot()  # plot counts against the epoch itself
plt.title('Activated Validators over Time')
plt.xlabel('Activation Epoch')
plt.ylabel('Number of Activated Validators')
plt.show()
From the graph, we see that 4 activated validators per epoch is the most common value over time.
Let's take a look at the time series of its cumulative number over time by using the cumsum() and the plot() functions.
num_activated.set_index('activationEpoch')['count'].cumsum().plot()
plt.title('Cumulative Activated Validators over Time')
plt.xlabel('Activation Epoch')
plt.ylabel('Cumulative Number of Activated Validators')
plt.show()
A close inspection of the graph reveals two visible anomalies: one between epochs 3238 and 3440, and the other between epochs 14189 and 14311. In both instances, no new validators were activated on the blockchain for over 150 epochs, which suggests there was some fault in the network's activation functionality.
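Gaps like these need not be found by eye. Here is a minimal sketch of detecting activation gaps programmatically, run on a small synthetic series of epochs rather than the real data:

```python
import pandas as pd

# Synthetic epochs that saw at least one activation, with one long gap.
epochs = pd.Series([100, 105, 110, 120, 400, 410, 415])

# Difference between consecutive activation epochs.
gaps = epochs.diff()

# Flag any stretch of more than 150 epochs with no activations.
anomalies = epochs[gaps > 150]
for end in anomalies:
    start = epochs[epochs < end].max()
    print(f"No activations between epoch {start} and epoch {end}")
```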
For validators attempting to leave the network, there is a mandatory lock-in period that is enforced. Only after this time frame is a staker allowed to withdraw their funds and leave the network. This is a two-step procedure: the node client software is first shut down, and then the bETH is withdrawn from the network. Let's find the distribution of the time from activation to exit. First, we create a temporary variable, 'timeToExit', using the formula:
$$ \text{timeToExit} = \frac{6.4}{60} \times (\text{exitEpoch} - \text{activationEpoch}), $$ where $6.4$ converts epochs to minutes (one epoch lasts 6.4 minutes) and $60$ converts minutes to hours. Then, we can plot a histogram using plot.hist() .
nonzero_exitEpoch = valid[valid['exitEpoch']!=0].copy()  # .copy() avoids SettingWithCopyWarning when adding a column below
nonzero_exitEpoch['timeToExit'] = (nonzero_exitEpoch['exitEpoch']-nonzero_exitEpoch['activationEpoch'])*6.4/60
nonzero_exitEpoch['timeToExit'].plot.hist(bins=20)
plt.title('Distribution of Time to Exit for Validators')
plt.xlabel('Time to Exit (Hours)')
plt.show()
The histogram looks great! We can dive into the table to see exactly how many validators exited, as well as other summary statistics.
nonzero_exitEpoch.describe()['timeToExit']
We see that a total of 6304 validators have exited with an average of 363 hours or around 15.1 days. Also, the fastest exit time is actually less than 1 hour.
We can better observe these trends over time with both a traditional time series plot and a cumulative count graph that tracks the number of validators exiting throughout the epochs.
count = nonzero_exitEpoch.groupby('exitEpoch')['exitEpoch'].count().reset_index(name="count")
cumsum = nonzero_exitEpoch.groupby('exitEpoch')['exitEpoch'].count().cumsum().reset_index(name="cumulative")
exitEpoch_table = pd.concat([count,cumsum.iloc[:,1]], axis=1)
exitEpoch_table
exitEpoch_table.describe()['count']
nonzero_exitEpoch.groupby('exitEpoch')['exitEpoch'].count().plot()
nonzero_exitEpoch.groupby('exitEpoch')['exitEpoch'].count().cumsum().plot()
To start, we will look at the distribution of the numbers of blocks 'assigned', 'executed', and 'skipped'. One way to do this is simply to create a plot for each column, which we already know how to do. For instance, we can do the following:
valid.groupby('assigned')['assigned'].count().plot.bar()
Let's do it a different way and plot all three variables using apply() and the parameter subplots=True .
valid[['assigned','executed', 'skipped']].apply(pd.Series.value_counts).plot.bar(subplots=True,title=['','',''])
#Note that title =['','',''] is simply to remove all three of the subtitles
plt.suptitle('Distribution of Assigned, Executed, and Skipped Blocks')
plt.show()
From the bar charts, we observe that each variable is distributed exponentially, where most nodes have not had any assignments, executed blocks, or skipped assignments. Globally, however, the average validator has been assigned 6 slots, has successfully proposed 4 blocks, and has missed 2 slot assignments.
A quick exercise for you: How would you get the averages mentioned in one shot? Take a few minutes to try it out!
Here is one way:
valid[['assigned','executed', 'skipped']].mean()
By treating executions and skips as proportions, we can visualize the distributions of both the execution success rate and the skipped slot rate. To do so, we define each variable by taking the number of executed or skipped blocks and dividing it by the total number of assigned blocks. (Validators with no assignments produce NaN rates, which the histogram simply ignores.)
valid['Execution Rate'] = valid['executed']/valid['assigned']
valid['Execution Rate'].plot.hist(color='#EA5600')
plt.title('Distribution of Execution Rate for Validators')
plt.show()
valid['Skipped Rate'] = valid['skipped']/valid['assigned']
valid['Skipped Rate'].plot.hist(color='#776DB8')
plt.title('Distribution of Skipped Rate for Validators')
plt.show()
Surprisingly, the rate of successful block executions and the proportion of skipped slots appear to follow reflected Beta distributions where most of the probability mass rests at the edges of the support range. Most nodes have had only success executing on their block proposals; however a significant portion of the validators have not had any success. Likewise, most validators have not skipped any slot assignments, but a substantial portion of them have skipped all of their block proposals. This result suggests that there will likely be a clear demarcation between the behaviors of certain validators on the network.
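To back up the "mass at the edges" observation numerically, we can measure the share of validators whose rate is exactly 0 or exactly 1. A minimal sketch on synthetic rates (the real computation would use the 'Execution Rate' column instead):

```python
import pandas as pd

# Hypothetical execution rates: most validators sit at the extremes.
rates = pd.Series([1.0, 1.0, 1.0, 0.0, 0.0, 0.5, 0.75, 1.0])

# Fraction of validators at an edge of the support.
at_edges = rates.isin([0.0, 1.0]).mean()
print(f"{at_edges:.0%} of validators sit at 0 or 1")  # 75% for this toy series
```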
Slashing on the Ethereum 2.0 network is the act of punishing validators for violations of the consensus rules, either by improperly proposing a block or by failing to properly attest to a block while in an assigned committee. To better understand the slashing behavior within our dataset, we will investigate the number of slashed validators over 'exitEpoch'. To do so, we simply filter the data on 'slashed' being True and group by 'exitEpoch'. Then, we use the cumsum() function just like before.
slashed = valid[valid['slashed']== True]
slashed.groupby('exitEpoch')['slashed'].count().cumsum().plot()
plt.title('Number of Slashed Validators over Time')
plt.xlabel('Exit Epoch')
plt.ylabel('Cumulative Number of Slashed Validators')
plt.show()
From the graph, we notice that between epochs 2000 and 4000, the number of slashed validators rose from 0 to 5000. Since epoch 4000, the growth has been much slower, barely creeping up towards 5500 through epoch 15000. The spike in slashings during epochs 2000 to 4000 corresponds directly with the large exodus of validators that we observed previously. When punished with a slashing, a portion of the validator's stake is removed. If the effective balance of the validator drops too low, it could be subject to removal from the network.
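As a rough sketch of that removal mechanic, we can flag validators whose effective balance has fallen below an ejection threshold. The Ethereum 2.0 specification sets EJECTION_BALANCE at 16 ETH; we assume the same value applies on Medalla, and the balances below are made up for illustration:

```python
import pandas as pd

EJECTION_BALANCE = 16  # ETH, per the Ethereum 2.0 spec (assumed for Medalla)

# Hypothetical effective balances after slashings and penalties.
balances = pd.Series([32, 31, 15, 9, 32],
                     index=['v0', 'v1', 'v2', 'v3', 'v4'])

# Validators below the threshold are candidates for ejection.
at_risk = balances[balances < EJECTION_BALANCE]
print(f"Validators subject to removal: {list(at_risk.index)}")
```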
From our macro analysis, we’ve shown that analytical techniques applied to Medalla’s testnet data can help us develop a foundational understanding of the network. Our tracking of validator activations, execution rates, and exit patterns, among other metrics, forms a first picture of overall network health that we can then recast and project onto individual validators. Our next section will further develop this idea as we focus specifically on understanding the actions of Ethereum 2.0’s stakers.
It is our goal at this juncture to develop a categorization method that can codify patterns in validator behavior, characterize them, and discern the difference between constructive and destructive network actions. To facilitate the discovery of these behavioral patterns and fulfill the aforementioned objectives, we will employ a weighted linear rank scoring algorithm. This is a simple yet powerful sorting technique that maps a validator’s characteristics onto a single ranked score that can be compared.
As inputs into the scoring function we’ll use the current balance, number of successful executions, the active status of the validator, how long the node has been active, the number of skipped assignments and a binary indicator for whether the node has been slashed.
Out of the 6 variables, 'currentBalance', 'executed', 'skipped', and 'slashed' are readily available in the dataset. We can use 'exitEpoch' to get the active status of the validator. Lastly, how long the node has been active is the same as the 'timeToExit' variable we computed earlier, using the formula: $$\frac{6.4}{60} \times (\text{exitEpoch} - \text{activationEpoch}).$$
The polarities of each of these variables are unambiguous. Of the six, the only variables that indicate negative behavior are the number of skipped slots and whether the validator has been slashed. To account for this, we set negative weightings on those two variables, while allowing the others to maintain their positive polarity.
Here is what we need to do in order to get all the variables ready in one place:
Note that the last two steps are done to account for negative behavior as mentioned.
valid_stats = valid.copy()  # copy so later modifications do not alter 'valid'
#1 A validator is active if it has not exited
valid_stats['active'] = valid_stats['exitEpoch']==0
#2 Treat still-active validators as exiting at the last observed epoch, 15579
valid_stats['exitEpoch'] = valid_stats['exitEpoch'].replace([0],15579)
#3 Convert the active duration from epochs to hours
valid_stats['active_time'] = (valid_stats['exitEpoch']-valid_stats['activationEpoch'])*6.4/60
#4 Rename for readability
valid_stats.rename(columns={'executed':'executions'}, inplace=True)
#5 Flip 'slashed' so that True indicates good behavior (never slashed)
valid_stats['slashed'] = ~valid_stats['slashed']
#6 Negate 'skipped' so that more skips ranks lower
valid_stats['skips'] = -1*valid_stats['skipped']
valid_stats = valid_stats[['publickey','index','currentBalance','executions','skips','slashed','active','active_time']]
Now that we have all the variables we need, we can implement the algorithm. The following briefly explains what we need to do.
Let $x_1 =$ currentBalance, $x_2 =$ executions, $x_3 =$ skips, $x_4 =$ slashed, $x_5 =$ active, and $x_6 =$ active_time. For any specific validator, the ordered rankings of its respective values can be represented as $r_i$. We use weights, $w_i$, to correspond to emphasis placed on variable $x_i$ in the scoring function $S$. The weight vector satisfies the following constraint: $w_1+w_2+w_3+...+w_6 = 1$. The score, S, is then computed as the scalar product of the ranks and weights.
$$S = \sum_{i=1}^{6} w_i r_i$$

Now that we have a general idea, let's code!
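As a toy worked example of this scoring function, consider three hypothetical validators and just two variables (so $w_1 + w_2 = 1$); the numbers are made up purely for illustration:

```python
import pandas as pd

# Three hypothetical validators and two scoring variables.
toy = pd.DataFrame({'executions': [10, 4, 7],
                    'skips': [-1, 0, -5]},  # negated: fewer skips ranks higher
                   index=['A', 'B', 'C'])

# Weights emphasising executions, summing to 1.
weights = pd.Series({'executions': 0.6, 'skips': 0.4})

ranks = toy.rank()                     # rank each column independently (1 = worst)
score = (ranks * weights).sum(axis=1)  # S = sum_i w_i * r_i
print(score.sort_values(ascending=False))  # validator A comes out on top
```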
Also, note that we divide 'active_time' by 4 to reduce its weighting, as we do not want to punish new validators too much.
#1 Combine those ranks with 'publickey' and 'index' columns
valid_rank = pd.concat([valid[['publickey','index']],
valid_stats[['currentBalance','executions','skips','slashed','active','active_time']].rank()],
axis=1)
# Lower the weight of 'active_time'
valid_rank['active_time']=valid_rank['active_time']/4
#2
valid_rank['Score'] = valid_rank.iloc[:, 2:].sum(axis=1)
#3
valid_rank['Rank'] = (-valid_rank['Score']).rank()
Now that we have their 'Score' and 'Rank', we will add them back to the 'valid_stats' dataframe. To do so, we simply use merge() . Also, we revert 'slashed' and 'skips' to their original values from before the negative weighting.
valid_all = pd.merge(valid_stats,valid_rank[['index','Score','Rank']],left_on='index', right_on='index')
valid_all['slashed'] = ~valid_all['slashed']
valid_all['skips'] = -1*valid_all['skips']
#Sort by 'Rank'
valid_all = valid_all.sort_values('Rank')
valid_all
To better highlight the rate of change of the 'Score', we can plot it against the 'Rank'. We will simply use scatter() with the 'Score' as the color parameter.
plt.scatter(x=valid_all['Rank'],y=valid_all['Score'], c =valid_all['Score'])
plt.title('Scores vs Rank')
plt.xlabel('Rank')
plt.ylabel('Scores')
plt.show()
While the validator score curve does show there is differentiation between the scores, it fails to give any indication of clear heterogeneity within the node’s behaviors. Let's plot the distribution of the 'Score'.
plt.hist(data=valid_all, x= 'Score', bins=32)
plt.title('Distribution of Validator Performance Scores')
plt.xlabel('Scores')
plt.show()
The histogram of the score values is multimodal, which is the first encouraging sign that our scoring function has successfully captured and encoded a significant portion of the variance in validator behavior.
As with many unsupervised tasks, the transition from scores to a finite segmentation is often tricky, particularly when there is no well established subject matter context for the selection of cut-offs, nor one agreed upon cluster validation method in the literature to appeal to. With a mixture of investigation, intuition and mathematical hand waving, we settled on the selection of seven score tiers to differentiate network behavior as follows.
$$\begin{align} \text{Rank} & & \text{Tier} \\ [1,2489] & \ \ \ \ \ \ \ \ \ \longrightarrow & 1 \\ (2489,6942] & \ \ \ \ \ \ \ \ \ \longrightarrow & 2 \\ (6942,38396] & \ \ \ \ \ \ \ \ \ \longrightarrow & 3 \\ (38396,56534] & \ \ \ \ \ \ \ \ \ \longrightarrow & 4 \\ (56534,67877] & \ \ \ \ \ \ \ \ \ \longrightarrow & 5 \\ (67877,75644] & \ \ \ \ \ \ \ \ \ \longrightarrow & 6 \\ (75644,\infty) & \ \ \ \ \ \ \ \ \ \longrightarrow & 7 \end{align}$$We can easily do this using np.select() .
conditions = [
(valid_all['Rank'] <= 2489),
(valid_all['Rank'] > 2489) & (valid_all['Rank'] <= 6942),
(valid_all['Rank'] > 6942) & (valid_all['Rank'] <= 38396),
(valid_all['Rank'] > 38396) & (valid_all['Rank'] <= 56534),
(valid_all['Rank'] > 56534) & (valid_all['Rank'] <= 67877),
(valid_all['Rank'] > 67877) & (valid_all['Rank'] <= 75644),
(valid_all['Rank'] > 75644)
]
values = [1,2,3,4,5,6,7]
import numpy as np
valid_all['Tier'] = np.select(conditions, values)
Let's recreate the Sorted Validator Scores plot. However, this time, make sure the color shows the different tiers. Note that we use invert_xaxis() to reverse the order so the plot goes from lower tiers to higher tiers.
fig, ax = plt.subplots()
scatter = ax.scatter(x="Rank", y="Score",data=valid_all, c="Tier", cmap = 'rainbow', label="Tier")
legend = ax.legend(*scatter.legend_elements(), loc="lower right", title="Tier")
plt.title('Scores vs Rank colored by Tier')
plt.xlabel('Rank')
plt.ylabel('Scores')
ax.add_artist(legend)
ax.invert_xaxis()
plt.show()
With these cut-off ranges, we can apply them to our histogram of scores to create a stacked distribution, and then reapply the same rules to partition the histogram along its tiers. Here, we can use plotly.express . Using its color_discrete_map parameter, we can define the colors for each Tier.
import plotly.express as px
px.histogram(valid_all, x="Score",
color = 'Tier',
color_discrete_map = {1:'purple',2:'blue',3:'lightblue',4:'green', 5:'yellow', 6:'orange', 7:'red'},
nbins=32
)
We can also separate the histograms by Tier using facet_col .
px.histogram(valid_all, x="Score",
color = 'Tier',
facet_col = 'Tier',
color_discrete_map = {1:'purple',2:'blue',3:'lightblue',4:'green', 5:'yellow', 6:'orange', 7:'red'},
nbins=30
)
An investigation into validator performance can now begin on the tier level as we compare how the tiers each interact with the network. To categorize the behaviors of the tiers succinctly, we can look at the mean of the $6$ variables used for our 'Score', grouped by Tier.
valid_all.groupby('Tier').mean(numeric_only=True).iloc[:,:-1]
We can also add color to each column to show how it changes across different Tiers like a heatmap. Here, we can use seaborn.light_palette .
cm = sns.light_palette("green", as_cmap=True)
Tier = valid_all.iloc[:,2:].groupby('Tier').mean().iloc[:,:-1]
Tier['Count'] = valid_all.groupby('Tier')['Tier'].count()
Tier.style.background_gradient(cmap=cm)
Tier 1 (Ranks 1 – 2489): Validators in this set can consider themselves “Perfect Proposers” since they are the only nodes with a perfect track record of no skipped slots and no slashings. They often have the highest number of successful blocks, to go along with their longer than average active time on the network.
Tier 2 (Ranks 2490 – 6942): Second tier validators are typically successful in executing their duties on behalf of the network, though with a slightly lower number of successful blocks and a few skipped slot assignments littered around. Overall this group is still healthy.
Tier 3 (Ranks 6943 – 38396): While validators in this tier are still healthy overall, they do have more skipped blocks and slightly fewer successful block proposals. This group has a lower average active time than tiers 1 and 2. It is in this tier do we see the first set of exited validators appear.
Tier 4 (Ranks 38397 – 56534): This is the tier where validators with more serious performance issues begin to appear more prevalently. Though the majority are active and have not been slashed, there are some who have. This tier is unique because it also houses many of the newer validator nodes who are just now trying to move up the ranks. Many have not even had their first assignment.
Tier 5 (Ranks 56535 – 67877): Tier 5 is the first of the truly unhealthy groups, where the ratio of skipped blocks to successful proposals is skewed towards missed assignments. In this tier, more validators have experienced a slashing, and the number of inactive nodes continues to increase.
Tier 6 (Ranks 67878 – 75644): Validators in this tier have skipped more block assignments than they successfully proposed. These nodes are in danger of being removed from the network due to current balances below the 32 ETH threshold.
Tier 7 (Ranks 75645 – 80392): The vast majority of validators in this bottom tier are inactive and have been slashed at least once. There are also a few that left due to an insufficient balance resulting from a disproportionate number of skipped blocks. This group has the lowest current balance.
Our tiers all possess distinct behavioral characteristics useful for discriminating between them; however, there is also a deeper level of heterogeneity that exists within the tiers themselves. This result can be found by applying a dimension reduction technique and plotting the component scores against one another. We use sklearn.decomposition.PCA to perform the principal component analysis on the $6$ variables used in scoring.
from sklearn.decomposition import PCA
pca = PCA()
principalComponents = pca.fit_transform(valid_all[['currentBalance', 'executions', 'skips', 'slashed', 'active', 'active_time']])
Then, we can create a dataframe that contains the principal components and the columns 'Tier', 'Rank' and 'index'.
principalDf = pd.DataFrame(data = principalComponents)
finalDf = pd.concat([valid_all[['Tier', 'Rank', 'index']].reset_index(drop=True),principalDf], axis = 1)
finalDf = finalDf.rename(columns={0:"PC1",1:"PC2", 2:"PC3"})
finalDf['Tier']=finalDf['Tier'].astype("category")
Then, we will use plotnine, a Python implementation of R's ggplot2, to plot a contour map of the principal components colored by Tier.
from plotnine import ggplot, aes, geom_density_2d
(ggplot(finalDf, aes(x = "PC3", y = "PC1", group = "Tier", color = "Tier", label = "index")) +
geom_density_2d() )
After some investigation, we managed to get one representation of the score "surface" when labeled by tier. Across the landscape, the scores within most tiers coalesce around one another, forming localized regions. Though this is true for some groups, Tiers 4 through 7 all have multiple regions where validator scores rest. This is an indication that there is further behavior to be distinguished between the nodes within the same segment.
Here, we show the representative sample of the most common validator profiles within each tier.
sample = valid_all.loc[valid_all['Rank'].isin([55821, 76695, 77761, 33530, 66925, 67885,
                                               14959, 4979, 304, 19820, 55757, 66918, 72185])].copy()
sample['index'] = sample['index'].astype('category')
cm = sns.light_palette("green", as_cmap=True)
sample.style.background_gradient(cmap=cm)
Starting with Tier 1, we have our Perfect Proposers, validators who exhibit the best combination of behaviors we track. These validators have the highest number of successful blocks to go along with their longer than average active time on the network.
Next we have the Tier 2 validators. Though they have similar characteristics as the Tier 1, particularly perfect proposal rates, they have been active for a slightly shorter time and have proposed fewer blocks.
In Tier 3, the trend continues: these validators have been on the network for a shorter and shorter period of time.
Tier 4 is where poor performance begins to manifest itself in actual skipped blocks. We can see new validators and those who have skipped one or more block proposals in this Tier.
Then, in Tier 5 and Tier 6, we start to see more and more skipped blocks and lower current balances. In particular, validators in Tier 6 have skipped more block assignments than they have successfully proposed, yet have managed to stay on the network for the longest period of time.
Lastly, in Tier 7, we see new validators who have been slashed, regardless of whether they have skipped a block.
Our key takeaway from this analysis is that, when performing our ranking procedure across the six scoring variables and over 80,000 validators, true "tiers" of validators do in fact exist. At the top of the list, Tier 1 validators execute 100% of their assignments, maintain a high effective balance, have not been slashed, and have been active from very early on. On the bottom end are validators who were slashed and failed to execute their assignments. Distributionally, as expected, most validators fall somewhere in between these two extremes. It will be quite interesting to see how the scores that make up the backbone of the tier-based ranking system evolve as time goes on.
Beyond the tiering itself, we uncovered a number of other interesting findings along the way. With only a couple of months of data, we expect these findings to continue to evolve, and as such, the tiers defining the relative performance of validators will need adjustment over time.