Ethereum is a technology that lets you send cryptocurrency to anyone for a small fee. It also powers applications that everyone can use and no one can take down. It's the world's programmable blockchain and a marketplace of financial services, games and apps that can't steal your data or censor you.
Ethereum also has its own cryptocurrency, Ether, or 'ETH' for short. Check out more about Ethereum and ETH on https://ethereum.org/.
This dataset is obtained from the Beacon Scan block explorer and provides information on 1751 slashed validators. Being slashed means that a significant part of the validator's stake is removed: up to the whole stake of 32 ETH in the worst case. Validator software and staking providers have built-in protection against getting slashed accidentally, so slashing should only affect validators who misbehave deliberately. For more info, please visit https://codefi.consensys.net/blog/rewards-and-penalties-on-ethereum-20-phase-0.
Ethereum 2.0's consensus mechanism has a couple of rules that are designed to prevent attacks on the network. Any validator found to have broken these rules will be slashed and ejected from the network. According to a blog post on Codefi, there are three ways a validator can get slashed:
Let's load the data and look at the first few rows. We will use the pandas and matplotlib libraries. We can use the head() function to show the first five rows.
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import matplotlib.pyplot as plt
df_slashed = pd.read_csv('slot-slashed.csv')
df_slashed.head()
Let's rename that 'Unnamed: 0' column to 'rowindex' using rename(), with the parameter inplace=True to save the changes.
df_slashed.rename(columns={'Unnamed: 0': 'rowindex'}, inplace=True)
The 7 variables are:
rowindex
- The row index of the validator.
epoch
- The epoch number in which the validator was slashed.
slot
- The slot number in which the validator was slashed.
age
- The amount of time passed since the validator was slashed.
validatorSlashed
- The index of the validator who was slashed.
slashedBy
- The index of the validator who did the slashing.
reason
- The reason why the validator was slashed.
We can use dtypes to see the data types of each column and shape to see the size of the data.
df_slashed.dtypes
df_slashed.shape
We can see that we have 1751 rows and 7 columns (2 character and 5 numeric).
We begin our analysis with some high level statistics. Let's summarize the data using the describe() function. This command produces a simple summary table of centrality and spread statistics of our collected features.
df_slashed.describe(include='all')
Let's get the number of unique values in each column using nunique() .
df_slashed.nunique()
Thus, we have 1647 unique validators that were slashed and they were slashed by 771 unique validators for only 2 reasons.
Let's find out how many validators are slashed over time. We can count the number of each 'epoch' group using groupby() and count() . Then, plot it using plot() .
df_slashed.groupby('epoch').count()['rowindex'].plot()
plt.title('Number of slashed over epoch')
plt.show()
To better assess the impact of these spikes in slashings, we produced a cumulative count plot that tracks the total number of slashings across epochs. We can do this very easily by simply chaining cumsum() onto the per-epoch counts.
df_slashed.groupby('epoch').count()['rowindex'].cumsum().plot()
plt.title('Cumulative number of slashed over epoch')
plt.show()
The first large spike in slashings occurs around epoch 3000 and another smaller spike in slashing around epoch 12500. Despite the fact that these jumps are significant, when focusing on the rate of change of slashings, the number of offensive rule violations are quite stable the majority of the time. Globally, the rate of slashing is approximately 117 slashes per 1000 epochs. When we exclude the spikes, the rate of change is approximately 63 slashes per 1000 epochs.
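The global rate quoted above can be reproduced with a short computation. This is a sketch on a hypothetical toy series of epoch numbers; with the real data you would pass df_slashed['epoch'] instead:

```python
import pandas as pd

# Toy stand-in for df_slashed['epoch'] (hypothetical values).
epochs = pd.Series([100, 100, 250, 400, 900, 1100])

# Each row is one slash; the observed epoch range is max - min.
total_slashes = len(epochs)
epoch_span = epochs.max() - epochs.min()

# Slashes per 1000 epochs over the observed range.
rate_per_1000 = total_slashes / epoch_span * 1000
print(round(rate_per_1000, 1))  # -> 6.0 for this toy data
```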
Next, we can investigate how often a slash occurs. We need to take a few steps: first, compute the difference between consecutive epochs (the rows run from newest to oldest, so subtracting shift(-1) gives a non-negative gap); then keep only the positive gaps, which excludes slashings that occur in the same epoch.
df_slashed['epochelapsed'] = df_slashed['epoch']-df_slashed['epoch'].shift(-1)
Epochelapsed = df_slashed[df_slashed['epochelapsed'] > 0]
Epochelapsed['epochelapsed'].describe()
From the summary, we can see that approximately 40 epochs elapse between slashings on average, excluding slashings that occur in the same epoch. Let's plot the histogram so we can investigate further.
Epochelapsed.hist('epochelapsed', bins=30, color='lightblue')
plt.title('A Distribution of the Number of Epochs lapsed between Slashings')
plt.xlabel('Epoch Elapsed')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
We can see that it is very common for fewer than 25 epochs to elapse between slashings. In fact, about 41% of the time only 1 epoch passes between two epochs with at least 1 slashing. The longest period without a slashing lasted 900 epochs, which is 96 hours.
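Epoch counts can be converted to wall-clock time. A minimal helper, assuming mainnet Beacon Chain timing (12-second slots, 32 slots per epoch, i.e. 6.4 minutes per epoch):

```python
SECONDS_PER_SLOT = 12   # mainnet Beacon Chain slot time
SLOTS_PER_EPOCH = 32    # mainnet epochs contain 32 slots

def epochs_to_hours(n_epochs):
    """Convert an epoch count to wall-clock hours (one epoch = 6.4 minutes)."""
    return n_epochs * SECONDS_PER_SLOT * SLOTS_PER_EPOCH / 3600

print(epochs_to_hours(900))  # 900 epochs -> 96.0 hours
```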
We can find out more about the epoch elapsed between slashings.
Epochelapsed.groupby('epochelapsed').count()/Epochelapsed.shape[0]*100
Epochelapsed.plot('epoch','epochelapsed', color='red')
plt.title('Time series of Epochs lapsed between Slashings')
plt.xlabel('Epoch')
plt.ylabel('Epoch Elapsed')
plt.show()
Of the three ways a validator can violate consensus rules, only two categories of offenses appear in our data: attestation rule and proposer rule violations. Let's create a bar chart that shows the percentage of each reason why a validator was slashed.
One way to do it is similar to what we have been doing: use groupby() and count() to get the data. Here, we can use plot.bar() as we want to plot a bar chart.
df_slashed.groupby('reason').count()['rowindex'].plot.bar(rot=0)
plt.title('Number of slashes per reason')
plt.xlabel('Reason')
plt.ylabel('Count')
plt.show()
We can also add the percentages of each reason using the following code.
ax=df_slashed.groupby('reason').count()['rowindex'].plot.bar(rot=0)
plt.title('Number of slashes per reason')
plt.xlabel('Reason')
plt.ylabel('Count')
total = df_slashed.shape[0]
for p in ax.patches:
    # format the bar height as a percentage of all slashes
    percentage = '{:.1f}%'.format(100 * p.get_height()/total)
    # compute the (x, y) position of the annotation
    x = p.get_x() + p.get_width()/3
    y = p.get_y() + p.get_height() + 1
    # draw the annotation
    ax.annotate(percentage, (x, y))
plt.show()
The distribution is skewed heavily towards attestation rule violations as they encompass nearly 97% of justifications for slashes in our data. The remaining 3% of slashes can be attributed to proposer rule offenses.
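The same percentages can be obtained directly, without the plotting loop, using value_counts with normalize=True. This is a sketch on toy data; run it on df_slashed['reason'] for the real figures:

```python
import pandas as pd

# Toy stand-in for df_slashed['reason'] (hypothetical 97/3 split).
reasons = pd.Series(['Attestation rule offense'] * 97 +
                    ['Proposer rule offense'] * 3)

# normalize=True returns proportions; multiply by 100 for percentages.
shares = reasons.value_counts(normalize=True) * 100
print(shares)
```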
Interestingly, this distribution has not been constant over time. Let's find out exactly how it changes over time. To do so, we can do the following:
num_slashed_over_epoch_reason = df_slashed.groupby(['epoch','reason']).count()['rowindex'].unstack().fillna(0)
num_slashed_over_epoch_reason
Now that we have the table, we can easily produce a time series showing the counts for each reason over time:
num_slashed_over_epoch_reason.plot()
plt.title('Reason for slashes')
plt.show()
The plot is great, but we can barely see the line for 'Proposer rule offense'. Let's fix that by passing the parameter subplots=True to separate the plots into two.
num_slashed_over_epoch_reason.plot(subplots=True)
plt.suptitle('Reason for slashes')
plt.show()
Despite proposer rule offenses being rare throughout all epochs, one was, interestingly enough, the very first offense committed by a validator on the network. Over time, proposer violations have become more frequent, as shown in the subsequent time series graphs.
Let's create a time series of the cumulative number of slashes for each reason.
cumul_num_slashed_over_epoch_reason = df_slashed.groupby(['epoch','reason']).count()['rowindex'].unstack().fillna(0).cumsum()
cumul_num_slashed_over_epoch_reason.plot(subplots=True)
plt.suptitle('Reason for slashes')
plt.show()
Let's turn our attention to the distribution of number of slashings received and the number of slashings performed.
df_slashed.groupby('slashedBy').count()['rowindex'].hist()
plt.show()
df_slashed.groupby('validatorSlashed').count()['rowindex'].hist(bins=7)
plt.show()
We can see that, of the validators that have slashed others, most have done so only once or twice. Similarly, most validators who have been slashed received only one or two slashings, and only a handful have been slashed more than 2 times.
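The exact distribution behind these histograms can be tabulated by counting slashes per validator and then counting those counts. A sketch with a hypothetical toy frame; substitute df_slashed for the real result:

```python
import pandas as pd

# Toy stand-in for df_slashed: each row is one slash, 'slashedBy' is the slasher.
toy = pd.DataFrame({'slashedBy': [1, 1, 2, 3, 3, 3, 4]})

# Slashes performed per validator...
per_validator = toy.groupby('slashedBy').size()
# ...and how many validators performed each number of slashes.
distribution = per_validator.value_counts().sort_index()
print(distribution)
```

The same pattern with 'validatorSlashed' gives the distribution of slashings received.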
To learn more about the validators, we will bring in a second dataset that is also obtained from the Beacon Scan block explorer. To learn more about the data set, you can read our case study on it as well. In short, this data set provides the following information on any given validator:
X
- The row index of the validator.
publickey
- The public key identifying the validator.
index
- The index number of the validator.
currentBalance
- The current balance, in ETH, of the validator.
effectiveBalance
- The effective balance, in ETH, of the validator.
proposed
- The number of blocks assigned, executed, and skipped by the validator.
eligibilityEpoch
- The epoch number in which the validator became eligible.
activationEpoch
- The epoch number in which the validator was activated.
exitEpoch
- The epoch number in which the validator exited.
withEpoch
- The epoch in which the validator is eligible to withdraw their funds. This field is not applicable if the validator has not exited.
slashed
- Whether the given validator has been slashed.
We have the 'epoch' when a validator was slashed and also their 'activationEpoch'. Thus, the difference will tell us how long before a validator gets slashed. However, since some validators were slashed more than once, we need the row where the first slashing occurs for each unique validator, i.e. the minimum difference.
However, 'epoch' and 'activationEpoch' are in different datasets. Thus, we first need to combine the two datasets using the validator index. Since we want the 'activationEpoch' of the validator that was slashed, we will merge the two dataframes on 'validatorSlashed' and 'index' via pd.merge() .
Also, we need to convert 'activationEpoch' to numeric values.
df_validator = pd.read_csv('validator_data.csv',encoding = 'ISO-8859-1')
#Note that we need the encoding parameter here as the csv file has some special character: '|'
df_validator = df_validator.replace(['genesis'],0)
df_validator['activationEpoch'] = df_validator['activationEpoch'].astype(int)
df_join_slashed = pd.merge(df_slashed,df_validator,left_on='validatorSlashed', right_on='index')
To find out how long before a validator gets slashed, we will do the following:
df_join_slashed['timetoslashed'] = df_join_slashed['epoch']-df_join_slashed['activationEpoch']
df_join_slashed.groupby('validatorSlashed').min()['timetoslashed'].describe()
Thus, on average, validators are slashed after 3919 epochs. Note that this average accounts only for validators who are slashed at least once. Similarly, we can compute 'timetoslash' to find out how long before a validator slashes others.
All we need to do is change a few things from the code in previous two slides. Take a few minutes to figure out the code!
df_join_slasher = pd.merge(df_slashed,df_validator,left_on='slashedBy', right_on='index')
df_join_slasher['timetoslash'] = df_join_slasher['epoch']-df_join_slasher['activationEpoch']
df_join_slasher.groupby('slashedBy').min()['timetoslash'].describe()
On average, a validator marks their first slash in the initial 3409 epochs after activation. The fastest first slash was found to occur only 4 epochs after activation, while the slowest first slash was 14892 epochs after activation.
In a previous slide, we looked at the distribution of the number of slashings performed by a validator. There was a validator that slashed others more than 90 times. We are interested in seeing who these frequent slashers are.
Since we have the dataframe 'df_join_slasher', we can easily get the information on these frequent slashers by simply grouping by 'slashedBy' and arranging them according to their slashing frequency.
df_join_slasher['SlashingFrequency']=df_join_slasher.groupby('slashedBy')['slashedBy'].transform('count')
df_join_slasher.drop_duplicates('slashedBy',inplace=True)
Slashers = df_join_slasher.sort_values('SlashingFrequency',ascending=False)[['slashedBy', 'currentBalance', 'effectiveBalance', 'proposed', 'activationEpoch', 'SlashingFrequency']]
Slashers.head(10)
The table shows the top 10 validators that have done the most slashings. These slashers have similar current balance and effective balance. Most of them were also active for a long period of time. According to the tier system we created on the Case Study we did on validator data, 8 out of these top 10 slashers reside in tier 3 where validators' performance becomes noticeably worse.
The sink-source structure of the 'slashedBy' and 'validatorSlashed' columns allows us to treat the various slashes as directed edges in a directed graph. A directed graph consists of a set of nodes and directed edges, where each directed edge represents some relationship between two nodes. The nodes in this instance are the individual validators, and a directed edge exists between two nodes if one has slashed the other.
We will use the networkx library to draw the graph. We will first create the adjacency matrix using crosstab() . Then, use columns.union() and reindex() to make the row and column indices match.
import networkx as nx
adjacenymatrix = pd.crosstab(df_slashed.slashedBy, df_slashed.validatorSlashed)
idx = adjacenymatrix.columns.union(adjacenymatrix.index)
adjacenymatrix = adjacenymatrix.reindex(index = idx, columns=idx, fill_value=0).to_numpy()
DG =nx.DiGraph(adjacenymatrix)
nx.draw(DG, node_size=1)
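An alternative to building the adjacency matrix by hand is to construct the graph straight from the dataframe with networkx's from_pandas_edgelist. A sketch on a hypothetical toy frame; it should yield the same directed structure, though node labels will be validator indices rather than matrix positions:

```python
import pandas as pd
import networkx as nx

# Toy slashing records standing in for df_slashed (hypothetical indices).
toy = pd.DataFrame({'slashedBy': [10, 10, 20],
                    'validatorSlashed': [11, 12, 21]})

# Each row becomes a directed edge: slasher -> slashed.
DG = nx.from_pandas_edgelist(toy, source='slashedBy',
                             target='validatorSlashed',
                             create_using=nx.DiGraph())
print(DG.number_of_nodes(), DG.number_of_edges())
```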
Since the whole network comprises many vertices, we can decompose it into its connected component subgraphs for a better understanding. Unfortunately, nx.connected_components() is not defined for directed graphs, so we will use the undirected version of the graph instead.
G = nx.Graph(adjacenymatrix)
largest_cc = max(nx.connected_components(G), key=len)
len(largest_cc)
This shows that the largest connected subgraph has 415 vertices. Now, let's get all these subgraphs and draw them!
S = [G.subgraph(c).copy() for c in nx.connected_components(G)]
len(S)
We see that there are a total of 660 connected subgraphs.
To plot any of the subgraphs, we can simply use the following code:
nx.draw(S[1])
Let's find the subgraphs with the highest numbers of vertices. To do so, we can simply count the number of vertices of each subgraph.
num_vertices = [len(c) for c in nx.connected_components(G)]
num_vertices[0:5]
We can see that the first subgraph is the one with 415 vertices. So let's graph that!
nx.draw(S[0], node_size=10)
We can make out roughly 8 validators that have performed a high number of slashes; each sits at the center of a star structure, since they never slash the same validator twice.
There were two interesting observations about the slashing behavior that were particularly important to understanding the nature of the network visualizations.
The first was that there was not one validator that had been slashed by the same validator twice.
The second observation we discovered was that there were no instances of "revenge slashing" in which a validator slashed a second validator, and then the second validator eventually slashed the first in return.
When you combine these two facts, it explains why all of the networks we produced were simple directed graphs (i.e. graphs with no self-loops or multiple edges).
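Both observations can be checked programmatically. A sketch on a hypothetical toy frame; swap in df_slashed for the real check:

```python
import pandas as pd

# Toy slashing records standing in for df_slashed (hypothetical indices).
toy = pd.DataFrame({'slashedBy': [10, 10, 20],
                    'validatorSlashed': [11, 12, 21]})

# 1) No validator slashed by the same validator twice:
#    every (slasher, slashed) pair should appear only once.
no_double = not toy.duplicated(['slashedBy', 'validatorSlashed']).any()

# 2) No "revenge slashing": no edge (a, b) has a reverse edge (b, a).
edges = set(zip(toy['slashedBy'], toy['validatorSlashed']))
no_revenge = all((b, a) not in edges for (a, b) in edges)

print(no_double, no_revenge)
```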
Through our analysis of Eth2's security mechanism known as "slashing", we've observed some interesting patterns in its frequency, in those who perform slashings, and in their recipients. Some key findings include:
As the network of interconnected violators continues to grow, we expect the number of interesting sub-graphs to grow with it and represent some interesting dynamics in terms of the interaction between validators as it pertains to slashing.