class: center, middle, inverse, title-slide # Crypto Punks Analysis
### Omni Analytics Group --- ## Crypto Punks The CryptoPunks are 10,000 uniquely generated characters and each one of them are officially owned by a single person on the Ethereum blockchain. Originally, they were all claimed by for free. Now, you can buy, bid on, and offer punks for sale via the marketplace that's also embedded in the blockchain. You can check them out on https://www.larvalabs.com/cryptopunks. <p align="center"> <img src="punk-image.png" > </p> --- ## Getting Started We will be using the following libraries: ```r library(knitr) library(tidyverse) library(ggfortify) library(formattable) library(ggnewscale) library(scales) ``` <p align="center"> <img src="Cut_outs/Cut_out_17.png" width="200px" height="150px"> </p> --- ## The Data We will be using two data: * Transaction data that we scrapped from https://www.larvalabs.com/cryptopunks/details on 12/30/2021. * Details of each punk available on https://docs.google.com/spreadsheets/d/1qJLUy7BR1i3-aU2axFkuvNY_5EqLDsQiE9ryxB0dcA0/edit#gid=278362788 Let's read in the data! ```r punks_transactions <- read_csv("transactions_cryptopunks.csv") punks_raw <- read_csv("cryptopunks.csv") ``` --- ## Data Preview ```r head(punks_transactions,3) ``` ``` ## # A tibble: 3 x 7 ## Type From To Crypto USD Txn ID ## <chr> <chr> <chr> <dbl> <dbl> <date> <dbl> ## 1 Bid Withdrawn natealex <NA> 69 25532 2020-09-16 0 ## 2 Bid natealex <NA> 69 25104 2020-09-15 0 ## 3 Bid Withdrawn natealex <NA> 49 7031 2020-01-11 0 ``` ```r head(punks_raw,3) ``` ``` ## # A tibble: 3 x 25 ## Punk Sex Type Skin TypeSkin Slots Att1 Att2 Att3 Att4 Att5 Att6 ## <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 6487 Girl Female Albino FemaleAl… 0 (blan… (bla… (bla… (bla… (bla… (bla… ## 2 2204 Girl Female Dark FemaleDa… 0 (blan… (bla… (bla… (bla… (bla… (bla… ## 3 641 Girl Female Light FemaleLi… 0 (blan… (bla… (bla… (bla… (bla… (bla… ## # … with 13 more variables: Att7 <chr>, TypeRare <chr>, AttCount <chr>, ## # Att1_1 <chr>, Att2_1 <chr>, Att3_1 <chr>, Att4_1 <chr>, Att5_1 <chr>, ## # Att6_1 <chr>, Att7_1 <chr>, Min <chr>, Avg <chr>, Rank <chr> ``` --- ## Data Cleaning The transaction data is good to go, but the punks data need some cleaning as the numerical value (percentage) are treated as character. We can simply use `parse_number()` to get the numerical value from those columns. ```r punks <- punks_raw %>% mutate(across(c(TypeRare, AttCount, Att1_1, Att2_1, Att3_1, Att4_1, Att5_1, Att6_1, Att7_1, Min, Avg, Rank), parse_number)) ``` Let's get started with the analysis! --- ## Transactions On 06/23/2017, all 10,000 punks were claimed for free by 365 unique owners. On the same day, 32 bids and 21 sales were made. As of 12/30/2021, there are a total of 60479 interactions, where interactions include 'Bid', 'Bid Withdrawn', 'Offered', 'Offer Withdrawn', '(Wrap)', '(Unwrap)', 'Sold', 'Transfer', and 'Claimed'. Below shows a summary of the number of each interaction: ```r punks_transactions %>% group_by(Type) %>% summarise(Count = n(), Average_per_punk = n()/10000) ``` ``` ## # A tibble: 9 x 3 ## Type Count Average_per_punk ## <chr> <int> <dbl> ## 1 (Unwrap) 922 0.0922 ## 2 (Wrap) 1397 0.140 ## 3 Bid 10705 1.07 ## 4 Bid Withdrawn 5181 0.518 ## 5 Claimed 10000 1 ## 6 Offer Withdrawn 2872 0.287 ## 7 Offered 17179 1.72 ## 8 Sold 7567 0.757 ## 9 Transfer 4656 0.466 ``` --- ## Transactions at the punk level We can look at how many interactions of each type for each punk as the table shows: ```r punks_transactions %>% group_by(ID, Type) %>% summarise(Count = n()) ``` ``` ## # A tibble: 31,170 x 3 ## # Groups: ID [10,000] ## ID Type Count ## <dbl> <chr> <int> ## 1 0 Bid 17 ## 2 0 Bid Withdrawn 6 ## 3 0 Claimed 1 ## 4 0 Offered 3 ## 5 0 Sold 3 ## 6 0 Transfer 1 ## 7 1 Bid 12 ## 8 1 Bid Withdrawn 6 ## 9 1 Claimed 1 ## 10 1 Offer Withdrawn 6 ## # … with 31,160 more rows ``` --- ## Your Turn 1. Find the number of unique owners for each punk. 2. Find the number of unique bidders for each punk. Hint: `filter()` and `n_distinct()` is an useful function here. --- ## Answers ### 1. ```r punks_transactions %>% group_by(ID) %>% summarise(unique_owners = n_distinct(To)) ``` ``` ## # A tibble: 10,000 x 2 ## ID unique_owners ## <dbl> <int> ## 1 0 6 ## 2 1 9 ## 3 2 2 ## 4 3 2 ## 5 4 2 ## 6 5 2 ## 7 6 2 ## 8 7 2 ## 9 8 2 ## 10 9 2 ## # … with 9,990 more rows ``` --- ### 2. ```r punks_transactions %>% filter(Type == "Bid") %>% group_by(ID) %>% summarise(unique_bidders = n_distinct(From)) ``` ``` ## # A tibble: 4,484 x 2 ## ID unique_bidders ## <dbl> <int> ## 1 0 9 ## 2 1 10 ## 3 2 8 ## 4 3 6 ## 5 4 5 ## 6 5 3 ## 7 6 3 ## 8 7 6 ## 9 8 8 ## 10 9 8 ## # … with 4,474 more rows ``` --- ## Selling Price for each punk There are a total of 7567 sales and only 3665 punks were involved in these sales. Let's take a look at their prices. ```r punks_transactions %>% filter(Type == "Sold") %>% group_by(ID) ``` ``` ## # A tibble: 7,567 x 7 ## # Groups: ID [3,665] ## Type From To Crypto USD Txn ID ## <chr> <chr> <chr> <dbl> <dbl> <date> <dbl> ## 1 Sold 0xf5099e 14715954 25 2822 2018-11-30 0 ## 2 Sold 0x00d7c9 10528156 1.6 386 2017-07-07 0 ## 3 Sold 0xc352b5 55241 0.98 320 2017-06-23 0 ## 4 Sold EliteCat… 0xcf6165 60 36305 2020-11-30 1 ## 5 Sold 0xf5099e GoWest23 31 5155 2019-04-06 1 ## 6 Sold 0xc352b5 0xa0a59c 0.42 107 2017-06-26 1 ## 7 Sold 0x582fa7 0x7c00c9… 2.5 234 2018-12-07 14 ## 8 Sold 0xd8e806 0x582fa7 1.85 351 2017-07-17 14 ## 9 Sold 0xc352b5 0xd8e806 0.9 185 2017-07-13 14 ## 10 Sold 0xc352b5 6689278 0.51 105 2017-07-10 33 ## # … with 7,557 more rows ``` --- ## First sale and last sale Since some of the punks are sold multiple times, we are interested in its first and last sale. As we can see from the table in the previous slide, the sales are in chronological order. Thus, we can do the following: ```r first_sold_by_ID <- punks_transactions %>% filter(Type == "Sold") %>% group_by(ID) %>% filter(row_number()==n())%>% rename(first_price_ETH = Crypto, first_price_USD = USD) last_sold_by_ID <- punks_transactions %>% filter(Type == "Sold") %>% group_by(ID) %>% filter(row_number()==1) %>% rename(last_price_ETH = Crypto, last_price_USD = USD) ``` --- To put `first_sold_by_ID` and `last_sold_by_ID` together, we simply use `inner_join()` as follows: ```r sales_temp <- first_sold_by_ID %>% select(ID, first_price_ETH, first_price_USD) %>% inner_join(last_sold_by_ID, by = c("ID"="ID")) sales_temp ``` ``` ## # A tibble: 3,665 x 9 ## # Groups: ID [3,665] ## ID first_price_ETH first_price_USD Type From To last_price_ETH ## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> ## 1 0 0.98 320 Sold 0xf5099e 14715954 25 ## 2 1 0.42 107 Sold EliteCat… 0xcf6165 60 ## 3 14 0.9 185 Sold 0x582fa7 0x7c00c… 2.5 ## 4 33 0.51 105 Sold 0xc352b5 6689278 0.51 ## 5 89 1.5 336 Sold jmg TokenAn… 20.0 ## 6 116 1.2 229 Sold 0xd8e806 5779367 2.4 ## 7 143 6.5 1041 Sold 0xc352b5 MrTpunks 6.5 ## 8 155 7 2708 Sold Michel_M… DANNY 12 ## 9 169 1 127 Sold Pranksy Slats_e… 6.99 ## 10 240 2 255 Sold Pranksy Slats_e… 7 ## # … with 3,655 more rows, and 2 more variables: last_price_USD <dbl>, ## # Txn <date> ``` --- ## Your Turn 1. Create a table named `num_sold` that has the number of the sales for each punk that was sold at least once. 2. Create a table named `sales_temp_2` that combines `sales_temp` and `num_sold` above. --- ## Answers ### 1. ```r num_sold <- punks_transactions %>% filter(Type == "Sold") %>% group_by(ID) %>% summarise(num_sold = n()) ``` ### 2. ```r sales_temp_2 <- sales_temp %>% inner_join(num_sold, by = c("ID"="ID")) ``` --- ## Price Change Let's also compute the percentage change in the sale prices and create an extra column that categorize the frequency of a punk is being sold: 'Low' means a punk is sold less than or equal to 3 times, 'Medium' is sold more than 3 times and less than or equal to 5 times, and 'High' is sold more than 5 times. ```r sales <-sales_temp_2 %>% mutate(percentage_change_ETH = (last_price_ETH-first_price_ETH)/first_price_ETH*100, percentage_change_USD = (last_price_USD-first_price_USD)/first_price_USD*100, frequency_sold = ifelse(num_sold<=3, "Low", ifelse(num_sold<=5, "Medium", "High"))) head(sales,2) ``` ``` ## # A tibble: 2 x 13 ## # Groups: ID [2] ## ID first_price_ETH first_price_USD Type From To last_price_ETH ## <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> ## 1 0 0.98 320 Sold 0xf5099e 14715954 25 ## 2 1 0.42 107 Sold EliteCat… 0xcf6165 60 ## # … with 6 more variables: last_price_USD <dbl>, Txn <date>, num_sold <int>, ## # percentage_change_ETH <dbl>, percentage_change_USD <dbl>, ## # frequency_sold <chr> ``` --- ## Ditribution of Percentage Changed in Sold Price (ETH) We can visualize the distribution using `ggplot2` as follows: ```r scaleFUN <- function(x) sprintf("%.0f", x) x_breaks = c(1,2,3,5,10,20,30,50,100,200,300,500,1000,2000,3000,5000,10000,20000,30000,50000,100000, 200000,300000,500000) sales %>% filter(num_sold>1) %>% mutate(frequency_sold = factor(frequency_sold, levels=c("Low", "Medium", "High"))) %>% ggplot(aes(x=percentage_change_ETH))+ geom_histogram(aes(fill = frequency_sold), colour="black")+ scale_x_log10(breaks = x_breaks, labels = scaleFUN)+ scale_y_continuous(breaks = scales::pretty_breaks(n=20))+ theme(axis.text.x = element_text(angle = 45, vjust=0.9))+ labs(title = "Ditribution of Percentage Changed in Sold Price (ETH)", subtitle = "for all Crypto Punks that have been resold", x = "Percentage Change (%)")+ scale_fill_discrete(name = "Frequency Sold", labels = c("Low (2,3)", "Medium (4,5)", "High (>5)")) ``` --- <img src="cryptopunks_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" /> --- ## Time Series We can also create a time series of the number of sales as follows. ```r sales$Month_Yr <- format(as.Date(sales$Txn), "%Y-%m") sales %>% group_by(Month_Yr) %>% summarise(num_sales = n()) %>% ggplot(aes(x=Month_Yr,y=num_sales, group=1))+ scale_y_continuous(breaks = scales::pretty_breaks(n=10))+ geom_line(color="blue")+ theme(axis.text.x = element_text(angle = 90, vjust=0.9))+ labs(title = "Number of Sales over time", y = "Number of Sales", x= "Time") ``` --- ## Time Series (continued...) <img src="cryptopunks_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- ## Your Turn Create a time series showing the total sales in each month! -- ## Answers ```r sales %>% group_by(Month_Yr) %>% summarise(total_sales = sum(last_price_ETH)) %>% ggplot(aes(x=Month_Yr,y=total_sales, group=1))+ scale_y_continuous(breaks = scales::pretty_breaks(n=10))+ geom_line(color="blue")+ theme(axis.text.x = element_text(angle = 90, vjust=0.9))+ labs(title = "Total Sales over time", y = "Total Sales (ETH)", x= "Time") ``` --- ## Answers (continued...) <img src="cryptopunks_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ## Punk's Rank vs its' Price We will use the existing 'Rank' column to create 'RankNum' using the `Rank` function and join it with the 'last_sold_by_ID' table. ```r rank_and_transaction <- punks %>% mutate(RankNum = rank(desc(Rank))) %>% inner_join(last_sold_by_ID, by = c("Punk"="ID")) head(rank_and_transaction,3) ``` ``` ## # A tibble: 3 x 32 ## Punk Sex Type.x Skin TypeSkin Slots Att1 Att2 Att3 Att4 Att5 Att6 ## <dbl> <chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 6487 Girl Female Albino FemaleAl… 0 (blan… (bla… (bla… (bla… (bla… (bla… ## 2 641 Girl Female Light FemaleLi… 0 (blan… (bla… (bla… (bla… (bla… (bla… ## 3 1050 Girl Female Mid FemaleMid 0 (blan… (bla… (bla… (bla… (bla… (bla… ## # … with 20 more variables: Att7 <chr>, TypeRare <dbl>, AttCount <dbl>, ## # Att1_1 <dbl>, Att2_1 <dbl>, Att3_1 <dbl>, Att4_1 <dbl>, Att5_1 <dbl>, ## # Att6_1 <dbl>, Att7_1 <dbl>, Min <dbl>, Avg <dbl>, Rank <dbl>, ## # RankNum <dbl>, Type.y <chr>, From <chr>, To <chr>, last_price_ETH <dbl>, ## # last_price_USD <dbl>, Txn <date> ``` --- ## Crypto Punks Last Sold Price vs Uniqueness Score ```r scaleFUN <- function(x) sprintf("%.2f", x) x_breaks = c(0.01,0.02,0.03,0.05,0.07, 0.1,0.2,0.3,0.5,0.7,1,2,3,5,7,10,20,30,50, 70, 100,200, 300, 500, 700, 1000, 2000, 3000, 5000, 7000, 10000, 20000, 30000, 50000, 70000, 100000) rank_and_transaction %>% ggplot(aes(x=RankNum, y=last_price_ETH))+ geom_point()+ geom_smooth(aes(colour = ..y..), se=FALSE, size=2)+ scale_colour_gradient(low = "red", high = "green")+ labs(title = "Crypto Punks Last Sold Price vs Uniqueness Score", y = "Last Sold Price (ETH)", x = "Uniqueness Score")+ scale_x_continuous(breaks = scales::pretty_breaks(n=20))+ scale_y_log10(breaks = x_breaks, labels = scaleFUN)+ theme_bw()+ theme(legend.position = "none") ``` --- <img src="cryptopunks_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> --- ## Distribution of Last Price Sold ```r scaleFUN <- function(x) sprintf("%.2f", x) x_breaks = c(0.01,0.03, 0.05,0.1,0.3, 0.5, 1, 3,5, 10, 30,50, 100) rank_and_transaction %>% ggplot(aes(x=last_price_ETH))+ geom_histogram(aes(fill=Type.x),color ="black", boundary=0)+ scale_y_continuous(breaks = scales::pretty_breaks(n=10))+ scale_x_log10(breaks = x_breaks, labels = scaleFUN)+ facet_wrap(~Type.x, scales = "free_y")+ theme(axis.text.x = element_text(angle = 45, vjust=0.9))+ labs(title = "Distribution of Last Price Sold by Type", x="Last Sale Price (ETH)")+ theme(legend.position = "none") ``` --- <img src="cryptopunks_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> --- ## Conclusion This data science case study taught us how to use analytics to understand the market for Crypto Punk NFTs. We used graphs to show not only that sales have been increasing over time, but to objectively validate that the rarity of a punk has an impact on its selling price. Can you think of other interesting analysis? If so, please feel free to share! <br> <br> <br> <p align="right"> <img src="Cut_outs/Cut_out_07.png" width="200px" height="200px"> </p>