class: center, middle, inverse, title-slide # Interacting with and Analyzing
Numerai Network Growth ## with GraphQL and
ggplot2
### Omni Analytics Group --- ## Background Numerai is a modern-day hedge fund where a large pool of anonymous data scientists submit stock market predictions in a weekly tournament. These predictions are then crowd-sourced into an AI-based ensemble model that performs trades on behalf of Numerai. Over the last 3 years, the Numerai network has seen explosive growth and this case study will teach you how to analyze that growth using R's graphQL capabilities. ![](images/numerai.png) --- ## GraphQL Numerai's API uses GraphQL as a backend, which is a structured query language much like SQL itself. - Developed initially as an internal project by Facebook. - Provides a method for development of APIs much like REST. - Flexible and rich compared to REST and therefore may be less suitable for more simple web APIs. - Where REST APIs are organized as a collection of endpoints, GraphQL is organized as a collection of types and fields with their associated datatype specification. Let's take a look at how to interact with GraphQL in R... <div style="text-align: center;"><img src="images/graphql.png" width=500></div> --- ## GraphQL in R There are two primary methods of accessing the Numerai data: - Directly, using the `ghql` R package. - Indirectly, by downloading the parsed JSON data from the Numerai API and reading it into R. We will use `ghql` to keep the steps reproducible. <p align="right"> <img src="Cut_outs/Cut_out_05.png" width="200px" height="200px"> </p> --- ## `ghql` Let's begin by installing `ghql`: ```r install.packages("ghql") ``` Next, we connect to the Numerai API: ```r library(ghql) con <- GraphqlClient$new( url = "https://api-tournament.numer.ai/" ) ``` This connection object maintains the GraphQL client connection to the Numerai API server. Note that it is an <a href="https://adv-r.hadley.nz/r6.html">R6-style object</a> and hence is initialized with the GraphqlClient’s new() function --- ## Making a Leaderboard Query We can perform a query with the following: ```r qry <- Query$new() #create a new query qry$query('leaderboard', '{ v2Leaderboard { username corrRep mmcRep return_52Weeks return_13Weeks } }') result <- con$exec(qry$queries$leaderboard) #execute query ``` Note that we begin with the initialization of a new instance of the R6 `Query` class, and then call the `query()` method, passing in two arguments: - The name of the resulting object. - The raw GraphQL query that is to be executed. Here 'leaderboard' indicates the name of the query. --- ## Viewing the Results <img src="stickers/book1.png" width="70px" height="70px"> A quick peak at the raw values returned shows JSON data that we need to parse using the `fromJSON()` function, in order to create a data frame: ```r print(paste0(substring(result, 1, 50), "...")) ``` ``` ## [1] "{\"data\":{\"v2Leaderboard\":[{\"corrRep\":0.06688481132..." ``` And the parsed data: ```r nmr <- fromJSON(result)[[1]]$v2Leaderboard head(nmr, 3) %>% kable() ``` <table> <thead> <tr> <th style="text-align:right;"> corrRep </th> <th style="text-align:right;"> mmcRep </th> <th style="text-align:right;"> return_13Weeks </th> <th style="text-align:right;"> return_52Weeks </th> <th style="text-align:left;"> username </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.0668848 </td> <td style="text-align:right;"> 0.0269104 </td> <td style="text-align:right;"> 99.52905 </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> mdl3 </td> </tr> <tr> <td style="text-align:right;"> 0.0651065 </td> <td style="text-align:right;"> 0.0270910 </td> <td style="text-align:right;"> 101.81269 </td> <td style="text-align:right;"> 430.3838 </td> <td style="text-align:left;"> benben11 </td> </tr> <tr> <td style="text-align:right;"> 0.0631207 </td> <td style="text-align:right;"> 0.0194339 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:left;"> anna13 </td> </tr> </tbody> </table> --- ## 52 Week Returns Now we can use `ggplot()` to visualize aspects of the data! ```r ggplot(data = nmr, aes(x = return_52Weeks)) + geom_histogram(fill = "blue", colour = "black") + labs(title = "52 Week Returns on Staked NMR in the 'Classic' Tournament", x = "52 Week Returns") ``` <img src="index_files/figure-html/unnamed-chunk-7-1.png" width="600" style="display: block; margin: auto;" /> --- ## 13 Weekly Returns We can use `ggplot()` to visualize weekly returns. ```r ggplot(data = nmr, aes(x = return_13Weeks)) + geom_histogram(fill = "blue", colour = "black") + labs(title = "13 Week Returns on Staked NMR in the 'Classic' Tournament", x = "13 Week Returns") ``` <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="600" style="display: block; margin: auto;" /> --- ## 52 Week Returns vs 13 Week Returns ```r ggplot(data = nmr, aes(x = return_13Weeks, y = return_52Weeks)) + geom_point() + geom_smooth(method = "lm") + labs(title = "52 Week Returns vs 13 Week Returns") ``` <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="600" style="display: block; margin: auto;" /> --- ## Correlation vs MMC ```r ggplot(data = nmr, aes(x = corrRep, y = mmcRep)) + geom_point() + geom_smooth(method = "lm") + labs(title = "Correlation vs MMC") ``` <img src="index_files/figure-html/unnamed-chunk-10-1.png" width="600" style="display: block; margin: auto;" /> --- ## Conclusion You can now do the following using `ghql` and `jsonlite` R packages: * Connect to Numerai API * Create a Query * Execute a query * Parse the data Try it yourself! <p align="right"> <img src="Cut_outs/Cut_out_16f.png" width="200px" height="200px"> </p>