Large Scale Empirical Ethereum Smart Contract Analysis

RFC Blockchain & Payment Channel Networks

Liam Wachter

What are smart contracts?

solidity smart contract

What are smart contracts?

solidity eth interaction

What are smart contracts?

evm bytecode compiled

What are smart contracts?

evm bytecode deployment

Why is this interesting?

  • ETH has a market capitalization in the three-digit billions
    • Can have real impact on the non-crypto economy [Boa22]
  • Methodology useful for
    • Layer 1 Development
    • Standardization
    • Security audits
  • Reproduction and extension of earlier work
  • A lot of marketing hype about them
(Images: smart contract hype articles)

What can be learned about the usage of Ethereum smart contracts by analysing public blockchain data?

Alternate Approaches

twitter graph
(Image from github.com/silver84/Ethereum-community-toolset)

Social Media Analysis

  • Attention != Usage
  • Attention artificially generated with giveaways

Alternate Approaches

Systematic Literature Review

  • There are at least two:
    • [MCG18]
    • [LSL19]
  • Problems
    • Many trends not discussed in academic literature
    • Academic work more focused on technology than application
    • Results depend on selection criteria

Data Collection

  • Taking all create transactions is not enough
    • Contract creations can be nested deep inside a transaction's internal calls
    • Execution with tracing necessary
  • Taking all create events might not be what we want
    • Contracts can self-destruct
  • Take all contracts that have not yet called self-destruct
    • Be aware: Contracts can be reinitialized and destroyed multiple times
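
The bookkeeping described above can be sketched as follows. This is a minimal illustration, not the tooling used for this research: the TraceEvent record and its fields (block, index, kind, address) are hypothetical, and it assumes that create and self-destruct events have already been extracted from execution traces. It also simplifies by applying a self-destruct immediately.

```python
from collections import namedtuple

# Hypothetical record for one event extracted from an execution trace.
# kind is either "create" or "selfdestruct".
TraceEvent = namedtuple("TraceEvent", "block index kind address")


def live_contracts(events):
    """Replay events in chain order and return addresses that still hold code."""
    live = set()
    for ev in sorted(events, key=lambda e: (e.block, e.index)):
        if ev.kind == "create":
            # A destroyed address can be re-created, so it is simply added again.
            live.add(ev.address)
        elif ev.kind == "selfdestruct":
            live.discard(ev.address)
    return live


events = [
    TraceEvent(1, 0, "create", "0xA"),
    TraceEvent(2, 0, "selfdestruct", "0xA"),
    TraceEvent(3, 0, "create", "0xA"),   # re-created at the same address
    TraceEvent(3, 1, "create", "0xB"),
    TraceEvent(4, 0, "selfdestruct", "0xB"),
]
print(sorted(live_contracts(events)))
```

Only the final state per address matters here, which is why counting create events alone would overestimate the number of deployed contracts.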

Data Collection

  • Obtaining traces requires instrumenting the EVM
    • For comparison: A full (archive) node requires about 12 TB of SSD space
  • Public datasets: Google BigQuery [DM19], Dune, Flipside Crypto
    • Insufficiently documented
    • Limited free access

Related Work

Publication   Data Collection Methodology                Study Period
[KLM18]       modified geth client                       until January 2018
[FB17]        modified parity client                     until May 2017
[BP17]        all SCs with published source (811)        until September 2016
[AS20]        modified variant of parity                 unknown
[Ren+21]      Google BigQuery -> tx > 1 -> has source    unknown

Ethereum blockchain analysis is getting harder
(Figure: number of transactions over time, not considering side chains)

Data Collection

Used for this research:

  • Google BigQuery dataset
    • Extracted all smart contracts
    • Reorganized them in a SQL database
  • dune.com
  • Cross-checked with flipsidecrypto.com

Most deployed smart contracts

Syntactic equality

Semantic equality is undecidable in general (Rice's theorem)

26 377 528 contracts currently deployed (as of August 30)
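
Syntactic equality can be approximated by hashing the deployed bytecode and counting identical hashes. A minimal sketch, assuming the bytecode is already available as raw bytes keyed by address; the function name most_deployed and the choice of SHA-256 are illustrative and not taken from the original analysis.

```python
import hashlib
from collections import Counter


def most_deployed(code_by_address, top=25):
    """Count byte-identical contracts and return the most frequent code hashes."""
    counts = Counter(
        hashlib.sha256(code).hexdigest() for code in code_by_address.values()
    )
    return counts.most_common(top)


# Toy input: two addresses share the same bytecode, one differs.
deployed = {
    "0x01": bytes.fromhex("6080604052"),
    "0x02": bytes.fromhex("6080604052"),
    "0x03": bytes.fromhex("00"),
}
for code_hash, n in most_deployed(deployed, top=2):
    print(n, code_hash[:16])
```

Two contracts that differ in a single byte, for example in an embedded constructor argument or metadata hash, end up in different groups under this definition.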

Top 25 Contracts

Reverse Engineering

  • Look for Source Code: Etherscan, Swarm
  • Disassemble and Decompile
  • Look at interactions on Etherscan
  • Search for similar code (e.g., on sourcegraph)

Top 25 Contracts

Addresses with the most gas usage

Last 30 days
  1. 278.80b: Seaport - Seaport
  2. 243.97b: Uniswap V3 - Swap Router 02
  3. 106.24b: Tether - Tether USD (USDT)
  4. 69.39b: Uniswap V2 - Router 02
  5. 64.92b: Gemswap - Gemswap
  6. 52.98b: ENS - Ethregistrar Controller 3
  7. 48.10b: Oneinch V4 - Aggregation Router v4
  8. 45.75b: Coinbase Contract
  9. 40.35b: Circle - USDC
  10. 32.26b: Zeroex - Exchange Proxy

Addresses with the most gas usage

Addresses with the most calls

Gas Tokens

  • Gas tokens exploit the gas refund system
  • London hard fork removed the SELFDESTRUCT refund and cut storage refunds (EIP-3529, the adopted alternative to [EIP3298])
  • This made gas tokens worthless
  • Can we see this trend in token deployments?
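
To see why the refund change hit gas tokens, the sketch below compares the refund a transaction can claim before and after London. The constants are the commonly cited protocol values and should be checked against the EIP texts: 24 000 gas per SELFDESTRUCT and 15 000 per cleared storage slot, capped at half of the gas used, before London; 4 800 per cleared slot, nothing for SELFDESTRUCT, and a cap of one fifth afterwards. The function name and the example numbers are hypothetical.

```python
def claimable_refund(gas_used, selfdestructs, cleared_slots, after_london):
    """Return the gas refunded to one transaction under the assumed rules."""
    if after_london:
        earned = 4_800 * cleared_slots      # SELFDESTRUCT earns nothing
        cap = gas_used // 5                 # at most 20% of the gas used
    else:
        earned = 24_000 * selfdestructs + 15_000 * cleared_slots
        cap = gas_used // 2                 # at most 50% of the gas used
    return min(earned, cap)


# A transaction using 1 000 000 gas that frees 20 self-destructing token contracts.
before = claimable_refund(1_000_000, 20, 0, after_london=False)
after = claimable_refund(1_000_000, 20, 0, after_london=True)
print(before, after)
```

Under these assumptions the refund for the example drops from 480 000 gas to zero, which matches the claim that the tokens lost their purpose.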

Gas Tokens

number of gastokens

Open question: How to incentivize state cleanup in persistent storage?

Contracts by Lifecycle

contract creations

Function Representation in Bytecode

  • Compilers produce a jump table
  • According to the ABI specification:
    • External functions are selected by the first 4 bytes of msg.data
    • Selector: sha3('balanceOf(address,uint256)')[:4], where sha3 is Keccak-256
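
The selector computation can be reproduced without external libraries. Ethereum's sha3 is the original Keccak-256, which uses different padding than the standardized SHA3-256 in Python's hashlib, so the sketch below carries its own compact Keccak-f[1600] permutation. The function names are illustrative.

```python
MASK = (1 << 64) - 1


def rol64(a, n):
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK


def keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 matrix of lanes."""
    r = 1
    for _ in range(24):
        # theta: mix each column with the parity of its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by an 8-bit LFSR
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original 0x01 padding (not SHA3-256's 0x06)."""
    rate = 136  # bytes absorbed per block (1088 bits)
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for off in range(0, len(padded), rate):
        block = padded[off:off + rate]
        for i in range(rate // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def selector(signature: str) -> str:
    """Return the 4-byte function selector as a hex string."""
    return keccak256(signature.encode("ascii"))[:4].hex()


print(selector("balanceOf(address,uint256)"))
```

As a sanity check, selector("transfer(address,uint256)") should yield the well-known ERC 20 transfer selector a9059cbb. The signature string must contain canonical type names and no spaces or parameter names.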

ERC implementations

  • ERC (Ethereum Request for Comments)
  • Searched through all final ERCs
  • Extracted function signatures wherever applicable

ERC implementations

  • 28 contract signatures
  • from 20 ERCs
  • Checked every smart contract against the signatures
signatures
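
One way such a check could be implemented is to walk the bytecode and collect the operands of PUSH4 instructions, since the dispatcher typically compares the incoming selector against pushed constants. This is a sketch under that assumption and not necessarily the matching procedure used here; the helper names are hypothetical, and the ERC 20 selectors are listed only as an example signature.

```python
# Selectors of the mandatory ERC 20 functions.
ERC20_SELECTORS = {
    "18160ddd",  # totalSupply()
    "70a08231",  # balanceOf(address)
    "a9059cbb",  # transfer(address,uint256)
    "23b872dd",  # transferFrom(address,address,uint256)
    "095ea7b3",  # approve(address,uint256)
    "dd62ed3e",  # allowance(address,address)
}


def push4_operands(code: bytes) -> set:
    """Collect all 4-byte PUSH operands, skipping the data of other PUSHes."""
    found = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:          # PUSH1 .. PUSH32
            width = op - 0x5F           # number of operand bytes
            if width == 4 and pc + 5 <= len(code):
                found.add(code[pc + 1:pc + 5].hex())
            pc += 1 + width             # jump over the operand
        else:
            pc += 1
    return found


def implements(code: bytes, selectors: set) -> bool:
    """True if every selector of the signature appears as a PUSH4 operand."""
    return selectors <= push4_operands(code)


# Synthetic dispatcher fragment: PUSH4 <selector>, EQ for each function.
sample = b"".join(b"\x63" + bytes.fromhex(s) + b"\x14" for s in sorted(ERC20_SELECTORS))
print(implements(sample, ERC20_SELECTORS))
```

The approach has known blind spots: a selector with a leading zero byte may be emitted with a shorter PUSH, proxies forward calls without containing the selectors, and data embedded after the code can be misread as instructions.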

Most common ERCs

  • 2 603 328: Light Contract Ownership (ERC 5313)
  • 574 421: Contract Ownership (ERC 173)
  • 205 651: Token (ERC 20)
  • 4 179: Non-Fungible Token (ERC 721)
  • 1 389: Token Receiver (ERC 1155 Token Receiver)

Rarely implemented ERCs

  • 523: Standard Signature Validation Method (ERC 1271)
  • 172: Payable Token (ERC 1363)
  • 130: Standard Interface Detection (ERC 165)
  • 121: ERC777 Token (ERC 777)
  • 76: ENS Resolver (ERC 137)
  • 66: Flash Borrower (ERC 3156)
  • 51: NFT Royalty Standard (ERC 2981)
  • 50: Reverse ENS Resolver (ERC 181)
  • 18: ERC777 Token Sender (ERC 777)
  • 10: Pseudo-introspection Registry Contract (ERC 820)
  • 3: ERC1363 Spender (ERC 1363)

Not implemented at all

(On main chain)

Flash Lender (ERC 3156), Token Receiver (ERC 777), Multi Token (ERC 1155), Receiver (ERC 1363), Abstract Storage Bonds (ERC 3475), Semi-Fungible Token (ERC 3525), Slot Approvable (ERC 3525), Slot Enumerable (ERC 3525), Secure Offchain Data Retrieval (ERC 3668), EIP721 Consumable (ERC 4400), Tokenized Vaults (ERC 4626), Rental NFT (ERC 4907)

Low adoption of RFCs

Can also be seen in transaction data, e.g., ERC 1820:

This standard defines a universal registry smart contract where any address (contract or regular account) can register which interface it supports and which smart contract is responsible for its implementation.

Called by only 10 addresses

Conclusion

  • Large-scale Ethereum analysis methodology
  • Most popular contracts are mostly trading/speculation related
  • Impact of the change in gas refunds
  • Low adoption of many RFCs

Bibliography

[AS20] Monika di Angelo and Gernot Salzer. “Characterizing types of smart contracts in the ethereum landscape”.
In: International Conference on Financial Cryptography and Data Security. Springer. 2020, pp. 389–404.

[Boa22] Board of Governors of the Federal Reserve System. Financial Stability Report. Tech. rep. May 2022.

[BP17] Massimo Bartoletti and Livio Pompianu. “An empirical analysis of smart contracts: platforms, applications, and design patterns”.
In: International Conference on Financial Cryptography and Data Security. Springer. 2017, pp. 494–509.

[DM19] Allen Day and Evgeny Medvedev. “Ethereum in BigQuery: a public dataset for smart contract analytics”. In: Google Cloud Blog (2019).

[FB17] Michael Fröwis and Rainer Böhme. “In code we trust?” In: Data privacy management, cryptocurrencies and blockchain technology. Springer, 2017, pp. 357–372.

[KLM18] Lucianna Kiffer, Dave Levin, and Alan Mislove. “Analyzing ethereum’s contract topology”. In: Proceedings of the Internet Measurement Conference 2018. 2018, pp. 494–499.

[LSL19] Elva Leka, Besnik Selimi, and Luis Lamani. “Systematic literature review of blockchain applications: Smart contracts”.
In: 2019 International Conference on Information Technologies (InfoTech). IEEE. 2019, pp. 1–3.

[MCG18] Daniel Macrinici, Cristian Cartofeanu, and Shang Gao. “Smart contract applications within blockchain technology: A systematic mapping study”.
In: Telematics and Informatics 35.8 (2018), pp. 2337–2354.

[Ren+21] Meng Ren et al. “Empirical evaluation of smart contract testing: what is the best choice?”
In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. 2021, pp. 566–579.