Beginner


How to Run an Archive Node on BNB Smart Chain

BNB Smart Chain (BSC), formerly Binance Smart Chain, is a fast, low-cost, and efficient blockchain network that is compatible with the Ethereum Virtual Machine (EVM); its Geth client is a fork of go-ethereum. Archive nodes play an important role in keeping the Binance Smart Chain network healthy and useful: an archive node keeps a complete, permanent record of the chain's history, including the state at every block, not only its transactions. In this tutorial, you will learn how to run an archive node on Binance Smart Chain, which gives you a deeper understanding of the network and a way to contribute to its growth and stability. Whether you are a seasoned developer or just starting out with blockchain technology, this tutorial provides the information you need to run an archive node on Binance Smart Chain.

What is an archive node?

Simply put, an archive node is a full node run with one additional option, --gcmode archive. It stores all the historical data of the blockchain, starting from the genesis block. A typical full node keeps state data only for a window of recent blocks (by default, Geth retains the state of roughly the latest 128 blocks) and prunes older state, whereas an archive node keeps the state for every block.

Why is an archive node important?

With a full node, developers can only query recent blocks to check an address's balance or a smart contract's state, and because the chain keeps advancing, older states quickly become unavailable. With an archive node, they can query the state at any block in history. Applications rely on archive nodes for demanding use cases, including but not limited to the following:

  • Automated trading systems need historical data to optimize their trading models

  • Verification modules need state data to verify transactions in a timely manner

  • Analytics tools need the full history to perform data analysis

  • Exchange features in some wallets depend on archive nodes for fast and efficient transfers
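To make the difference concrete, the bash sketch below builds an eth_getBalance JSON-RPC request pinned to a specific historical block. The helper names, the zero address, and the localhost:8545 endpoint are illustrative assumptions; the endpoint presumes HTTP JSON-RPC is enabled on the node, for example in its config.toml. Sent to an archive node, such a request returns the balance at that block; sent to a full node for a block outside its recent window, it typically fails with an error such as "missing trie node".

```shell
#!/usr/bin/env bash
# Minimal sketch: build a JSON-RPC request for an account's balance at a
# specific historical block. Address and block height are placeholders.

# to_hex_block BLOCK: convert a decimal block number to the 0x-prefixed
# hex quantity that JSON-RPC expects.
to_hex_block() {
  printf '0x%x' "$1"
}

# balance_request ADDRESS BLOCK: print an eth_getBalance request body
# that asks for the balance as of BLOCK.
balance_request() {
  local address=$1 block=$2
  printf '{"jsonrpc":"2.0","id":1,"method":"eth_getBalance","params":["%s","%s"]}' \
    "$address" "$(to_hex_block "$block")"
}

# Example (assumes HTTP JSON-RPC is enabled on localhost:8545):
# curl -s -X POST -H 'Content-Type: application/json' \
#   --data "$(balance_request 0x0000000000000000000000000000000000000000 1000000)" \
#   http://localhost:8545
```

Comparing the responses for an old block and a recent block is a quick way to confirm that a node is really serving archive data.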

Suggested Requirements

Running an archive node is expensive because it stores all block data and every state change. Above all, it needs a disk with sufficient capacity; in addition, CPU and disk performance must be high enough for the node to keep up with the latest block height.

How to run an archive node for the BSC mainnet?

Run with a Geth client

1.1 Run a segment archive node from a snapshot

A segment archive node holds all the data from a starting block height to an ending block height, for example [19000000, latest]. To create such a node, you need a snapshot taken at a block height at or below 19000000.
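Choosing the snapshot is a small selection problem: among the available snapshot heights, take the newest one at or below the segment's starting height, since a newer snapshot leaves fewer blocks to replay before the segment's range begins. The bash sketch below assumes you already have the list of snapshot heights; the function name and the heights in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: pick the newest snapshot height that is at or below the
# desired segment start. Heights are hypothetical inputs.

# pick_snapshot START HEIGHT...: print the chosen height, or print an
# error to stderr and return 1 if no snapshot is low enough.
pick_snapshot() {
  local start=$1 best=-1 h
  shift
  for h in "$@"; do
    # Keep the highest snapshot that does not start after START.
    if (( h <= start && h > best )); then
      best=$h
    fi
  done
  if (( best < 0 )); then
    echo "no snapshot at or below block $start" >&2
    return 1
  fi
  echo "$best"
}
```

For example, `pick_snapshot 19000000 18000000 19500000 17000000` prints 18000000: the 19500000 snapshot starts too late, and 17000000 would mean replaying more blocks.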

  • Command to run:

./geth --config local_config_dir/config.toml --datadir local_data_dir --pprof.addr 0.0.0.0 --rpc.allow-unprotected-txs --rpccorsdomain '*' --light.serve 50 --cache 5000 --metrics --snapshot=true --rangelimit --gcmode archive --txlookuplimit 0 --syncmode full --pprof
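After starting the node, you can follow its progress with the standard eth_blockNumber RPC method. The bash sketch below parses the hex result from a response and computes how many blocks remain until a target height. The helper names are hypothetical, the target height would come from a block explorer, and the commented curl call assumes HTTP JSON-RPC is enabled on localhost:8545.

```shell
#!/usr/bin/env bash
# Minimal sketch: turn an eth_blockNumber response into a decimal height
# and report how far the node is behind a target height.

# block_from_response JSON: print the "result" field as a decimal number.
block_from_response() {
  local re='"result":"(0x[0-9a-fA-F]+)"'
  if [[ $1 =~ $re ]]; then
    # Bash arithmetic understands 0x-prefixed hex.
    echo $(( ${BASH_REMATCH[1]} ))
  else
    echo "unexpected response: $1" >&2
    return 1
  fi
}

# blocks_behind CURRENT TARGET: print the remaining blocks, never negative.
blocks_behind() {
  local diff=$(( $2 - $1 ))
  echo $(( diff > 0 ? diff : 0 ))
}

# Example (assumes HTTP JSON-RPC on localhost:8545):
# resp=$(curl -s -X POST -H 'Content-Type: application/json' \
#   --data '{"jsonrpc":"2.0","id":1,"method":"eth_blockNumber","params":[]}' \
#   http://localhost:8545)
# block_from_response "$resp"
```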


1.2 Build a full archive node from segment archive nodes

Instead of putting all archive data on a single Geth instance, we suggest creating multiple segment instances, each serving only part of the chain. For better performance, keep each segment under 4TB. As of June 2022, there is about 35TB of archive data in total. For all BSC snapshots, refer to the Free public Binance Smart Chain Archive Snapshot project. Its owner has put all BSC archive snapshots on S3. As described in that project, the bucket is public but configured as requester pays, which means you need an AWS account to download the snapshots and the download costs are billed to you. Once you have all the segments, you need a proxy that dispatches each request to the segment holding the requested block. The project's owner has also implemented such a proxy; follow the owner's guide to build it.
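The sizing rule and the proxy's dispatch logic can both be sketched in a few lines of bash. With about 35TB of data and a 4TB cap per segment, you need at least ceil(35 / 4) = 9 segments. The routing function maps a block number to the segment whose range contains it. The segment start heights and endpoint URLs are hypothetical placeholders (in practice you would choose boundaries by data size), and this is only an illustration of the dispatch idea, not the owner's proxy.

```shell
#!/usr/bin/env bash
# Minimal sketch of segment sizing and block-to-segment routing.
# All heights and URLs below are hypothetical.

# min_segments TOTAL_TB MAX_TB: smallest segment count so that an even
# split keeps each segment at or below MAX_TB (integer ceiling division).
min_segments() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical layout: segment i serves [SEGMENT_START[i], SEGMENT_START[i+1]).
SEGMENT_START=(0 5000000 10000000 15000000 19000000)
SEGMENT_URL=(http://seg0:8545 http://seg1:8545 http://seg2:8545 http://seg3:8545 http://seg4:8545)

# segment_for_block BLOCK: print the endpoint of the segment holding BLOCK.
segment_for_block() {
  local block=$1 i
  # Scan from the newest segment down; the first start <= BLOCK wins.
  for (( i = ${#SEGMENT_START[@]} - 1; i >= 0; i-- )); do
    if (( block >= SEGMENT_START[i] )); then
      echo "${SEGMENT_URL[i]}"
      return 0
    fi
  done
  return 1
}
```

For example, `segment_for_block 12345678` prints http://seg2:8545, because that block falls in the hypothetical [10000000, 15000000) range.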

Run with an Erigon client

Erigon supports BSC mainnet. The Free public Binance Smart Chain Archive Snapshot project also has a guide to running a BSC archive node with an Erigon client; its owner recently switched to Erigon for their BSC archive node. The archive snapshot is a tarball on AWS S3 at “s3://public-blockchain-snapshots/bsc/erigon-latest.tar.zstd”. This path is public but configured as requester pays, so you also need an AWS account to download it.

  • Commands to download the snapshot into a local directory and extract it there:

aws s3 cp --request-payer=requester "s3://public-blockchain-snapshots/bsc/erigon-latest.tar.zstd" local_data_dir/

tar --use-compress-program=unzstd -xvf local_data_dir/erigon-latest.tar.zstd -C local_data_dir

  • Command to run:

./erigon --chain=bsc --datadir local_data_dir

A known issue with the Erigon client, as noted on GitHub, is that it may not keep up with the latest blocks. If you need the latest blocks, we suggest either running the BSC archive node on a high-performance disk such as NVMe, or running a BSC full node with a Geth client alongside it. The second option requires a proxy that asks Erigon whether it has the requested block height and, if it does not, forwards the request to the Geth client.
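The proxy's decision can be sketched as a small bash function, assuming the proxy already knows Erigon's current head (for example from eth_blockNumber). Numeric requests at or below that head go to Erigon, and anything newer goes to the Geth full node. Sending the "latest" and "pending" tags straight to Geth, and "earliest" to Erigon, is a design choice of this sketch, made because Erigon may lag behind the chain tip; the backend names are placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch of the Erigon-first, Geth-fallback routing decision.
# Backend names "erigon" and "geth" are placeholders for real endpoints.

# pick_backend REQUESTED ERIGON_HEAD
#   REQUESTED:   a decimal block number or a block tag
#   ERIGON_HEAD: Erigon's current block height
pick_backend() {
  local requested=$1 erigon_head=$2
  case $requested in
    latest|pending)
      # Erigon may trail the chain tip, so tip queries go to Geth.
      echo geth ;;
    earliest)
      # Old history lives on the archive node.
      echo erigon ;;
    *)
      if (( requested <= erigon_head )); then
        echo erigon
      else
        echo geth
      fi ;;
  esac
}
```

A real proxy would also have to extract the block parameter from each JSON-RPC request, and its position differs from method to method.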

Comparison between Geth and Erigon

  • Data size

    As of June 2022, the full data size is about 35TB with a Geth client versus about 6TB with an Erigon client, which is much smaller.

  • Maturity

    Erigon is newer and not yet battle-tested, while Geth has been running for a long time and is more stable. Archive nodes running Geth support all the RPC APIs, while Erigon does not support some of them well, such as eth_getProof.

  • Complexity

    Running a BSC archive node is easier with an Erigon client than with a Geth client. However, if you also need to keep up with the latest blocks by running an Erigon archive node alongside a Geth full node, the complexity is nearly the same.

All in all, you can choose either of the methods above to run a BSC archive node based on your own requirements.

Congratulations, you have completed this workshop!