How to Run an Archive Node on BNB Smart Chain
BNB Smart Chain (BSC), formerly Binance Smart Chain, is a fast, low-cost, EVM-compatible blockchain whose main client is a fork of Go Ethereum (Geth). Archive nodes keep a complete and permanent record of the chain's transactions and historical state, which makes them an important part of the network's infrastructure. In this tutorial, you will learn how to run an archive node on BSC, giving you a deeper understanding of the network and a way to contribute to its growth and stability. Whether you are a seasoned developer or just starting out with blockchain technology, this tutorial covers what you need to run an archive node on BSC.
What is an archive node?
Simply put, an archive node is a full node run with one additional option, --gcmode archive. It stores all the historical data of the blockchain, starting from the genesis block. A typical full node keeps state change data only for the most recent blocks, whereas an archive node keeps it for every block.
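To make the difference concrete, the sketch below builds a JSON-RPC eth_getBalance request pinned to a specific block height. An archive node can answer such a request for any height, while a full node can only answer it for recent blocks. The helper name balance_request, the placeholder address, and the localhost:8545 endpoint are assumptions for illustration and are not part of the original guide.

```shell
# Build an eth_getBalance request for an address at a given decimal block height.
# balance_request is a hypothetical helper name used only in this sketch.
balance_request() {
  local address="$1" block_dec="$2"
  # JSON-RPC expects the block number as a 0x-prefixed hex quantity.
  printf '{"jsonrpc":"2.0","id":1,"method":"eth_getBalance","params":["%s","0x%x"]}\n' \
    "$address" "$block_dec"
}

# Usage (assumes a node with HTTP RPC enabled on localhost:8545, and a
# placeholder address you replace with a real one):
#   balance_request "$ADDRESS" 19000000 |
#     curl -s -H 'Content-Type: application/json' -d @- http://localhost:8545
```

Building the payload separately from sending it keeps the hex conversion easy to check before any request reaches a node.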
Why is an archive node important?
With a full node, developers can query the balance of an address or the state of a smart contract only for a limited number of recent blocks. Because the chain keeps moving forward, older state quickly falls out of that window. With an archive node, they can query the state at any block height. Archive nodes are used by various applications on the blockchain for demanding use cases, including but not limited to the following:
Automated trading systems need historical data to optimize their trading models
Verification modules need state data to verify transactions on time
Analytical tools need full historical data for data analysis
Exchange features in some wallets depend on archive nodes for fast and efficient transfers
Running an archive node is expensive because it stores all block and state change data. It needs a disk with sufficient capacity, and the CPU and disk must be fast enough for the node to catch up with the latest block height.
How to run an archive node for the BSC mainnet?
1. Run with a Geth client
1.1 Run one segment archive node with a snapshot
A segment archive node is a node that holds all the data between a starting block height and an ending block height, such as [19000000, latest]. To create one, you need a snapshot taken at a block height lower than the segment's starting block (lower than 19000000 in this example).
Command to run:
./geth --config local_config_dir/config.toml --datadir local_data_dir --pprof.addr 0.0.0.0 --rpc.allow-unprotected-txs --rpccorsdomain "*" --light.serve 50 --cache 5000 --metrics --snapshot=true --rangelimit --gcmode archive --txlookuplimit 0 --syncmode full --pprof
1.2 Build one full archive node with segmented archive nodes
Instead of putting all archive data on a single Geth instance, it is better to create multiple segment instances, each serving only part of the chain. For better performance, keep each segment under 4 TB. The full archive is about 35 TB in total (as of June 2022). For all BSC snapshots, you can refer to the Free public Binance Smart Chain Archive Snapshot. The owner has put all BSC archive snapshots on S3. As described there, the path is public but configured as requester-pays, which means you need an AWS account to download them. Once you have all the segments, you need a proxy to dispatch each request to the right segment. The owner has also implemented such a proxy; please follow the owner's guide to build it.
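The dispatch step can be pictured as a lookup from block height to segment. The following is a minimal sketch of that idea only, not the owner's proxy: the segment boundaries, the backend URLs, and the function name pick_segment are all hypothetical.

```shell
# Hypothetical segment table, one segment per line: "start_block end_block backend_url".
# The boundaries and URLs are made up for illustration.
SEGMENTS='0 9999999 http://segment-0:8545
10000000 18999999 http://segment-1:8545
19000000 latest http://segment-2:8545'

# Print the backend URL of the segment that covers the given decimal block height.
# Returns a non-zero status if no segment covers it.
pick_segment() {
  local block="$1" start end url
  while read -r start end url; do
    if [ "$block" -ge "$start" ] && { [ "$end" = "latest" ] || [ "$block" -le "$end" ]; }; then
      printf '%s\n' "$url"
      return 0
    fi
  done <<EOF
$SEGMENTS
EOF
  return 1
}
```

For example, with this table, pick_segment 19500000 selects the last segment because its end is open ("latest"). A real proxy would also need to extract the block parameter from each RPC request, which this sketch leaves out.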
2. Run with an Erigon client
Erigon supports the BSC mainnet. You can also refer to the Free public Binance Smart Chain Archive Snapshot for a guide to running a BSC archive node with an Erigon client; its owner recently switched to Erigon for BSC archive nodes. The archive snapshot is a tarball that you can download from AWS S3. The S3 path is “s3://public-blockchain-snapshots/bsc/erigon-latest.tar.zstd”. This path is public but configured as requester-pays, so you need an AWS account to download it.
Commands to download the snapshot to a local directory and extract it:
aws s3 cp --request-payer=requester "s3://public-blockchain-snapshots/bsc/erigon-latest.tar.zstd" local_data_dir/
tar --use-compress-program=unzstd -xvf local_data_dir/erigon-latest.tar.zstd -C local_data_dir
Command to run:
./erigon --chain=bsc --datadir local_data_dir
A known issue with the Erigon client is that it does not reliably keep up with the latest blocks, as mentioned on GitHub. If you want to keep up with the latest blocks, it is suggested to run the BSC archive node on a high-performance disk such as NVMe, or to run a BSC full node with a Geth client at the same time. In the latter case you need a proxy that asks Erigon whether it has the requested block height and, if not, forwards the request to the Geth client.
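The fallback rule described above can be sketched as a small decision function. This is an illustration under stated assumptions rather than a real proxy: choose_backend and hex_to_dec are hypothetical helper names, and the Erigon head height is assumed to come from a standard eth_blockNumber call, which returns a hex quantity.

```shell
# Convert a 0x-prefixed hex quantity (as returned by eth_blockNumber) to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Decide which client should serve a request for a given block height.
#   $1: requested block height (decimal)
#   $2: latest block height Erigon currently has (decimal)
# Prints "erigon" if Erigon already has the block, otherwise "geth".
choose_backend() {
  local requested="$1" erigon_head="$2"
  if [ "$requested" -le "$erigon_head" ]; then
    echo erigon
  else
    echo geth
  fi
}
```

For instance, if Erigon reports a head of 0x121eac0 (19000000) and a request asks for block 19000005, choose_backend 19000005 "$(hex_to_dec 0x121eac0)" picks the Geth full node.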
Comparison between Geth and Erigon
As of June 2022, the whole data set is about 35 TB with a Geth client but only about 6 TB with an Erigon client.
Erigon is newer and not yet battle-tested, while Geth has been running for a long time and is more stable. Archive nodes running a Geth client support all RPC APIs, while some of them, such as eth_getProof, are not well supported by the Erigon client.
Running a single BSC archive node is easier with an Erigon client than with a Geth client. However, the complexity is nearly the same if you run an Erigon archive node together with a Geth full node in order to keep up with the latest blocks.
All in all, people can choose one of the methods above to run a BSC archive node based on their own requirements.