Intermediate
How to run a core node – Part 2: Publishing History Archives and Running
Welcome to Part 2 of our guide on how to run a core node for the Stellar network. In this section, we will cover the steps to publish history archives and run your core node. We will start by discussing the importance of publishing history archives and the different methods available such as caching and local history archives using nginx or Amazon S3. We will also cover how to backfill a history archive to ensure a complete history of the network. Next, we will go over the steps to start Stellar Core, interact with your instance, and join the network. We will also discuss how to establish connections to other peers, observe consensus, and catch up to ensure your node is synced with the network. Additionally, we will cover logging and validator maintenance, including recommended steps to perform as part of maintenance and special considerations during quorum set updates. By the end of this guide, you will have a fully functional core node up and running on the Stellar network.
Publishing History Archives
If you want to run a Full Validator or an Archiver, you need to set up your node to publish a history archive. You can host an archive using a blob store such as Amazon’s S3 or Digital Ocean’s spaces, or you can simply serve a local archive directly via an HTTP server such as Nginx or Apache. If you’re setting up a Basic Validator, you can skip this section. No matter what kind of node you’re planning to run, make sure to set it up to get history, which is covered in Configuration.
Caching and History Archives
You can significantly reduce the data transfer costs associated with running a public History archive by using common caching techniques or a CDN.
Three simple rules apply to caching the History archives:
- Do not cache the archive state file .well-known/stellar-history.json ("Cache-Control: no-cache")
- Do not cache HTTP 4xx responses ("Cache-Control: no-cache")
- Cache everything else for as long as possible (> 1 day)
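Once your archive endpoint (or CDN) is up, a quick way to spot-check these rules is to inspect the response headers. This is only a sketch: the hostname and paths below are the example values used later in this section.

```bash
# The state file should come back with "Cache-Control: no-cache",
# while regular history files should be cacheable for at least a day.
curl -sI http://history.example.com/.well-known/stellar-history.json | grep -i cache-control
curl -sI http://history.example.com/history/00/00/00/history-00000000.json | grep -i cache-control
```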
Local History Archive Using nginx
To publish a local history archive using nginx:
- Add a history configuration stanza to your /etc/stellar/stellar-core.cfg:

```toml
[HISTORY.local]
get="cp /mnt/xvdf/stellar-core-archive/node_001/{0} {1}"
put="cp {0} /mnt/xvdf/stellar-core-archive/node_001/{1}"
mkdir="mkdir -p /mnt/xvdf/stellar-core-archive/node_001/{0}"
```

- Run new-hist to create the local archive:

```bash
# sudo -u stellar stellar-core --conf /etc/stellar/stellar-core.cfg new-hist local
```

This command creates the history archive structure:

```
# tree -a /mnt/xvdf/stellar-core-archive/
/mnt/xvdf/stellar-core-archive
└── node_001
    ├── history
    │   └── 00
    │       └── 00
    │           └── 00
    │               └── history-00000000.json
    └── .well-known
        └── stellar-history.json

6 directories, 2 files
```

- Configure a virtual host to serve the local archive (Nginx):

```nginx
server {
  listen 80;
  root /mnt/xvdf/stellar-core-archive/node_001/;

  server_name history.example.com;

  # default is to deny all
  location / { deny all; }

  # do not cache 404 errors
  error_page 404 /404.html;
  location = /404.html {
    add_header Cache-Control "no-cache" always;
  }

  # do not cache history state file
  location ~ ^/.well-known/stellar-history.json$ {
    add_header Cache-Control "no-cache" always;
    try_files $uri;
  }

  # cache entire history archive for 1 day
  location / {
    add_header Cache-Control "max-age=86400";
    try_files $uri;
  }
}
```
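Before pointing DNS at the host, you can check that nginx serves the archive state file written by new-hist. This is a sketch that assumes nginx runs on the same machine; it overrides the Host header to match the server_name above.

```bash
# Fetch the archive state file through the local nginx instance.
curl -s -H "Host: history.example.com" http://127.0.0.1/.well-known/stellar-history.json
```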
Amazon S3 History Archive
To publish a history archive using Amazon S3:
- Add a history configuration stanza to your /etc/stellar/stellar-core.cfg:

```toml
[HISTORY.s3]
get='curl -sf http://history.example.com/{0} -o {1}' # Cached HTTP endpoint
put='aws s3 cp --region us-east-1 {0} s3://bucket.name/{1}' # Direct S3 access
```

- Run new-hist to create the s3 archive:

```bash
# sudo -u stellar stellar-core --conf /etc/stellar/stellar-core.cfg new-hist s3
```

- Serve the archive using an Amazon S3 static site (a minimal aws CLI sketch follows this list)
- Optionally place a reverse proxy and CDN in front of the S3 static site:

```nginx
server {
  listen 80;
  root /srv/nginx/history.example.com;
  index index.html index.htm;

  server_name history.example.com;

  # use google nameservers for lookups
  resolver 8.8.8.8 8.8.4.4;

  # bucket.name s3 static site endpoint
  set $s3_bucket "bucket.name.s3-website-us-east-1.amazonaws.com";

  # default is to deny all
  location / { deny all; }

  # do not cache 404 errors
  error_page 404 /404.html;
  location = /404.html {
    add_header Cache-Control "no-cache" always;
  }

  # do not cache history state file
  location ~ ^/.well-known/stellar-history.json$ {
    add_header Cache-Control "no-cache" always;
    proxy_intercept_errors on;
    proxy_pass http://$s3_bucket;
    proxy_read_timeout 120s;
    proxy_redirect off;
    proxy_buffering off;
    proxy_set_header Host $s3_bucket;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }

  # cache history archive for 1 day
  location / {
    add_header Cache-Control "max-age=86400";
    proxy_intercept_errors on;
    proxy_pass http://$s3_bucket;
    proxy_read_timeout 120s;
    proxy_redirect off;
    proxy_buffering off;
    proxy_set_header Host $s3_bucket;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }
}
```
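If the bucket doesn't exist yet, a minimal aws CLI sketch for the static-site step might look like the following. The bucket name and region are the placeholder values from the configuration stanza above; you will also need a bucket policy that allows public reads of the archive objects.

```bash
# Create the bucket and enable S3 static website hosting (placeholder names).
aws s3 mb s3://bucket.name --region us-east-1
aws s3 website s3://bucket.name --index-document index.html
```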
Backfilling a history archive
Given the choice, it’s best to configure your history archive prior to your node’s initial sync to the network. That way your validator’s history is published as you join and sync to the network.
However, if you have not published an archive during the node’s initial sync, it’s still possible to use the stellar-archivist command line tool to mirror, scan, and repair existing archives.
If you’re using the SDF package repositories, you can install stellar-archivist by running apt-get install stellar-archivist.
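For example, a sketch of mirroring an existing public archive into a local one; the invocation below assumes stellar-archivist's mirror subcommand takes source and destination URLs in the same form as the scan and repair commands shown later in this section (check stellar-archivist --help for the exact syntax).

```bash
# Mirror the SDF testnet archive into a local archive directory
# (paths reused from the examples in this section).
stellar-archivist mirror \
  http://history.stellar.org/prd/core-testnet/core_testnet_001/ \
  file:///mnt/xvdf/stellar-core-archive/node_001/
```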
The steps required to create a History archive for an existing validator — in other words, to upgrade a Basic Validator to a Full Validator — are straightforward:
- Stop your stellar-core instance (systemctl stop stellar-core)
- Configure a history archive for the new node:

```toml
[HISTORY.local]
get="cp /mnt/xvdf/stellar-core-archive/node_001/{0} {1}"
put="cp {0} /mnt/xvdf/stellar-core-archive/node_001/{1}"
mkdir="mkdir -p /mnt/xvdf/stellar-core-archive/node_001/{0}"
```

- Run new-hist to create the local archive:

```bash
# sudo -u stellar stellar-core --conf /etc/stellar/stellar-core.cfg new-hist local
```

```
# tree -a /mnt/xvdf/stellar-core-archive/
/mnt/xvdf/stellar-core-archive
└── node_001
    ├── history
    │   └── 00
    │       └── 00
    │           └── 00
    │               └── history-00000000.json
    └── .well-known
        └── stellar-history.json

6 directories, 2 files
```

- Start your Stellar Core instance (systemctl start stellar-core)
- Allow your node to join the network and watch it start publishing a few checkpoints to the newly created archive:

```
2019-04-25T12:30:43.275 GDUQJ [History INFO] Publishing 1 queued checkpoints [16895-16895]: Awaiting 0/0 prerequisites of: publish-000041ff
```
At this stage your validator is successfully publishing its history, which enables other users to join the network using your archive (although it won’t allow them to catch up with CATCHUP_COMPLETE=true, since the archive only contains partial network history).
Complete History Archive
If you decide to publish a complete archive — which enables other users to join the network from the genesis ledger — it’s also possible to use stellar-archivist
to add all missing history data to your partial archive, and to verify the state and integrity of your archive. For example:
```
# stellar-archivist scan file:///mnt/xvdf/stellar-core-archive/node_001
2019/04/25 11:42:51 Scanning checkpoint files in range: [0x0000003f, 0x0000417f]
2019/04/25 11:42:51 Checkpoint files scanned with 324 errors
2019/04/25 11:42:51 Archive: 3 history, 2 ledger, 2 transactions, 2 results, 2 scp
2019/04/25 11:42:51 Scanning all buckets, and those referenced by range
2019/04/25 11:42:51 Archive: 30 buckets total, 30 referenced
2019/04/25 11:42:51 Examining checkpoint files for gaps
2019/04/25 11:42:51 Examining buckets referenced by checkpoints
2019/04/25 11:42:51 Missing history (260): [0x0000003f-0x000040ff]
2019/04/25 11:42:51 Missing ledger (260): [0x0000003f-0x000040ff]
2019/04/25 11:42:51 Missing transactions (260): [0x0000003f-0x000040ff]
2019/04/25 11:42:51 Missing results (260): [0x0000003f-0x000040ff]
2019/04/25 11:42:51 No missing buckets referenced in range [0x0000003f, 0x0000417f]
2019/04/25 11:42:51 324 errors scanning checkpoints
```
As you can tell from the output of the scan command, some history, ledger, transactions, and results are missing from the local history archive.
You can repair the missing data using stellar-archivist’s repair command combined with a known full archive — such as the SDF public history archive:
```
# stellar-archivist repair http://history.stellar.org/prd/core-testnet/core_testnet_001/ file:///mnt/xvdf/stellar-core-archive/node_001/
2019/04/25 11:50:15 repairing http://history.stellar.org/prd/core-testnet/core_testnet_001/ -> file:///mnt/xvdf/stellar-core-archive/node_001/
2019/04/25 11:50:15 Starting scan for repair
2019/04/25 11:50:15 Scanning checkpoint files in range: [0x0000003f, 0x000041bf]
2019/04/25 11:50:15 Checkpoint files scanned with 244 errors
2019/04/25 11:50:15 Archive: 4 history, 3 ledger, 263 transactions, 61 results, 3 scp
2019/04/25 11:50:15 Error: 244 errors scanning checkpoints
2019/04/25 11:50:15 Examining checkpoint files for gaps
2019/04/25 11:50:15 Repairing history/00/00/00/history-0000003f.json
2019/04/25 11:50:15 Repairing history/00/00/00/history-0000007f.json
2019/04/25 11:50:15 Repairing history/00/00/00/history-000000bf.json
...
2019/04/25 11:50:22 Repairing ledger/00/00/00/ledger-0000003f.xdr.gz
2019/04/25 11:50:23 Repairing ledger/00/00/00/ledger-0000007f.xdr.gz
2019/04/25 11:50:23 Repairing ledger/00/00/00/ledger-000000bf.xdr.gz
...
2019/04/25 11:51:18 Repairing results/00/00/0e/results-00000ebf.xdr.gz
2019/04/25 11:51:18 Repairing results/00/00/0e/results-00000eff.xdr.gz
2019/04/25 11:51:19 Repairing results/00/00/0f/results-00000f3f.xdr.gz
...
2019/04/25 11:51:39 Repairing scp/00/00/00/scp-0000003f.xdr.gz
2019/04/25 11:51:39 Repairing scp/00/00/00/scp-0000007f.xdr.gz
2019/04/25 11:51:39 Repairing scp/00/00/00/scp-000000bf.xdr.gz
...
2019/04/25 11:51:50 Re-running checkpoing-file scan, for bucket repair
2019/04/25 11:51:50 Scanning checkpoint files in range: [0x0000003f, 0x000041bf]
2019/04/25 11:51:50 Checkpoint files scanned with 5 errors
2019/04/25 11:51:50 Archive: 264 history, 263 ledger, 263 transactions, 263 results, 241 scp
2019/04/25 11:51:50 Error: 5 errors scanning checkpoints
2019/04/25 11:51:50 Scanning all buckets, and those referenced by range
2019/04/25 11:51:50 Archive: 40 buckets total, 2478 referenced
2019/04/25 11:51:50 Examining buckets referenced by checkpoints
2019/04/25 11:51:50 Repairing bucket/57/18/d4/bucket-5718d412bdc19084dafeb7e1852cf06f454392df627e1ec056c8b756263a47f1.xdr.gz
2019/04/25 11:51:50 Repairing bucket/8a/a1/62/bucket-8aa1624cc44aa02609366fe6038ffc5309698d4ba8212ef9c0d89dc1f2c73033.xdr.gz
2019/04/25 11:51:50 Repairing bucket/30/82/6a/bucket-30826a8569cb6b178526ddba71b995c612128439f090f371b6bf70fe8cf7ec24.xdr.gz
...
```
A final scan of the local archive confirms that it has been successfully repaired:

```
# stellar-archivist scan file:///mnt/xvdf/stellar-core-archive/node_001
2019/04/25 12:15:41 Scanning checkpoint files in range: [0x0000003f, 0x000041bf]
2019/04/25 12:15:41 Archive: 264 history, 263 ledger, 263 transactions, 263 results, 241 scp
2019/04/25 12:15:41 Scanning all buckets, and those referenced by range
2019/04/25 12:15:41 Archive: 2478 buckets total, 2478 referenced
2019/04/25 12:15:41 Examining checkpoint files for gaps
2019/04/25 12:15:41 Examining buckets referenced by checkpoints
2019/04/25 12:15:41 No checkpoint files missing in range [0x0000003f, 0x000041bf]
2019/04/25 12:15:41 No missing buckets referenced in range [0x0000003f, 0x000041bf]
```
Start your stellar-core instance (systemctl start stellar-core), and you should have a complete history archive being written to by your full validator.
Running
Starting Stellar Core
Once you’ve set up your environment, configured your node, set up your quorum set, and selected archives to get history from, you’re ready to start Stellar Core.

Use a command equivalent to:

```bash
$ stellar-core run
```
At this point, you’re ready to observe your node’s activity as it joins the network.
You may want to skip ahead and review the logging section to familiarize yourself with Stellar Core’s output.
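If you installed the SDF packages and run Stellar Core under systemd (the stellar-core unit used elsewhere in this guide), starting the node and following its output might look like the sketch below; stellar-core also writes to stellar-core.log by default (see the Logging section).

```bash
# Start the service and follow its output as it joins the network.
sudo systemctl start stellar-core
journalctl -u stellar-core -f
```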
Interacting With Your Instance
When your node is running, you can interact with Stellar Core via an administrative HTTP endpoint. Commands can be submitted using command-line HTTP tools such as curl, or by running a command such as:

```bash
$ stellar-core http-command <http-command>
```
That HTTP endpoint is not intended to be exposed to the public internet. It’s typically accessed by administrators, or by a mid-tier application to submit transactions to the Stellar network.
See commands for a description of the available commands.
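For example, assuming the default HTTP_PORT of 11626, the info command can be issued either way:

```bash
# Two equivalent ways to query the node's status (default admin port assumed).
stellar-core http-command "info"
curl "http://127.0.0.1:11626/info"
```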
Joining the Network
Your node will go through the following phases as it joins the network:
Establishing Connection to Other Peers

You should see authenticated_count increase:

```json
"peers" : {
   "authenticated_count" : 3,
   "pending_count" : 4
},
```
Observing Consensus
Until the node sees a quorum, it will say:
```json
"state" : "Joining SCP"
```
After observing consensus, a new field quorum will display information about network decisions. At this point the node will switch to “Catching up”:

```json
"quorum" : {
   "qset" : {
      "ledger" : 22267866,
      "agree" : 5,
      "delayed" : 0,
      "disagree" : 0,
      "fail_at" : 3,
      "hash" : "980a24",
      "missing" : 0,
      "phase" : "EXTERNALIZE"
   },
   "transitive" : {
      "intersection" : true,
      "last_check_ledger" : 22267866,
      "node_count" : 21
   }
},
"state" : "Catching up",
```
Catching up
This is a phase where the node downloads data from archives. The state will start with something like:
```json
"state" : "Catching up",
"status" : [ "Catching up: Awaiting checkpoint (ETA: 35 seconds)" ]
```

And then go through the various phases of downloading and applying state, such as:

```json
"state" : "Catching up",
"status" : [ "Catching up: downloading ledger files 20094/119803 (16%)" ]
```
You can specify how far back your node goes to catch up in your config file. If you set CATCHUP_COMPLETE to true, your node will replay the entire history of the network, which can take a long time (weeks). SatoshiPay offers a parallel catchup script to speed up the process, but you only need to replay the complete network history if you’re setting up a Full Validator. Otherwise, you can specify a starting point for catchup using CATCHUP_RECENT. See the complete example configuration for more details.
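For illustration, a hedged sketch of setting a recent-history catchup window from the shell; the ledger count here is an arbitrary example (roughly a day of ledgers at about 5 seconds each), so pick a window that suits your use case:

```bash
# Append a CATCHUP_RECENT setting to the node's configuration
# (value shown is only an example; CATCHUP_COMPLETE defaults to false).
echo 'CATCHUP_RECENT=17280' | sudo tee -a /etc/stellar/stellar-core.cfg
sudo systemctl restart stellar-core
```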
Synced
When the node is done catching up, its state will change to:
```json
"state" : "Synced!"
```
Logging
Stellar Core sends logs to standard output and stellar-core.log by default, configurable as LOG_FILE_PATH.

Log messages are classified by progressive priority levels: TRACE, DEBUG, INFO, WARNING, ERROR, and FATAL. The logging system only emits those messages at or above its configured logging level.
The log level can be controlled by configuration, the -ll command-line flag, or adjusted dynamically by administrative (HTTP) commands. To do so, run:

```bash
$ stellar-core http-command "ll?level=debug"
```

while your system is running.

Log levels can also be adjusted on a partition-by-partition basis through the administrative interface. For example, the history system can be set to DEBUG-level logging by running:

```bash
$ stellar-core http-command "ll?level=debug&partition=history"
```

against a running system.
The default log level is INFO, which is moderately verbose and should emit progress messages every few seconds under normal operation.
Validator maintenance
Maintenance here refers to anything involving taking your validator temporarily out of the network (to apply security patches, perform a system upgrade, etc.).
As an administrator of a validator, you must ensure that the maintenance you are about to apply to the validator is safe for the overall network and for your validator.
Safe means that the other validators that depend on yours will not be affected too much when you turn off your validator for maintenance and that your validator will continue to operate as part of the network when it comes back up.
If you are changing settings that may have network-wide impact, such as the protocol version, review the section on network configuration.
If you’re changing your quorum set configuration, also read the section below on special considerations during quorum set updates.
Recommended steps to perform as part of a maintenance
We recommend performing the following steps in order (repeat sequentially as needed if you run multiple nodes).
- Advertise your intention to others that may depend on you. Some coordination is required to avoid situations where too many nodes go down at the same time.
- Dependencies should assess the health of their quorum; refer to the section “Understanding quorum and reliability”.
- If there is no objection, take your instance down.
- When done, restart your instance; it should rejoin the network.
- The instance is completely caught up when it’s both Synced and there is no backlog in uploading history (a minimal sketch of this sequence follows the list).
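Put together, a minimal maintenance sketch, assuming Stellar Core runs under systemd using the stellar-core unit name from this guide and the default admin port:

```bash
# Stop the validator, apply maintenance, then bring it back.
sudo systemctl stop stellar-core
# ... apply security patches, system upgrades, etc. ...
sudo systemctl start stellar-core

# After restart, confirm the node reports "Synced!" (see the Synced section)
# and that no history publishing backlog remains.
stellar-core http-command "info"
```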
Special considerations during quorum set updates
Sometimes an organization needs to make changes that impact others’ quorum sets:

- taking a validator down for a long period of time
- adding new validators to their pool
In both cases, it’s crucial to stage the changes to preserve quorum intersection and the general good health of the network:

- removing too many nodes from your quorum set before the nodes are taken down: if different people remove different sets, the remaining sets may not overlap between nodes, which can cause network splits
- adding too many nodes to your quorum set at the same time: if not done carefully, this can cause those nodes to overpower your configuration
The recommended approach is for the entity adding or removing nodes to make the change first among its own nodes, and then have others reflect those changes gradually (over several rounds) in their quorum configurations.