Client Troubleshooting

See https://youtu.be/nY2SFjaDXHw?si=Sd-ko1lTmpgkz_VH&t=1560

Process running

Verify that the process is running (run on validator machine). In a new terminal window, ssh into your server:

ps aux | grep agave-validator

You should see a line in the output that includes agave-validator with all the flags that were added to your validator.sh script.
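
If the command line is long, it can help to print one flag per line so it is easier to compare against validator.sh. A minimal sketch, assuming a POSIX shell with grep and awk; `list_validator_flags` is a hypothetical helper name, not part of the Agave tooling:

```shell
# Hypothetical helper: reads `ps aux` output on stdin and prints each
# agave-validator flag, together with its value, on its own line.
list_validator_flags() {
  # The [a] bracket keeps this grep from matching its own entry in the ps output.
  grep '[a]gave-validator' | awk '{
    line = ""
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^--/) {            # a new flag starts here
        if (line != "") print line
        line = $i
      } else if (line != "") {     # a value belonging to the current flag
        line = line " " $i
      }
    }
    if (line != "") print line
  }'
}

# Usage: ps aux | list_validator_flags
```

No output means no agave-validator process was found in the `ps` listing.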

Check the logs

Make sure your validator is producing reasonable log output (run on validator machine). In a new terminal window, ssh into your validator machine, switch users to the sol user and tail the logs:

su - sol
tail -f /home/sol/logs/agave-validator.log

Check version

Check the version you are starting with (run on validator machine). This is useful if your validator is in some kind of restart loop:

grep -B1 -A50 'Starting validator with' /home/sol/logs/agave-validator.log

Check PoH speed

grep -B1 'PoH speed check' /home/sol/logs/agave-validator.log

From Discord (https://discord.com/channels/428295358100013066/1187805174803210341/1346883323020050492): Have you tried running on testnet, just to check? The CPU may not be fast enough for PoH. If you grep for the PoH speed check in the log, post it here. You'll have to add the log to your startup command.

grep 'PoH speed check' log/validator.log

Example from a backup mainnet validator (EPYC 9254), from Discord (https://discord.com/channels/428295358100013066/1187805174803210341/1347136943825227776):

# [2025-03-05T11:42:00.837392249Z INFO solana_core::validator] PoH speed check: computed hashes per second 16740697, target hashes per second 10000000
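
To compare the two numbers in that log line without reading them by eye, something like the following could work. It is only a sketch: it assumes the log line format shown above, and `poh_speed_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: reads validator log lines on stdin and, for the last
# "PoH speed check" line found, reports whether the computed hash rate
# reaches the target. Assumes the log format shown above.
poh_speed_summary() {
  sed -n 's/.*PoH speed check: computed hashes per second \([0-9][0-9]*\), target hashes per second \([0-9][0-9]*\).*/\1 \2/p' |
    tail -n 1 |
    awk '{
      status = ($1 >= $2) ? "OK" : "TOO SLOW"
      printf "%s: computed=%s target=%s ratio=%.2f\n", status, $1, $2, $1 / $2
    }'
}

# Usage: grep 'PoH speed check' /home/sol/logs/agave-validator.log | poh_speed_summary
```

If the log contains no PoH speed check line, the function prints nothing.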

Check Gossip

Make sure that the validator has registered itself with the gossip network (run anywhere)

# search validator by keypair and see IP address and RPC ports
# (useful when you want to grab a snapshot from a validator directly)
solana -ut gossip | grep <validator-identity-pubkey> 

# Output example:
# IP Address      | Identity                                     | Gossip | TPU   | TPU-QUIC | RPC Address           | Version | Feature Set
# ----------------+----------------------------------------------+--------+-------+----------+-----------------------+---------+----------------
# 185.209.178.99  | hytUYBP59GaVyiqG2ebrDozwoziVd17V5HYRPHp5R2W  | 8001   | 8003  | 8009     | none                  | 2.1.13  | 1725507508
# 139.178.68.207  | 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on | 8001   | 8004  | 8009     | 139.178.68.207:80     | 1.14.17 | 3488713414
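
A plain grep also matches partial pubkeys and leaves you to pick out the columns yourself. The sketch below matches the identity exactly and prints only the IP, RPC address and version. It assumes the column order of the example output above (other CLI versions may print different columns), and `gossip_lookup` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `solana gossip` table output on stdin and prints
# the IP, RPC address and version for one identity pubkey.
# Column positions are an assumption taken from the example output above.
gossip_lookup() {
  awk -F'|' -v id="$1" '
    {
      # trim the padding around every column
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
    }
    $2 == id { printf "ip=%s rpc=%s version=%s\n", $1, $6, $7 }
  '
}

# Usage: solana -ut gossip | gossip_lookup <validator-identity-pubkey>
```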

Check voting readiness

Verify that your validator is ready to be a voting participant of the network (run anywhere).

After you have verified that your validator is in gossip, you should stake some SOL to your validator. Once the stake has activated (which happens at the start of the next epoch), you can check your validator with the solana validators command:

solana -ut validators

solana -ut validators | head -n 3 # to see column headers
# Identity | Vote Account | Commission | Last Vote | Root Slot | Skip Rate | Credits | Version | Active Stake

solana-keygen pubkey validator-identity-keypair.json # get the validator identity pubkey

# At the beginning, when your validator doesn't have any stake yet, you need to add the flag --keep-unstaked-delinquents
solana -ut validators --keep-unstaked-delinquents | grep <validator-identity-pubkey>
# 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on  FX6NNbS5GHc2kuzgTZetup6GZX6ReaWyki8Z8jC7rbNG  100%  197434166 (  0)  197434133 (  0)   2.11%   323614  1.14.17   2450110.588302720 SOL (1.74%)

Check catchup speed

Check catchup speed with solana catchup (run on validator machine)

It tells you how far behind the network your validator is and how quickly you are catching up:

solana -ut catchup /home/sol/keys-testnet/validator-identity-keypair.json

solana -ut catchup --our-localhost 8899
# ⠄ 77 slot(s) behind (us:320589449 them:320589526), our node is gaining at 6.0 slots/second (AVG: 5.0 slots/second, ETA: slot 320589524 in 

solana -ut catchup --our-localhost 8899
# hytUYBP59GaVyiqG2ebrDozwoziVd17V5HYRPHp5R2W has caught up (us:320587019 them:320587015)
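
The "slots behind" figure is just the difference between the `them` and `us` slots in that output. If you want the number on its own (for a script or an alert, say), a sketch like this could extract it. It assumes the `(us:… them:…)` format shown above, and `slot_gap` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `solana catchup` output on stdin and prints
# them - us for the last line that contains "(us:N them:M)".
# A positive number means the node is behind; zero or negative means caught up.
slot_gap() {
  sed -n 's/.*(us:\([0-9][0-9]*\) them:\([0-9][0-9]*\)).*/\1 \2/p' |
    tail -n 1 |
    awk '{ printf "%d\n", $2 - $1 }'
}

# Usage: pipe saved or captured catchup output into it, e.g.
#   slot_gap < catchup-output.txt
```

For the two sample outputs above this gives 77 and -4 respectively.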

Agave-Monitor

Monitor with agave-validator monitor (run on validator machine)

agave-validator -l /mnt/ledger/ monitor

# Output if you have firewall issues:
# Ledger location: /mnt/ledger/
# ⠤ Unable to connect to validator: Connection refused (os error 111)
# (the same line repeats while the spinner animates)

Check ports

Check that ports 8001, ... are open after agave-validator is running (run on validator machine)

It can take a few minutes for the process to open ports after the validator has started.

sudo netstat -ntlp # check open ports

Check connection is possible from validator machine to the network entry points (run on validator machine)

telnet entrypoint.testnet.solana.com 8001
telnet entrypoint2.testnet.solana.com 8001
telnet entrypoint3.testnet.solana.com 8001

nc -vz entrypoint.testnet.solana.com 8001
nc -vz entrypoint2.testnet.solana.com 8001
nc -vz entrypoint3.testnet.solana.com 8001
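
The same nc check can be wrapped in a loop so that all entry points are tested in one go. A minimal sketch, assuming an nc that supports `-z` and `-w`; `probe` and `check_entrypoints` are hypothetical helper names:

```shell
# probe HOST PORT: succeeds when a TCP connection can be opened within 5 seconds.
probe() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# check_entrypoints PORT HOST...: prints one status line per host and
# returns non-zero if at least one host could not be reached.
check_entrypoints() {
  port=$1
  shift
  failed=0
  for host in "$@"; do
    if probe "$host" "$port"; then
      echo "$host:$port ok"
    else
      echo "$host:$port unreachable"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_entrypoints 8001 entrypoint.testnet.solana.com \
#     entrypoint2.testnet.solana.com entrypoint3.testnet.solana.com
```

Keeping the connection test in its own `probe` function makes it easy to swap nc for another tool if it is not installed.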

Check reachability

Check connection is possible from the outside to your validator machine (run anywhere)

nc -vz <YOUR_VALIDATOR_HOSTNAME> 8001 # the gossip port you use; this port is only open after the agave-validator is running

nc -vz <YOUR_VALIDATOR_HOSTNAME> 8900 # this port is only open after the agave-validator is running

Check gossip entrypoint

Make sure network entry points resolve to the expected IP addresses (run anywhere). I've seen issues with an outdated DNS server on Solana's side:

nslookup entrypoint.testnet.solana.com # resolved to 35.203.170.30 at the time of writing
nslookup entrypoint2.testnet.solana.com # resolved to 139.178.94.143 at the time of writing

Check NAT

Check if you are behind a NAT (run on validator machine)

sudo apt install inetutils-traceroute
traceroute <YOUR_VALIDATOR_HOSTNAME>
# If you only see one hop, then you are not behind a NAT
# traceroute to 88.20.3.135 (88.20.3.135), 64 hops max
#   1   88.20.3.135  0.510ms  0.357ms  0.293ms
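
The one-hop rule of thumb above can be applied automatically by counting the hop lines in the traceroute output. This is only a sketch of that heuristic: it assumes hop lines start with the hop number, as in the example, and `nat_hint` is a hypothetical helper name.

```shell
# Hypothetical helper: reads traceroute output on stdin, counts the hop lines
# (lines that start with a hop number) and applies the one-hop rule of thumb.
nat_hint() {
  # grep -c exits non-zero when it counts 0 lines, hence the "|| true"
  hops=$(grep -c '^[[:space:]]*[0-9][0-9]*[[:space:]]' || true)
  if [ "$hops" -eq 0 ]; then
    echo "no hops found in the input"
  elif [ "$hops" -eq 1 ]; then
    echo "1 hop: probably not behind a NAT"
  else
    echo "$hops hops: possibly behind a NAT"
  fi
}

# Usage: traceroute <YOUR_VALIDATOR_HOSTNAME> | nat_hint
```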

Get snapshot manually

Manually get a snapshot from another validator:

# stop the validator service
systemctl stop sol

# get the IP address of the other validator
solana -ut gossip | grep <the_other_validator_identity_pubkey>
# then copy THE_OTHER_VALIDATOR_IP_ADDRESS:8899 (if using the standard port)

# go to your snapshots dir. this might be the ledger directory (/mnt/ledger) if
# you didn't specify a snapshot path in your startup script
cd /mnt/snapshots

# download the snapshot
wget --trust-server-names http://THE_OTHER_VALIDATOR_IP_ADDRESS:8899/snapshot.tar.bz2
wget --trust-server-names http://THE_OTHER_VALIDATOR_IP_ADDRESS:8899/incremental-snapshot.tar.bz2

# restart the sol service
systemctl start sol

# monitor the validator
agave-validator -l /mnt/ledger/ monitor

# also check with catchup, sometimes `monitor` reports more slots behind than `catchup`
solana -ut catchup /home/sol/keys-testnet/identity.json
