Continuous Monitoring

Establish the principles and guidelines for continuously monitoring the Solana infrastructure, ensuring the confidentiality, integrity, and availability of information in accordance with ISO/IEC 27001.

Scope

This policy applies to all physical and virtual servers that are part of the Solana infrastructure, including validator nodes, RPCs, indexers, and monitoring servers, in both production and development environments.

General Principles

Continuous monitoring of logs, performance metrics, access, and security events.
Tools used: Grafana, Watchtower.
Automated alerts for critical events (authentication failures, configuration changes, unauthorized access).
Secure log retention for a minimum of 12 months.
Periodic review of monitoring systems and their configurations.

Technical Procedures. Solana Node Monitoring:

Enable RPC metrics on the Solana node with --enable-rpc-metrics.
Configure Prometheus to collect metrics from port 9100.
Create dashboards in Grafana for TPS, latency, CPU/RAM usage, and validator status.
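As a quick way to confirm that the metrics endpoint responds before wiring it into Prometheus and Grafana, the following is a minimal sketch that fetches and parses the Prometheus text format. The function names are illustrative, and the URL passed to `fetch_metrics` is whatever endpoint your node actually exposes.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {series: value}.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Labels,
    when present, stay part of the dictionary key.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Label values may contain spaces, so split after the closing brace.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        if rest:
            # rest[0] is the value; an optional timestamp may follow it.
            samples[series] = float(rest[0])
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download a /metrics page and return the parsed samples."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Calling `fetch_metrics("http://<NODE_IP>:9100/metrics")` with the real node address returns a dictionary that can be inspected by hand; an exception or an empty result suggests the exporter is not reachable.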

Exporting Metrics

Relevant metrics on a Solana server:

TPS (Transactions per Second)
Slot time and block time
Validator Status (leader, delinquent, etc.)
CPU, RAM, and disk usage
Network latency and connected peers
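TPS and slot time can be derived from the `getRecentPerformanceSamples` JSON-RPC method, which reports `numTransactions`, `numSlots`, and `samplePeriodSecs` for each sample. The sketch below shows one possible calculation; the function names and the choice to aggregate over several samples are assumptions, and `numTransactions` includes vote transactions, so the figure is higher than user-facing throughput.

```python
import json
from urllib.request import Request, urlopen


def summarize_samples(samples):
    """Return (tps, average_slot_seconds) aggregated over all samples."""
    total_tx = sum(s["numTransactions"] for s in samples)
    total_secs = sum(s["samplePeriodSecs"] for s in samples)
    total_slots = sum(s["numSlots"] for s in samples)
    if total_secs == 0 or total_slots == 0:
        # No usable data: report zeros instead of dividing by zero.
        return 0.0, 0.0
    return total_tx / total_secs, total_secs / total_slots


def fetch_samples(rpc_url, limit=5, timeout=10.0):
    """Fetch the most recent performance samples from a Solana RPC node."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getRecentPerformanceSamples",
        "params": [limit],
    }
    req = Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["result"]
```

A typical call would be `summarize_samples(fetch_samples(rpc_url))`, with `rpc_url` pointing at your own RPC node.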

For monitoring our validator, we use Telegraf, a lightweight metrics collection agent. It runs directly on the validator nodes and gathers various hardware metrics.

For validator-specific metrics (such as block production, vote credits, identity balance, etc.), we rely on the Stakeconomy scripts. All collected metrics are sent to an external time-series database powered by InfluxDB.
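To illustrate what a point sent to InfluxDB looks like, here is a minimal sketch that formats one measurement in InfluxDB line protocol. The measurement, tag, and field names are hypothetical and are not the schema used by Telegraf or the Stakeconomy scripts.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` inside `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value):
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes first, then quotes.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys are recommended by InfluxDB
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    body = ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"
```

The resulting string is what a collection agent would write to the database; building it by hand is mainly useful for testing a custom script before handing it to Telegraf.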

Dashboards:

The node exposes metrics in Prometheus format at the endpoint:

http://<NODE_IP>:9100/metrics

Port 9100 is the default Node Exporter port. Install the Node Exporter on the node and configure Prometheus to collect:

OS metrics
Solana-validator process metrics

For more information, see our section: setup-validator-metrics

Monitoring with Watchtower

We recommend installing Watchtower on a separate machine. We use Watchtower to monitor and alert on the identity keys for Mainnet and Testnet. Critical metrics such as identity balance and validator health are checked every minute.
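The kind of check described above can be sketched with two standard JSON-RPC methods, `getVoteAccounts` and `getBalance`. This is an illustrative sketch, not Watchtower's own implementation: the function names and the 1 SOL default threshold are assumptions.

```python
import json
from urllib.request import Request, urlopen

LAMPORTS_PER_SOL = 1_000_000_000


def evaluate_validator(identity, vote_accounts, balance_lamports, min_balance_sol=1.0):
    """Return a list of problems found for the given identity pubkey."""
    problems = []
    delinquent = {v["nodePubkey"] for v in vote_accounts.get("delinquent", [])}
    current = {v["nodePubkey"] for v in vote_accounts.get("current", [])}
    if identity in delinquent:
        problems.append("delinquent")
    elif identity not in current:
        problems.append("not found in vote accounts")
    balance_sol = balance_lamports / LAMPORTS_PER_SOL
    if balance_sol < min_balance_sol:
        problems.append(f"low balance: {balance_sol:.3f} SOL")
    return problems


def rpc_call(rpc_url, method, params=None, timeout=10.0):
    """Send one JSON-RPC request and return its "result" member."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    req = Request(
        rpc_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["result"]


def check_once(rpc_url, identity, min_balance_sol=1.0):
    """Run a single health check against a cluster RPC endpoint."""
    vote_accounts = rpc_call(rpc_url, "getVoteAccounts")
    balance = rpc_call(rpc_url, "getBalance", [identity])["value"]
    return evaluate_validator(identity, vote_accounts, balance, min_balance_sol)
```

Running `check_once` from a scheduler every 60 seconds would match the one-minute interval mentioned above; an empty list means no problem was detected.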

Automated Alerts

We use Watchtower for monitoring the validator's health across the Solana cluster. Watchtower runs on a separate machine and continuously checks validator status. If it detects any issues (delinquency, low balance), it sends alerts through multiple channels such as:

Telegram
Discord
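For custom scripts that need to notify the same channels, the sketch below builds the HTTP requests for a Discord webhook and for the Telegram Bot API `sendMessage` method. The function names and the User-Agent string are illustrative; the webhook URL, bot token, and chat ID come from your own configuration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def discord_request(webhook_url, message):
    """Build a POST request for a Discord webhook."""
    # Discord limits the "content" field to 2000 characters.
    body = json.dumps({"content": message[:2000]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: an explicit User-Agent avoids rejections that the
        # default urllib agent can trigger on some endpoints.
        "User-Agent": "validator-alerts/0.1",
    }
    return Request(webhook_url, data=body, headers=headers, method="POST")


def telegram_request(bot_token, chat_id, message):
    """Build a POST request for the Telegram Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urlencode({"chat_id": chat_id, "text": message}).encode("utf-8")
    return Request(url, data=body, method="POST")


def send(request, timeout=10.0):
    """Send a prepared request and return the HTTP status code."""
    with urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Keeping request construction separate from `send` makes it possible to verify the payloads without contacting either service.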

Prerequisites

Solana CLI
Python
Telegram Groups (Mainnet and Testnet)
Discord webhook

Hardware Alerts:

Define alerts on critical dashboards recommended by Solana, for example:

TPS < 100 → possible congestion
CPU usage > 90% → risk of the node falling behind
RAM usage > 90%
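In practice these thresholds would be defined as alert rules in Grafana. To make the logic explicit, here is a minimal sketch that evaluates the same three rules against a dictionary of current values; the metric keys (`tps`, `cpu_percent`, `ram_percent`) are hypothetical names.

```python
import operator

# (metric key, comparison that signals a breach, limit, alert message)
RULES = (
    ("tps", operator.lt, 100.0, "TPS below 100: possible congestion"),
    ("cpu_percent", operator.gt, 90.0, "CPU usage above 90%"),
    ("ram_percent", operator.gt, 90.0, "RAM usage above 90%"),
)


def evaluate(metrics, rules=RULES):
    """Return one alert message per breached rule.

    Metrics that are missing from `metrics` are skipped, not treated as
    breaches.
    """
    alerts = []
    for name, breached, limit, message in rules:
        value = metrics.get(name)
        if value is not None and breached(value, limit):
            alerts.append(f"{message} (current: {value})")
    return alerts
```

For example, `evaluate({"tps": 2500.0, "cpu_percent": 95.0, "ram_percent": 40.0})` reports only the CPU rule.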

For the installation and configuration of Grafana and InfluxDB, see the documentation:

setup-validator-metrics

Log Retention and Auditing

Logs are stored by default in /home/sol/logs/agave-validators.log. Recommended practices:

Daily rotation.
Uploading to a centralized system.
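Once logs are rotated daily, the 12-month minimum retention can be enforced by only archiving or deleting files older than that window. The sketch below lists such files; it assumes 12 months is approximated as 365 days and that rotated files match the glob `*.log*`, both of which should be adapted to your rotation setup.

```python
import time
from pathlib import Path

RETENTION_DAYS = 365  # assumption: 12 months approximated as 365 days
SECONDS_PER_DAY = 86400


def is_expired(mtime, now, retention_days=RETENTION_DAYS):
    """True when a file last modified at `mtime` is past the retention window."""
    return now - mtime > retention_days * SECONDS_PER_DAY


def expired_logs(directory, now=None, retention_days=RETENTION_DAYS):
    """Return rotated log files in `directory` older than the retention window."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path in Path(directory).glob("*.log*")
        if path.is_file() and is_expired(path.stat().st_mtime, now, retention_days)
    )
```

The function only reports candidates; whether they are deleted or moved to cold storage is a separate decision that should follow the retention policy.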

Automate the review with custom tools or scripts that detect:

Connection retries
Configuration changes
Synchronization errors
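A review script along these lines can be sketched as a set of regular expressions applied to each log line. The patterns below are placeholders, not actual validator log messages; replace them with the strings you observe in your own logs.

```python
import re
from collections import Counter

# Placeholder patterns: adjust to the messages your validator really emits.
PATTERNS = {
    "connection_retry": re.compile(r"\bretry(ing)?\b|\breconnect", re.IGNORECASE),
    "config_change": re.compile(r"config(uration)? (changed|updated|reloaded)", re.IGNORECASE),
    "sync_error": re.compile(r"failed to sync|out of sync", re.IGNORECASE),
}


def scan_lines(lines, patterns=PATTERNS):
    """Count how many lines match each category.

    A single line may be counted under more than one category.
    """
    counts = Counter()
    for line in lines:
        for category, pattern in patterns.items():
            if pattern.search(line):
                counts[category] += 1
    return counts


def scan_file(path, patterns=PATTERNS):
    """Scan a log file line by line without loading it fully into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle, patterns)
```

The resulting counts can be pushed to the metrics database or compared against a threshold to raise an alert.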

For more information, visit our section:

inspecting-logs

Complementary Services

Solana Beach / Compass: Public APIs for network status.
Stakewiz: Validator metrics and reputation.
Solana Validator Health Check: Community scripts for health checks.

Good practices aligned with ISO 27001

Integrity: Use hashes or digital signatures on exported logs.
Availability: Redundancy in metric storage.
Confidentiality: Encrypt export channels (TLS in Prometheus).
Audit: Maintain logs of metric access.
Alerts: Configure alerts in Grafana for critical events (e.g., node downtime or high latency).
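As one way to apply the integrity practice, the sketch below computes a SHA-256 digest of an exported log and authenticates it with HMAC-SHA256. Note that HMAC is a keyed message authentication code, not a public-key digital signature; it is used here as a simpler stand-in, and the function names and key handling are assumptions.

```python
import hashlib
import hmac


def read_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def sha256_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sign_digest(digest_hex, key):
    """Authenticate a digest with HMAC-SHA256 using a secret key (bytes)."""
    return hmac.new(key, digest_hex.encode("ascii"), hashlib.sha256).hexdigest()


def verify_chunks(chunks, expected_signature, key):
    """Recompute the digest and compare signatures in constant time."""
    actual = sign_digest(sha256_chunks(chunks), key)
    return hmac.compare_digest(actual, expected_signature)
```

A typical flow stores `sign_digest(sha256_chunks(read_chunks(path)), key)` alongside the exported file and later calls `verify_chunks(read_chunks(path), stored_signature, key)` during an audit.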


