Introduction

solana-exporter (Solana Rust Exporter) is an advanced, modular monitoring solution for Solana validator nodes. The executable part is implemented in Rust and was initially based on the Golang original by CertusOne but now provides additional functionality. It comprises

  • a Prometheus exporter executable written in Rust,

  • a Docker container suitable for docker-compose,

  • sample Grafana dashboards.

Typical uses include

  • monitoring Solana validator nodes by the nodes' operator,

  • managing a stake pool,

  • monitoring the cluster health.

This guide explains how to set the monitoring stack up and how to stay on top of things using Grafana alerts.

Basics

This chapter covers the setup, configuration and installation of solana-exporter, as well as integration with other tools.

Installation

To install the latest version of solana-exporter, run:

cargo install solana-exporter

Note that the libraries libssl and libudev should be installed on the target machine as well. Refer to your distro's package manager and documentation for guidance on how to install them.

After installation, run

solana-exporter generate

to set up a default configuration file. By default, the generate command will place a config file inside ~/.solana-exporter; this directory will be automatically created, unless the -o flag is present to override the output location.

Running as a service

Run this as a systemd service by a non-root user with a script like this one:

[Unit]
Description=Solana Exporter
After=solana.service
Requires=solana.service

[Service]
User=solana
Restart=always
RestartSec=20
ExecStart=/home/solana/.cargo/bin/solana-exporter

[Install]
WantedBy=multi-user.target

Running as a Docker container

solana-exporter is also available as a container which can be pulled by

docker pull rustiq/solana-exporter:0.4.1

Then run it with

docker run -d \
-v /path/to/your/config.toml:/etc/solana-exporter/config.toml \
-v solana-exporter-data:/exporter rustiq/solana-exporter:0.4.1

We recommend that the config.toml file be bind-mounted to the container, so you have easy access to it on the host machine. However, the persistent database should be stored in a named volume.

Configuration

After running solana-exporter generate, a template config file will be created in either the specified location or the default directory (~/.solana-exporter). This page explains the individual variables and how they affect the exporter.

Sample configuration

rpc = 'http://localhost:8899'
target = '0.0.0.0:9179'
vote_account_whitelist = []
staking_account_whitelist = []

[maxmind]
username = 'username'
password = 'password'

  • rpc - the location of the JSON-RPC node. This can be a local RPC node, or a public one.
    • Remark: Public nodes usually have a rate-limiting policy in place that makes usage with solana-exporter difficult (e.g., delayed response times).
  • target - the target address/port to export Prometheus gauges to.
  • vote_account_whitelist - an array that instructs the exporter to only export statistics related to the specified vote pubkeys, and their corresponding node pubkeys if found.
  • staking_account_whitelist - an array that instructs the exporter to only export APY statistics related to the specified staking pubkeys.
  • [maxmind] - The exporter can optionally use MaxMind's GeoIP2 Precision City Service to export decentralisation-related metrics. However, this requires you to sign up for a MaxMind account and regularly top-up your account with credits.
    • username - the username of the API key.
    • password - the password of the API key.

Important note on vote_account_whitelist and staking_account_whitelist

As explained above, whitelists can be used to restrict the amount APY statistics exported. However, care should be taken if you modify the whitelists and then reload solana-exporter. In particular, for the following gauges:

  • solana_current_staking_apy
  • solana_average_staking_apy

if any newly-appearing (e.g., formerly excluded) pubkeys were not fetched for a particular epoch, then solana-exporter will not attempt to "back-fill" data for either the current or past epochs. Therefore, those pubkeys will be missing from the gauges until the next epoch begins. This is due to the fact that the exporter only scrapes the ledger for rewards data once at the beginning of every epoch.

To force a re-fresh of the current epoch's rewards, delete the database.

Overriding the config file location

  • Standalone program: The default location is ~/.solana-exporter/config.toml. Override this with the -c flag.
  • Docker container: Change the bind-mount location.

Overriding the database location

To speed up processing and reduce unnecessary network traffic, solana-exporter uses a persistent database to cache some requests.

  • Standalone program: The default location is ~/.solana-exporter/persistent.db. Override this with the -d flag.
  • Docker container: The location cannot be overridden; the exporter expects a database to be mounted in /exporter/.

Setting up Prometheus

solana-exporter uses Prometheus to export, monitor and aggregate data for use in other utilities. Refer to Prometheus' documentation on how to get it running on your machine.

If querying a public RPC port, solana-exporter can be run from anywhere, not necessarily from the validator machine.

If querying a private RPC port, install Prometheus on the validator machine. Add the following snippet to the scrape_configs section of the prometheus.yml config file:

  - job_name: solana
    static_configs:
      - targets: ['localhost:9179']

Restart Prometheus. Now the solana-exporter metrics should be available to view at http://localhost:9179/metrics. If running on the validator machine, it is highly advisable to only open the Prometheus datasource port to the Grafana machine. This can be achieved with nftables. Here is an example /etc/nftables.conf:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0;
        # allow connection to Prometheus datasource only locally and from Grafana
        ip saddr { 127.0.0.1, <Grafana IP> } tcp dport 9090 accept
        tcp dport 9090 drop
        # allow connection to Prometheus exporter endpoints only internally
        ip saddr != 127.0.0.1 tcp dport 9100 drop
        ip saddr != 127.0.0.1 tcp dport 9179 drop
    }
    chain forward {
        type filter hook forward priority 0;
    }
    chain output {
        type filter hook output priority 0;
    }
}

Note the order of commands. An accept clause should appear before the corresponding drop clause.

When solana-exporter is used on a mainnet validator node, Grafana must always run on a different machine to circumvent potential DDoS attacks on the validator. In the Grafana dashboard, add the Prometheus data source http://<Validator IP>:9090. Then import the rustiq.json using that data source.

Setting up Grafana

After solana-exporter exports gauges and metrics to Prometheus, it may be useful to visualise these metrics. Grafana allows you to create custom dashboards using Prometheus as a data source.

The repository includes a basic dashboard (rustiq.json) that shows off a range of gauges that solana-exporter can export using graphs. Some included visualisations include the monitoring of basic cluster statistics such as transactions per second, skipped slots percentage. Decentralisation metrics such as the distribution of staked SOL across datacenters can also be tracked (MaxMind account required).

To use the provided dashboard, first add the Prometheus data source. Then, import the dashboard.

Grafana Dashboard

For an in-depth explanation of each exported gauge, see then next chapter.

Sample screenshots

Pie chart of activated stake by datacenter Pie chart of activated stake by datacenter (Gauge used: solana_active_validators_dc_stake)

Heatmap of average skipped slot of validators Heatmap of average skipped slot of validators (Gauge used: solana_skipped_slot_percent)

Heatmap of average staking APY over 5 epochs Heatmap of average staking APY over 5 epochs (Gauge used: solana_average_staking_apy)

Exported feeds

This chapter explains in detail all the feeds that solana-exporter exports to Prometheus.

solana_active_validators

Description

Total number of validators in the cluster, grouped by the current status (either current or delinquent).

Sample output

solana_active_validators{status="current"} 838
solana_active_validators{status="delinquent"} 61

solana_validator_delinquent

Description

Whether a validator is delinquent. Given a vote pubkey label, the value (0 = false, 1 = true) shows whether the validator with that vote pubkey is delinquent.

Sample output

solana_validator_delinquent{pubkey="13DmkMhdpmJJu7nU2ozAyPiKuopZbYShMHV3JAA7YVYC"} 1
solana_validator_delinquent{pubkey="13HNYUVBVHgJSfNKvgXgKia3bywzXabGzQjFyMQxLMjS"} 0
solana_validator_delinquent{pubkey="13zyX9jfGy1RvM28LcdqfLwR4VSowXx6whAL6AcFERCk"} 0
solana_validator_delinquent{pubkey="14YCghb1uYPreALx6arirtPAnoGghoPH2Ac6gCmNQdq7"} 0
solana_validator_delinquent{pubkey="1gqv7KGm888nQXsJoNFwGaDkNERUBztuekjzK3J3T7a"} 0
solana_validator_delinquent{pubkey="21ryEourynXqhpLe1DsFz8yoeFKSXE14T8bKBFmzcYzt"} 1

Example usage

count(solana_validator_delinquent == 0) will return a time series of all active validators.

count(solana_validator_delinquent == 1) returns a time series of all delinquent validators.

These two queries are equivalent to solana_active_validators.

solana_validator_activated_stake

Description

The activated stake of a validator vote account pubkey, in lamports.

Sample output

solana_validator_activated_stake{pubkey="2BGBakG9kjmZAaygVmAuUrBSqBZt8p5FVabbStxHEXUj"} 189767375985727
solana_validator_activated_stake{pubkey="2BJUTarkNTNtiqn6g7mfZc5fjaF3sSnvV73dnuwuZnxV"} 182505753341154
solana_validator_activated_stake{pubkey="2DZVYFkMVumN7wuTSjDCAqWmpu9ydE7UoeYSJVWhNVEQ"} 26246477171200
solana_validator_activated_stake{pubkey="2Dwg3x37yN4q8SyrrwDaRPGQTp14atcwMPewe3Y8FDoL"} 181695250479495
solana_validator_activated_stake{pubkey="2EoaPgNSGbB3JyP7nSfiK5Wq3eME3LgbbEbdPim4CnVm"} 0
solana_validator_activated_stake{pubkey="2PC2DCk8C7n4jhXA427a1qL9CFRu7PtrzKHixU6LXocN"} 184130856467877

solana_validator_last_vote

Description

The last voted slot of a validator vote account pubkey.

Sample output

solana_validator_last_vote{pubkey="13DmkMhdpmJJu7nU2ozAyPiKuopZbYShMHV3JAA7YVYC"} 91417048
solana_validator_last_vote{pubkey="13HNYUVBVHgJSfNKvgXgKia3bywzXabGzQjFyMQxLMjS"} 91431043
solana_validator_last_vote{pubkey="13zyX9jfGy1RvM28LcdqfLwR4VSowXx6whAL6AcFERCk"} 91430967
solana_validator_last_vote{pubkey="14YCghb1uYPreALx6arirtPAnoGghoPH2Ac6gCmNQdq7"} 91431043
solana_validator_last_vote{pubkey="1gqv7KGm888nQXsJoNFwGaDkNERUBztuekjzK3J3T7a"} 91431042
solana_validator_last_vote{pubkey="21ryEourynXqhpLe1DsFz8yoeFKSXE14T8bKBFmzcYzt"} 7393690

solana_validator_root_slot

Description

The root slot of a validator vote account pubkey.

Sample output

solana_validator_root_slot{pubkey="2naPB8XC4FWp4er8M2nxDsphXHEyQH2CSQMFaSHxWWd8"} 91448691
solana_validator_root_slot{pubkey="2nj17ZX4Mwj9yFx7ATBzhN7RzVQfWiSjCdMeeH6NBXe8"} 91448692
solana_validator_root_slot{pubkey="2oxQJ1qpgUZU9JU84BHaoM1GzHkYfRDgDQY9dpH5mgGn"} 91448692
solana_validator_root_slot{pubkey="2rNaaG1yKrGY5KU2H8gGWtQzMitoWHcdVroMpVDg5AYy"} 91448692
solana_validator_root_slot{pubkey="2vxNDV7aAbrb4Whnxs9LiuxCsm9oubX3c1hozXPsoD97"} 91448690
solana_validator_root_slot{pubkey="2xT5m25m9mDkdpLV1mT5fWkvH15jGy1dm7anNQLzzYGA"} 91448689

solana_transaction_count

Description

Total number of confirmed transactions since genesis.

Sample output

solana_transaction_count 23854763230

Example usage

rate(solana_transaction_count[5m])

returns a time series of the average transactions per second (TPS) in the cluster, calculated by a rolling average over 5 minutes.

solana_slot_height

Description

The last confirmed slot height.

Sample output

solana_slot_height 91448726

solana_current_epoch

Description

The current epoch of the cluster.

Sample output

solana_current_epoch 211

solana_current_epoch_first_slot

Description

The first slot of the current epoch.

Sample output

solana_current_epoch_first_slot 91152000

solana_current_epoch_last_slot

Description

The last slot of the current epoch.

Sample output

solana_current_epoch_last_slot 91584000

solana_active_validators_isp_count

Description

The count of activate validators, grouped by their ISP. Given the isp_name label is the name of the ISP returned by MaxMind, the value is the number of validators with a node IP address belonging to that ISP.

Sample output

solana_active_validators_isp_count{isp_name="7heaven LLC"} 1
solana_active_validators_isp_count{isp_name="Adman LLC"} 1
solana_active_validators_isp_count{isp_name="Advanced Solutions LLC"} 1
solana_active_validators_isp_count{isp_name="Alibaba"} 1
solana_active_validators_isp_count{isp_name="Amazon"} 1
solana_active_validators_isp_count{isp_name="Amazon.com"} 61
solana_active_validators_isp_count{isp_name="Beeline"} 1

Remarks

This gauge will not be exported if no MaxMind API key is present in config.toml.

Caching

The output of this gauge relies on cached data; the exporter retains the geolocation information of an IP address for one week before considering it stale and re-acquiring it from MaxMind.

solana_active_validators_isp_stake

Description

The sum of stakes held by active validators, grouped by their ISP. Given isp_name is the name of the ISP returned by MaxMind, the value is the sum of stake (in lamports) in active validators who have a node IP address belonging that ISP.

Sample output

solana_active_validators_isp_stake{isp_name="7heaven LLC"} 188227282425345
solana_active_validators_isp_stake{isp_name="Adman LLC"} 6189723438718
solana_active_validators_isp_stake{isp_name="Advanced Solutions LLC"} 118047122517759
solana_active_validators_isp_stake{isp_name="Alibaba"} 164798876213690
solana_active_validators_isp_stake{isp_name="Amazon"} 6114765640910
solana_active_validators_isp_stake{isp_name="Amazon.com"} 135796402624727330
solana_active_validators_isp_stake{isp_name="Beeline"} 180442822928047

Remarks

This gauge will not be exported if no MaxMind API key is present in config.toml.

Caching

The output of this gauge relies on cached data; the exporter retains the geolocation information of an IP address for one week before considering it stale and re-acquiring it from MaxMind.

solana_active_validators_dc_stake

Description

The sum of stakes held by active validators, grouped by their datacenter location. Given dc_identifier is a semi-unique identifier assigned to each datacenter (see below), the value is the sum of stake (in lamports) in active validators who have a node IP address inside said datacenter.

Sample output

solana_active_validators_dc_stake{dc_identifier="11524-US-Portland"} 945063491010
solana_active_validators_dc_stake{dc_identifier="12212-CA-Toronto"} 393894227085511
solana_active_validators_dc_stake{dc_identifier="132203-US-Santa Clara"} 138086443681311
solana_active_validators_dc_stake{dc_identifier="13830-US-Dallas"} 386674347619072
solana_active_validators_dc_stake{dc_identifier="138982-CN"} 3058846690875748
solana_active_validators_dc_stake{dc_identifier="14618-US"} 6114765640910
solana_active_validators_dc_stake{dc_identifier="14618-US-Ashburn"} 210033359690576
solana_active_validators_dc_stake{dc_identifier="15169-BE-Brussels"} 737964600374534

Remarks

This gauge will not be exported if no MaxMind API key is present in config.toml.

The identifier is of the format:

{AS number}-{ISO-3166-1 Alpha-2 code}-{City name}

if a city name is available, otherwise:

{AS number}-{ISO-3166-1 Alpha-2 code}

Caching

The output of this gauge relies on cached data; the exporter retains the geolocation information of an IP address for one week before considering it stale and re-acquiring it from MaxMind.

solana_leader_slots

Description

The number of validated and skipped leader slots per validator node identity account pubkey.

Sample output

solana_leader_slots{pubkey="8E9KWWqX1JMNu1YC3NptLA6M8cGqWRTccrF6T1FDnYRJ",status="skipped"} 192
solana_leader_slots{pubkey="8E9KWWqX1JMNu1YC3NptLA6M8cGqWRTccrF6T1FDnYRJ",status="validated"} 84
solana_leader_slots{pubkey="8RsYRsi6f3hiK4EhyLS22Cy5KkrNbuidVYmsaYR1Xx78",status="skipped"} 2342
solana_leader_slots{pubkey="8RsYRsi6f3hiK4EhyLS22Cy5KkrNbuidVYmsaYR1Xx78",status="validated"} 1194
solana_leader_slots{pubkey="9YVpEeZf8uBoUtzCFC6SSFDDqPt16uKFubNhLvGxeUDy",status="skipped"} 118

solana_skipped_slot_percent

Description

The percentage of skipped slots per validator node identity account pubkey.

Sample output

solana_skipped_slot_percent{pubkey="8E9KWWqX1JMNu1YC3NptLA6M8cGqWRTccrF6T1FDnYRJ"} 69.56521739130434
solana_skipped_slot_percent{pubkey="8RsYRsi6f3hiK4EhyLS22Cy5KkrNbuidVYmsaYR1Xx78"} 66.23303167420815
solana_skipped_slot_percent{pubkey="9YVpEeZf8uBoUtzCFC6SSFDDqPt16uKFubNhLvGxeUDy"} 46.09375
solana_skipped_slot_percent{pubkey="NNetet8BiymZxMBWLRPCcNGcBPZDBeEcpgtfTSwdFPX"} 31.779661016949152

solana_current_staking_apy

Description

The APY of a given vote account pubkey based on last epoch's performance (in percent).

Sample output

solana_current_staking_apy{pubkey="5BAi9YGCipHq4ZcXuen5vagRQqRTVTRszXNqBZC6uBPZ"} 6.449820442689558
solana_current_staking_apy{pubkey="8jxSHbS4qAnh5yueFp4D9ABXubKqMwXqF3HtdzQGuphp"} 6.434121594142694
solana_current_staking_apy{pubkey="F5b1wSUtpaYDnpjLQonCZC7iyFvizLcNqTactZbwSEXK"} 7.195850076956045
solana_current_staking_apy{pubkey="irKsY8c3sQur1XaYuQ811hzsEQJ5Hq3Yu3AAoXYnp8W"} 3.2552769395926884

Remarks

Be sure to understand this gauge's behaviour when pubkey_whitelist is modified.

Caching

At the beginning of each epoch, the exporter fetches all reward transactions from the starting slots of the epoch. The staking rewards, and the duration of the previous epoch, are used to calculate the APY of the current epoch. This is only ever done once per epoch.

solana_average_staking_apy

Description

The APY of a given vote account pubkey averaged over a few past epochs (in percent).

Sample output

solana_average_staking_apy{pubkey="5BAi9YGCipHq4ZcXuen5vagRQqRTVTRszXNqBZC6uBPZ"} 2.544996416812742
solana_average_staking_apy{pubkey="8jxSHbS4qAnh5yueFp4D9ABXubKqMwXqF3HtdzQGuphp"} 2.5342297952374553
solana_average_staking_apy{pubkey="F5b1wSUtpaYDnpjLQonCZC7iyFvizLcNqTactZbwSEXK"} 2.8351747690563456
solana_average_staking_apy{pubkey="irKsY8c3sQur1XaYuQ811hzsEQJ5Hq3Yu3AAoXYnp8W"} 1.7458550503327919

Remarks

Be sure to understand this gauge's behaviour when vote_account_whitelist is modified.

Caching

At the beginning of each epoch, the exporter fetches all reward transactions from the starting slots of the epoch. The staking rewards, and the duration of the previous epoch, are used to calculate the APY of the current epoch. This is only ever done once per epoch.

To calculate the average staking APY, the exporter fetches the stored staking APY of the past few epochs and uses them. If a validator pubkey does not appear for a particular past epoch, then that epoch is excluded from calculation - instead of being treated as 0%.

solana_staking_commission

Description

The commission charged by staked validators, in percent (0-100).

Sample output

solana_staking_commission{pubkey="5BAi9YGCipHq4ZcXuen5vagRQqRTVTRszXNqBZC6uBPZ"} 10
solana_staking_commission{pubkey="8jxSHbS4qAnh5yueFp4D9ABXubKqMwXqF3HtdzQGuphp"} 10
solana_staking_commission{pubkey="F5b1wSUtpaYDnpjLQonCZC7iyFvizLcNqTactZbwSEXK"} 0
solana_staking_commission{pubkey="irKsY8c3sQur1XaYuQ811hzsEQJ5Hq3Yu3AAoXYnp8W"} 10

solana_validator_rewards

Description

Cumulative validator rewards of a given vote account pubkey in lamports.

Sample output

solana_validator_rewards{pubkey="5BAi9YGCipHq4ZcXuen5vagRQqRTVTRszXNqBZC6uBPZ"} 1058306737820
solana_validator_rewards{pubkey="8jxSHbS4qAnh5yueFp4D9ABXubKqMwXqF3HtdzQGuphp"} 68242612701
solana_validator_rewards{pubkey="irKsY8c3sQur1XaYuQ811hzsEQJ5Hq3Yu3AAoXYnp8W"} 24030806166

solana_node_pubkey_balances

Description

Balances of node identity accounts in lamports. Those balances are used to fund vote transactions produced by the voting nodes. Since a zero balance would lead to the node identity account being garbage-collected, one can set a reminder alarm to refill that account.

Sample output

solana_node_pubkey_balances{pubkey="4YGgmwyqztpJeAi3pzHQ4Gf9cWrMHCjZaWeWoCK6zz6X"} 6792793021
solana_node_pubkey_balances{pubkey="FoigPJ6kL6Gth5Er6t9d1Nkh96Skadqw63Ciyjxc1f8H"} 33408113791
solana_node_pubkey_balances{pubkey="G2TBEh2ahNGS9tGnuBNyDduNjyfUtGhMcssgRb8b6KfH"} 170569140828
solana_node_pubkey_balances{pubkey="zeroT6PTAEjipvZuACTh1mbGCqTHgA6i1ped9DcuidX"} 224893658626

solana_node_versions

Desctiption

The count of node versions.

Sample output

solana_node_versions{version="1.6.21"} 2
solana_node_versions{version="1.7.10"} 2

solana_nodes

Description

The number of nodes.

Sample output

solana_nodes 4

solana_average_slot_time

Description

The average slot time in the current epoch, in seconds.

Sample output

solana_average_slot_time 0.5642300440729666

Remarks

The exporter calculates this metric using the slot index of the current epoch and their respective timestamps. Therefore, when using whitelists, this gauge will not reflect the performance of the whitelisted vote pubkeys.

Examples

This chapter contains example usages of solana-exporter.

Monitoring a stakepool

Although solana-exporter can export statistics about an entire cluster, many of its gauges can be limited to a particular subset of pubkeys. This can be useful for when the specified RPC node has rate-limits, and only a few pubkeys need to be monitored.

Example config.toml

rpc = 'https://api.mainnet-beta.solana.com/'
target = '0.0.0.0:9179'
vote_account_whitelist = [
    "5BAi9YGCipHq4ZcXuen5vagRQqRTVTRszXNqBZC6uBPZ",
    "F5b1wSUtpaYDnpjLQonCZC7iyFvizLcNqTactZbwSEXK",
    "8jxSHbS4qAnh5yueFp4D9ABXubKqMwXqF3HtdzQGuphp",
    "irKsY8c3sQur1XaYuQ811hzsEQJ5Hq3Yu3AAoXYnp8W"
]
staking_account_whitelist = [
    "Ckk4SDm6z7vucvN7cSXBi26sohtrWaQj3zxSEBtiQXgJ",
    "ACNaRHi3t6h3bbS2x5qt1AVkMaCws8qSRRzoMdSF481X",
    "H8CHD1cqybdYQyhvpdxBFLnX9tyUzhPaouVACxANCwNF",
    "AfXa9ndMAHHQ6jHC2rQ79FLKwq8VDYAgRAPhjKN6grn3",
    "HXK8UkRvjHYv1RVgJJnWtmB9AhSeYRgckpjLcaR7y1Yb",
    "HJTyg3FpikHurh5Hi5fdBYCKd1YrVQe2BY3AQCMycUmX",
    "ANyjwHQXR6oiuPQCTXkq7ENFT1q8aMVBrXcPKTYXXmb2",
    "6BaRjonCB91dFfhNRT4C1EndNydJdEfxheYPtApKyY9y"
]

[maxmind]
username = 'username'
password = 'password'

This is a sample configuration file that instructs solana-exporter to only export statistics related to the specified pubkeys, using a public RPC node. By not fetching information about all validators, it is possible to avoid slow scrape times due to rate limiting.

Be sure to understand the behaviour of APY gauges when modifying pubkey_whitelist .

Monitoring using Grafana

After solana-exporter has been appropriately configured, set up Prometheus and Grafana with the sample dashboard supplied. You should see statistics on your stakepool.

Stake pool APY metrics Stake pool APY metrics

Stake pool decentralisation metrics Stake pool decentralisation metrics

Stake pool metrics Stake pool metrics

Using Grafana alerts

After setting up solana-exporter to monitor a stakepool, it may be useful to set up Grafana alerts to notify relevant operators of critical changes, and even automatically act to fix potential issues.

This example uses the new Grafana 8 Alerts.

Creating an alert

For extra documentation on how to create and manage Grafana alerts, refer to the official documentation.

Node delinquency

In this example, we will make an alert that fires if a voting pubkey becomes delinquent for over 12 consecutive hours.

  1. Create a Grafana managed alerting rule. Give it an appropriate name and place it in a folder.

    Alert type

  2. Set up the query to be alerted on. You need three queries:

    • Prometheus (A) - solana_validator_delinquent{} == 1. This filters for all delinquent (1) nodes. Select an appropriate time-range. For example, a "now-15m to now" time range means that, when evaluating, the alert system treats any duration of delinquency during the last 15 minutes as delinquency.
    • Expression (B) - Reduce (A) by using the Count function. This counts how many delinquent nodes there are.
    • Expression (C) - Use the Math expression $B > 0. This expression will be used to fire an alert.

    Queries to be alerted on

  3. Define the alert condition to use (C) and evaluate every 1m for 12h. This instructs Grafana to check for delinquency every minute, and firing an alert if a node is delinquent for more than 12 hours. Set the "Alert state if no data or all values are null" to be "OK", as the Prometheus query returns nothing if all nodes are active. Finally, choose if you wish timeouts or query errors to fire an alert.

To test this alert, invert the query (A) with solana_validator_delinquent{} == 0 and temporarily reduce the firing threshold from 12h to a more responsive value. Within a few minutes, an alert should be fired.

Receiving alert notifications

See Grafana contact points on the multitude of ways of receiving Grafana alerts.