Scaling statsd exporter throughput on a single host

Hi :wave:,

I’m following up on a question I asked over on prometheus/statsd_exporter.

Basically I want to push more throughput through the exporter, but I'm not sure of the best way to go about it. I believe I'm getting bottlenecked by the exporter at around ~360k events processed per second, and I'd like to be closer to ~3 million per second.

Is there potentially any way to take advantage of SO_REUSEPORT to run multiple exporters in parallel, or any other workarounds to get there?

In this case I would really like to avoid sharding my application because of a bottleneck in the exporter.

I could try to have prometheus scrape multiple ports on the same instance, but it’s a bit annoying to open up 10+ ports for the exporter.

There are more details on setup in the GitHub issue.

Thanks a lot! :grinning:

If you’re looking at 3M events/second on a single node, I would highly recommend considering not using statsd. There are far more efficient ways to instrument with Prometheus at that scale.

If this is your application, what language is it written in? Using a native client library is probably a much better solution.

While it’s possible to do, it would take some work to add a packet processing thread pool to the statsd_exporter.

The application is written in Rust.

What this application is doing is pulling AWS ALB logs off of S3, parsing through them, and then writing out the resulting metrics to prometheus. This allows us to analyze the request paths and break down the metrics on a per-endpoint basis. Because the S3 API is fairly limited, it’s a bit of a pain to break up the log processor (and adds overhead to deployment).

I’ll try using the Rust Prometheus client and see how that goes; I'd forgotten that this is possible. Thanks!

I don’t know much about the Rust client, but 3M/sec with the Go client would be about 0.1 CPU (roughly 25ns of CPU per Observe()). I assume the Rust client should be able to reach that level of performance.

I agree that you should use native instrumentation if at all possible.

That being said, a few people have put significant work into optimizing the statsd exporter for centralized use cases. I would recommend using the TCP endpoint, so that you get backpressure instead of silently dropped metrics. Avoid regex matches; they are significantly more expensive. Pay attention to the metric-name cache metrics and make sure the cache is large enough for the hot metric set.

Yeah using the native client worked out much better. Thanks for all the info!

Going to link this presentation on rust-prometheus here.

Using the native Rust client on a single thread I was able to get into the ~15M observations/second range, dropping down into the 5-10M observations/second range depending on the number of labels, all on a single core (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz):

Example code for anyone interested:

#[macro_use]
extern crate lazy_static;

#[macro_use]
extern crate prometheus;
use prometheus::{Encoder, HistogramVec, TextEncoder};

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};
use rand::Rng;
use std::convert::Infallible;

lazy_static! {
    // Histogram with no labels and a single bucket boundary at 5.0.
    static ref RANDOM_HIST: HistogramVec =
        register_histogram_vec!("example_random_hist", "Random value.", &[], vec![5.0]).unwrap();
}

async fn metrics_service(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let encoder = TextEncoder::new();
    let mut buffer = vec![];
    let mf = prometheus::gather();
    encoder.encode(&mf, &mut buffer).unwrap();
    Ok(Response::builder()
        .header(hyper::header::CONTENT_TYPE, encoder.format_type())
        .body(Body::from(buffer))
        .unwrap())
}

fn start_metrics_endpoint() {
    let addr = ([0, 0, 0, 0], 9102).into();
    let make_metrics_service = make_service_fn(|_conn| async {
        // service_fn converts our function into a `Service`
        Ok::<_, Infallible>(service_fn(metrics_service))
    });
    let server = Server::bind(&addr).serve(make_metrics_service);
    // Serve /metrics in the background while the benchmark loop runs.
    tokio::spawn(server);
}

#[tokio::main]
async fn main() {
    start_metrics_endpoint();

    let mut rng = rand::thread_rng();
    loop {
        let n1: f64 = rng.gen_range(0.0..10.0);
        RANDOM_HIST.with_label_values(&[]).observe(n1);
    }
}