HTTP "Host" header in scrape target

Hi! I’ve been fiddling with a problem I’ve been having and I’m hoping the community here might provide some help.

I’m working on an environment with a load balancer in front of two webservers. Each webserver has a few vhosts on port 80, and a few of those expose metrics (at the standard /metrics path). These metrics are unique for each vhost on each webserver. The metrics can be reached on each webserver by specifying an HTTP Host header, e.g. curl -H 'Host: myapp.com' http://web01.internal.com/metrics. I don’t expose the metrics through the load balancer, because I don’t want Prometheus to mix the metrics of different webservers: each vhost+webserver combination should be a unique target. In short, curl http://myapp.com/metrics isn’t a viable scrape target.
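For reference, the per-node request above can be reproduced with Python's standard library. This is only a sketch: `scrape_vhost` is a hypothetical helper name, and it assumes plain HTTP on port 80 with no authentication. The key point is that the TCP connection goes to the webserver while the Host header selects the vhost.

```python
import urllib.request


def scrape_vhost(node: str, vhost: str, path: str = "/metrics", timeout: float = 5.0) -> str:
    """Fetch `path` from one specific webserver, selecting the vhost via Host.

    Roughly equivalent to: curl -H 'Host: <vhost>' http://<node><path>
    """
    # Connect to the node itself, but tell Apache which vhost we want.
    req = urllib.request.Request(f"http://{node}{path}", headers={"Host": vhost})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage would look like `scrape_vhost("web01.internal.com", "myapp.com")`, which should hit the myapp.com vhost on web01 only, never going through the load balancer.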

Now, I’ve gotten as far as dynamically fetching a list of supported vhosts on each webserver with puppetdb_sd_configs and then targeting the webservers with relabel_configs. However, I can’t seem to set the Host header in a scrape_config job… The only header I can set is Authorization, and only at the job level, not per target.
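To make the "one target per webserver+vhost" goal concrete, here is a sketch that expands a node-to-vhosts inventory into target groups in the JSON shape that Prometheus's file_sd_configs reads (a list of objects with "targets" and "labels"). The inventory contents, `build_target_groups`, and the `node`/`vhost` label names are all hypothetical. Note that `vhost` is only a label here: nothing in the scrape config turns it into a Host header, so Prometheus would still land on the default vhost.

```python
def build_target_groups(inventory: dict, port: int = 80) -> list:
    """Emit one target group per (webserver, vhost) pair.

    `inventory` maps a node hostname to the vhosts it serves, e.g. as
    collected from PuppetDB (assumed shape, for illustration only).
    """
    groups = []
    for node in sorted(inventory):
        for vhost in sorted(inventory[node]):
            groups.append({
                # Same address for every vhost on a node...
                "targets": [f"{node}:{port}"],
                # ...so only the labels keep the targets distinct.
                "labels": {"node": node, "vhost": vhost},
            })
    return groups
```

With two webservers and one metrics-enabled vhost, this yields two groups that share nothing but the vhost label, which is exactly the vhosts×webservers set of targets the post is after.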

I’ve tried working around the issue by testing or thinking up a few things:

  • Looked at the blackbox exporter, but it doesn’t return the response body, only metrics about the probe itself
  • Adding an additional unique port to each vhost, but that seems like an unnecessary security risk
  • Adding an additional server alias to each vhost that points to the webserver instead, but that would mean adding (vhosts*webservers) DNS entries every time
  • Modifying the default vhost on each webserver to have a special URL that only answers to Prometheus servers and proxies the request back to itself, with a Host header taken from the request URI → Apache yak that doesn’t like to be shaved
  • Modifying the configuration of the vhosts that provide /metrics in a similar way to the previous point, but instead of proxying to itself, they proxy to a different webserver → Requires going through the load balancer again, which shouldn’t be part of the picture at all, and I can’t restrict the requests to the Prometheus servers only
  • Could ask the devs of the application to modify the application’s /metrics endpoint to accept an HTTP parameter that specifies a webserver (e.g. /metrics?node=web01), but that would only solve it for this one application, and would take valuable dev time
  • Create a custom exporter on the Prometheus server, which basically emulates the curl command at the top, with customizable arguments → Seems like an unnecessary extra step for something so trivial
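The last option (a small custom exporter that emulates the curl command) could be sketched as below. Everything here is hypothetical: the /probe path, the `node` and `vhost` query parameters, the `VhostProxy` and `serve` names, and port 9999. It binds to 127.0.0.1 on the assumption that it runs on the Prometheus server itself, since otherwise it would be an open proxy. On the Prometheus side, relabeling can set URL query parameters through `__param_<name>` labels, so node and vhost could in principle be passed per target that way.

```python
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class VhostProxy(BaseHTTPRequestHandler):
    """GET /probe?node=web01.internal.com&vhost=myapp.com fetches
    http://<node>/metrics with 'Host: <vhost>' and relays the body."""

    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        params = urllib.parse.parse_qs(url.query)
        node = params.get("node", [None])[0]
        vhost = params.get("vhost", [None])[0]
        if url.path != "/probe" or not node or not vhost:
            self.send_error(400, "expected /probe?node=...&vhost=...")
            return
        # Same request as: curl -H 'Host: <vhost>' http://<node>/metrics
        req = urllib.request.Request(f"http://{node}/metrics", headers={"Host": vhost})
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                body = resp.read()
                ctype = resp.headers.get("Content-Type", "text/plain; version=0.0.4")
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            self.send_error(502, f"upstream error: {exc}")
            return
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve(host: str = "127.0.0.1", port: int = 9999) -> None:
    """Run the proxy until interrupted (hypothetical default port)."""
    ThreadingHTTPServer((host, port), VhostProxy).serve_forever()
```

It works, but it is one more moving part to deploy and monitor, which is exactly the "unnecessary extra step" concern raised above.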

I feel like there should be some special meta-label like __host__ similar to __address__ that can set the Host header in scrapes, but I couldn’t find it. Is there one? Is this on the roadmap somewhere? Am I looking at this problem all wrong? Are there any alternative solutions available that I haven’t thought of? Looking for any feedback here.

Would it not be possible to make http://web01.internal.com/metrics, http://web02.internal.com/metrics, etc. work?

Well, no, because that points to a default vhost which doesn’t have any metrics. Each webserver hosts multiple websites, but they are all on port 80. In Apache config language:

<VirtualHost *:80>
  ServerName default
  # doesn't provide /metrics
</VirtualHost>
<VirtualHost *:80>
  ServerName myapp.com
  # provides /metrics
</VirtualHost>
<VirtualHost *:80>
  ServerName myotherapp.com
  # doesn't provide /metrics
</VirtualHost>

So if a request comes in on http://web01.internal.com, the first/default <VirtualHost> directive is relevant. And if I request http://myapp.com, it points me to the load balancer instead, which does point me to the correct <VirtualHost> directive in one of the webservers, but a random webserver each scrape, so the metrics would be all over the place.
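Why the default vhost wins can be modelled in a few lines. For name-based virtual hosting, Apache compares the Host header against the ServerName and ServerAlias of each vhost on the matching address:port and, if nothing matches, serves the first-listed vhost. The sketch below is a deliberately simplified Python model of that rule (no wildcards, no IPv6 literals, no per-IP vhost sets); `VirtualHost` and `select_vhost` are hypothetical names, not Apache APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualHost:
    server_name: str
    aliases: list = field(default_factory=list)
    provides_metrics: bool = False


def select_vhost(vhosts: list, host_header: Optional[str]) -> VirtualHost:
    """Simplified name-based matching for the vhosts on one address:port."""
    if host_header:
        # Drop an optional ":port" and compare case-insensitively.
        name = host_header.split(":", 1)[0].lower()
        for vh in vhosts:
            if name == vh.server_name.lower() or name in (a.lower() for a in vh.aliases):
                return vh
    # No match (or no Host header): the first-listed vhost is the default.
    return vhosts[0]
```

With the three vhosts from the config above, a Host of web01.internal.com falls through to "default", while myapp.com selects the vhost that provides /metrics. Adding a node-specific name to `aliases` is enough to make that vhost reachable directly on one node.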

Okay, so an update for those interested. I worked around the problem by using one of the methods I listed in my initial post (option 3, the extra server aliases).

  • I added ServerAlias directives to my Apache vhost config files. These are unique per vhost per webnode, e.g. “myapp-com-web01-internal-com”.
  • Then I modified my DNS zone files to use BIND’s awesome $GENERATE directive. This lets me add just one line per vhost to automagically create CNAME records for all webnodes.
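As an illustration of what one such line expands to, here is a sketch that builds the alias names and mimics a simplified $GENERATE expansion. The zone line it models, `$GENERATE 1-2 myapp-com-web0$-internal-com CNAME web0$.internal.com.`, is a hypothetical example rather than the poster's actual zone file, and `alias_for`/`expand_generate` are hypothetical helpers. Only a bare `$` is substituted; BIND's `${offset,width,base}` modifiers and escaping are not modelled.

```python
def alias_for(vhost: str, node: str) -> str:
    """Per-vhost, per-node alias in the style used above:
    ('myapp.com', 'web01.internal.com') -> 'myapp-com-web01-internal-com'."""
    return f"{vhost.replace('.', '-')}-{node.replace('.', '-')}"


def expand_generate(start: int, stop: int, lhs: str, rtype: str, rhs: str) -> list:
    """Simplified expansion of `$GENERATE start-stop lhs type rhs`.

    Each `$` in lhs/rhs is replaced by the iterator value, for every
    value from start to stop inclusive.
    """
    return [
        f"{lhs.replace('$', str(i))} {rtype} {rhs.replace('$', str(i))}"
        for i in range(start, stop + 1)
    ]
```

Calling `expand_generate(1, 2, "myapp-com-web0$-internal-com", "CNAME", "web0$.internal.com.")` yields one CNAME per webnode, so adding a vhost costs one zone-file line regardless of how many webnodes there are.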

Hope this helps others facing the same issue.