HTTP "Host" header in scrape target

hbro · September 27, 2022, 1:24pm

Hi! I’ve been fiddling with a problem I’ve been having and I’m hoping the community here might provide some help.

I’m working in an environment with a load balancer in front of two webservers. Each webserver has a few vhosts on port 80, and a few of those expose metrics (at the standard /metrics path). These metrics are unique for each vhost on each webserver. The metrics can be reached on each webserver by specifying an HTTP Host header, e.g. like this: curl -H 'Host: myapp.com' http://web01.internal.com/metrics. I don’t expose the metrics through the load balancer, because I don’t want Prometheus to mix metrics from different webservers: each vhost+webserver combination should be a unique target. In short, curl http://myapp.com/metrics isn’t a viable scrape target.
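The curl command above can be reproduced with Python’s standard library, which makes the split explicit: the TCP connection goes to one specific webserver, while the Host header picks the vhost. This is a minimal sketch assuming plain HTTP on port 80 and the hostnames from the post; build_vhost_request and fetch_vhost_metrics are illustrative names, not part of any Prometheus tooling.

```python
import urllib.request


def build_vhost_request(backend: str, vhost: str, path: str = "/metrics") -> urllib.request.Request:
    """Request aimed at one webserver (`backend`) but answered by the vhost named in Host.

    Equivalent to: curl -H 'Host: <vhost>' http://<backend><path>
    """
    url = f"http://{backend}{path}"
    # urllib sends an explicitly set Host header instead of deriving one from the URL.
    return urllib.request.Request(url, headers={"Host": vhost})


def fetch_vhost_metrics(backend: str, vhost: str, timeout: float = 5.0) -> str:
    """Fetch the metrics body of `vhost` from exactly one webserver (needs network access)."""
    req = build_vhost_request(backend, vhost)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, fetch_vhost_metrics("web01.internal.com", "myapp.com") would return the myapp.com metrics as served by web01 only, never by web02.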

Now, I’ve gotten as far as dynamically getting a list of supported vhosts on each webserver using puppetdb_sd_configs and then targeting the webservers using relabel_configs. However, I can’t seem to set the Host header in a scrape_config job. The only header I can see that is settable is the Authorization header, and that is set at the job level, not the target level.
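The target model described above (one target per vhost per webserver) is a cross product of webservers and their vhosts. A minimal sketch, assuming the vhost lists have already been fetched (for example from PuppetDB) into a plain dict; the vhost label name and the instance format are hypothetical. All targets on one webserver end up sharing the same __address__, and since nothing in Prometheus turns a label like vhost into a Host header, each of them would be answered by that webserver’s default vhost, which is exactly the gap described here.

```python
from typing import Dict, List


def expand_targets(inventory: Dict[str, List[str]], port: int = 80) -> List[Dict[str, str]]:
    """Build one scrape target per (webserver, vhost) pair.

    `inventory` maps a webserver FQDN to the vhosts on it that expose /metrics.
    """
    targets = []
    for server in sorted(inventory):
        for vhost in sorted(set(inventory[server])):  # drop duplicate vhost entries
            targets.append({
                "__address__": f"{server}:{port}",  # shared by every vhost on this server
                "vhost": vhost,                      # hypothetical label name
                "instance": f"{vhost}@{server}",     # unique per vhost+webserver
            })
    return targets
```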

I’ve tested or considered a few workarounds:

1. Looking at the blackbox exporter, but it doesn’t return the response body
2. Adding an additional unique port to each vhost, but that seems like an unnecessary security risk
3. Adding an additional server alias to each vhost that points to the webserver instead, but that would mean adding (vhosts × webservers) DNS entries every time
4. Modifying the default vhost on each webserver to have a special URL that only answers to Prometheus servers and proxies requests to itself with a modified Host header taken from the request URI → Apache yak that doesn’t like to be shaved
5. Modifying the configuration of the vhosts that provide /metrics in a similar way to the previous point, but instead of proxying to themselves, they proxy to a different webserver → requires going through the load balancer again, which isn’t part of the actual problem, and I can’t filter to only allow requests from Prometheus servers
6. Asking the developers to change the application’s /metrics endpoint to accept an HTTP parameter that specifies a webserver (e.g. /metrics?node=web01), but that would only solve it for this exact application, and would take up valuable dev time
7. Creating a custom exporter on the Prometheus server, which basically emulates the curl command above with customizable arguments → seems like an unnecessary extra step for something so trivial
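For comparison, the custom exporter option can be kept quite small. The sketch below is a hypothetical proxy in the style of a multi-target exporter: Prometheus would scrape it at /probe, and because a target label named __param_<name> is sent as the URL parameter <name>, relabel_configs could set __param_backend and __param_vhost per target while __address__ points at the proxy. The /probe path, the parameter names, the allowlist and port 9999 are all assumptions, not an existing tool.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical allowlist so the proxy cannot be pointed at arbitrary hosts.
ALLOWED_BACKENDS = {"web01.internal.com", "web02.internal.com"}


def parse_probe(path: str) -> Optional[Tuple[str, str]]:
    """Extract (backend, vhost) from '/probe?backend=...&vhost=...'; None if invalid."""
    parsed = urlparse(path)
    if parsed.path != "/probe":
        return None
    params = parse_qs(parsed.query)
    backend = params.get("backend", [""])[0]
    vhost = params.get("vhost", [""])[0]
    if backend not in ALLOWED_BACKENDS:
        return None
    # Only hostname-like characters, so nothing odd ends up in the Host header.
    if not vhost or not all(ch.isalnum() or ch in ".-" for ch in vhost):
        return None
    return backend, vhost


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        target = parse_probe(self.path)
        if target is None:
            self.send_error(400, "expected /probe?backend=<allowed host>&vhost=<name>")
            return
        backend, vhost = target
        # Same effect as: curl -H 'Host: <vhost>' http://<backend>/metrics
        req = urllib.request.Request(f"http://{backend}/metrics", headers={"Host": vhost})
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                body = resp.read()
                content_type = resp.headers.get("Content-Type", "text/plain; version=0.0.4")
        except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
            self.send_error(502, f"upstream scrape failed ({type(exc).__name__})")
            return
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9999) -> None:
    """Run the proxy until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

Calling serve() starts the proxy; a request to http://<prometheus-host>:9999/probe?backend=web01.internal.com&vhost=myapp.com then returns myapp.com’s metrics from web01 only. The allowlist matters because an unrestricted proxy of this kind would fetch from any host it is told to.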

I feel like there should be some special meta-label like __host__ similar to __address__ that can set the Host header in scrapes, but I couldn’t find it. Is there one? Is this on the roadmap somewhere? Am I looking at this problem all wrong? Are there any alternative solutions available that I haven’t thought of? Looking for any feedback here.

stuart · September 27, 2022, 1:47pm

Would it not be possible to make http://web01.internal.com/metrics, http://web02.internal.com/metrics, etc. work?

hbro · September 27, 2022, 1:56pm

Well, no, because that points to a default vhost which doesn’t have any metrics. Each webserver hosts multiple websites, but they are all on port 80. In Apache config language:

```apache
<VirtualHost *:80>
  ServerName default
  # doesn't provide /metrics
</VirtualHost>
<VirtualHost *:80>
  ServerName myapp.com
  # provides /metrics
</VirtualHost>
<VirtualHost *:80>
  ServerName myotherapp.com
  # doesn't provide /metrics
</VirtualHost>
```
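The way Apache picks one of these vhosts can be approximated in a few lines: among the name-based vhosts on the same IP:port, the first one whose ServerName or ServerAlias matches the Host header wins, and if none match, the first vhost listed acts as the default. This is a simplified sketch that ignores wildcard aliases and Apache’s other matching rules, and the dict layout for the vhost list is made up for illustration.

```python
from typing import Dict, List, Optional


def select_vhost(vhosts: List[Dict], host_header: Optional[str]) -> str:
    """Return the ServerName of the vhost that would answer a request."""
    if host_header:
        host = host_header.split(":")[0].lower()  # drop an optional port, compare case-insensitively
        for v in vhosts:
            names = [v["ServerName"], *v.get("ServerAlias", [])]
            if host in (n.lower() for n in names):
                return v["ServerName"]
    return vhosts[0]["ServerName"]  # no match: the first vhost is the default


VHOSTS = [
    {"ServerName": "default"},
    {"ServerName": "myapp.com"},
    {"ServerName": "myotherapp.com"},
]
```

With this list, a request whose Host header is web01.internal.com selects "default", while Host: myapp.com selects "myapp.com".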

So if a request comes in for http://web01.internal.com, the first (default) <VirtualHost> handles it. If I request http://myapp.com instead, it resolves to the load balancer, which does route me to the correct <VirtualHost> on one of the webservers, but on a random webserver for each scrape, so the metrics would be all over the place.
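Why scraping through the load balancer scrambles the data is easiest to see with a counter. Each webserver keeps its own counter, so a series that alternates between them jumps up and down, and Prometheus’ rate() and increase() treat every decrease as a counter reset. A minimal simulation, assuming two backends whose counters start at different values and a fixed alternation in place of the load balancer’s real (random) choice, so the result is deterministic:

```python
from typing import Dict, List


def scrape_through_lb(counters: Dict[str, int], picks: List[str], step: int = 1) -> List[int]:
    """Simulate scraping one counter via a load balancer.

    Every backend's counter grows by `step` before each scrape;
    `picks` names the backend that answered each scrape.
    """
    state = dict(counters)
    observed = []
    for backend in picks:
        for name in state:
            state[name] += step
        observed.append(state[backend])
    return observed


def apparent_resets(samples: List[int]) -> int:
    """Count decreases, which rate()/increase() would interpret as counter resets."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)


samples = scrape_through_lb({"web01": 100, "web02": 40}, ["web01", "web02", "web01", "web02"])
```

Here samples is [101, 42, 103, 44] and apparent_resets(samples) is 2, even though neither backend’s counter ever went down.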

hbro · October 26, 2022, 1:50pm

Okay, so an update for those interested: I worked around the problem using one of the methods I listed in my initial post (option 3).

I added ServerAlias directives to my Apache vhost config files. These are unique per vhost per webnode, e.g. “myapp-com-web01-internal-com”.
Then I modified my DNS zone files to use BIND’s awesome $GENERATE directive. This lets me add just one line per vhost to automagically create CNAME records for all webnodes.
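To make the two steps concrete: the alias is the vhost name plus the webnode name with dots turned into hyphens, and one $GENERATE line produces one CNAME per webnode pointing that alias at the node itself, so requests to the alias bypass the load balancer while their Host header still matches the ServerAlias. The post doesn’t show the exact zone line, so the one in the test is a hypothetical example, relative to an unstated zone origin. The parser is a deliberately simplified sketch that only handles a bare $ iterator; real BIND also accepts a step, TTL, class and ${offset,width,base} modifiers.

```python
import re
from typing import List


def alias_for(vhost: str, node: str) -> str:
    """Per-vhost, per-webnode alias in the post's style: dots become hyphens."""
    return f"{vhost}-{node}".replace(".", "-")


def expand_generate(line: str) -> List[str]:
    """Expand a simplified '$GENERATE <start>-<stop> <lhs> <type> <rhs>' line.

    Every '$' in lhs and rhs is replaced by the current iterator value.
    """
    m = re.fullmatch(r"\$GENERATE\s+(\d+)-(\d+)\s+(\S+)\s+(\S+)\s+(\S+)", line.strip())
    if m is None:
        raise ValueError(f"unsupported $GENERATE line: {line!r}")
    start, stop, lhs, rtype, rhs = m.groups()
    return [
        f"{lhs.replace('$', str(i))} {rtype} {rhs.replace('$', str(i))}"
        for i in range(int(start), int(stop) + 1)
    ]
```

For instance, "$GENERATE 1-2 myapp-com-web0$-internal-com CNAME web0$.internal.com." expands to CNAMEs for myapp-com-web01-internal-com and myapp-com-web02-internal-com. The "web0$" trick only zero-pads up to nine nodes; beyond that, BIND’s width modifier is the cleaner route.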

Hope this helps others facing the same issue.
