My monitoring stack was 59% of my DNS traffic, so I cached it
I had a Pi-hole pinned at 100 percent CPU. Giving its container more cores relieved the pressure, and that is its own story. While I was in there, the query log told me something I did not expect: a single host, my monitoring stack, was responsible for 59 percent of every DNS query Pi-hole was answering. More cores treated the symptom. This is the part where I went after the cause.
Where the queries were coming from
The monitoring host runs Prometheus and a stack of exporters. Prometheus scrapes its targets on a schedule, and to scrape a target by hostname it first has to resolve that hostname. Mine had 31 active targets across 21 unique hostnames, scraping every 15 seconds. With no caching in the middle, every one of those scrapes triggered fresh DNS lookups, and two of them at that: an A lookup for the IPv4 address and an AAAA lookup for IPv6. Twenty-one hostnames, doubled for A and AAAA, every 15 seconds, forever. That came to about 370 queries a minute, and it was 59 percent of Pi-hole’s entire load.
So the real fix sits upstream of Pi-hole: stop sending it the same questions hundreds of times a minute.
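To make the scrape arithmetic concrete, here is a back-of-the-envelope sketch in Python. The function name and its parameters are mine, not anything from Prometheus. It only counts one A and one AAAA lookup per hostname (or per target) per scrape interval, so both results are best read as floors. The roughly 370 a minute I measured was higher than either figure, and this sketch does not try to reconstruct that number.

```python
def dns_queries_per_minute(lookups_per_interval: int,
                           scrape_interval_s: float,
                           record_types: int = 2) -> float:
    """Estimate uncached DNS queries per minute.

    lookups_per_interval: names resolved once per scrape interval
    record_types: 2 means one A plus one AAAA query per name
    """
    scrapes_per_minute = 60 / scrape_interval_s
    return lookups_per_interval * record_types * scrapes_per_minute


# One lookup pair per unique hostname per interval
per_hostname = dns_queries_per_minute(21, 15)
# One lookup pair per scrape target per interval
per_target = dns_queries_per_minute(31, 15)

print(per_hostname)  # 168.0
print(per_target)    # 248.0
```

Either way, the shape of the problem is the same: the rate scales with the number of names times the scrape frequency, and nothing in that loop ever remembers an answer.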
A small cache in front of the firehose
The answer is a caching resolver on the monitoring host itself, so repeated lookups are served locally and only genuine misses ever reach Pi-hole. I used dnsmasq for it, a tiny, boring, extremely good caching DNS server.
Pointing the Docker containers at it had one wrinkle. On a Compose network, a container’s resolv.conf points at 127.0.0.11, Docker’s own embedded resolver, and you do not edit that directly. Instead you tell Compose what the embedded resolver should forward to:
    services:
      prometheus:
        dns:
          - 172.18.0.1  # the docker bridge gateway, where dnsmasq listens
Now Prometheus’s lookups go to Docker’s resolver, which forwards to dnsmasq on the host, which answers from cache and only forwards a real miss upstream to Pi-hole.
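One thing that can look wrong after this change is that the container's resolv.conf still lists 127.0.0.11. That is expected: the `dns:` setting changes where the embedded resolver forwards, not what the container sees. A minimal Python sketch for checking what a container sees follows. The helper name and the sample text are illustrative, not taken from my setup.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


# Illustrative sample of what a container on a Compose network might contain
sample = "# example only\nnameserver 127.0.0.11\noptions ndots:0\n"
print(nameservers(sample))  # ['127.0.0.11']
```

Inside a running container the same function can be fed `open("/etc/resolv.conf").read()`. Whether lookups are really reaching dnsmasq is better confirmed from dnsmasq's own query log on the host.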
Two settings that were stopping it from caching anything
Standing the cache up was easy. Getting it to actually cache took two corrections, and both were me misunderstanding dnsmasq.
First, I reached for local-ttl to set a minimum cache time, and it did nothing. The logs showed every query still being forwarded. The reason is that local-ttl only sets the TTL on answers dnsmasq serves from its own local data, such as /etc/hosts. It has no effect on responses that came from an upstream server, which is all of mine. The knob I actually wanted was min-cache-ttl, which forces a minimum lifetime on cached upstream responses regardless of the short TTLs they arrive with. I set that, and the A queries started caching.
Second, the AAAA queries, half of all the lookups, still forwarded every single time. The clue was in the log: the AAAA responses came back as NODATA-IPv6, meaning the hostname exists but has no IPv6 address. That is a perfectly valid answer and worth caching, but my config had no-negcache set, which tells dnsmasq to never cache negative answers. So every AAAA lookup for an IPv4-only host went all the way to Pi-hole, every time. Removing no-negcache and adding neg-ttl=1800 to keep negative answers for 30 minutes fixed the other half of the problem.
The config that finally worked:
cache-size=10000
min-cache-ttl=1800 # force a 30-minute floor on cached upstream answers
max-cache-ttl=3600 # cap at one hour
neg-ttl=1800 # cache NODATA / negative answers for 30 minutes too
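The two corrections are easier to see in a toy model. The Python sketch below is not dnsmasq's implementation, only a simplified cache with a TTL floor and optional negative caching. The hostname, the 5-second upstream TTL, and the assumption that the negative answer arrives without its own TTL are all made up for illustration.

```python
from dataclasses import dataclass, field


def fake_upstream(name, rtype):
    """Stand-in for Pi-hole: an IPv4-only host with a short TTL (assumed 5 s)."""
    if rtype == "A":
        return "192.0.2.10", 5
    return None, None  # NODATA, and here without a TTL of its own


@dataclass
class ToyCache:
    min_cache_ttl: int = 0   # floor applied to positive upstream answers
    neg_ttl: int = 0         # lifetime for negative answers lacking a TTL
    negcache: bool = True    # False models no-negcache
    forwarded: int = 0
    expiry: dict = field(default_factory=dict)

    def lookup(self, name, rtype, now, upstream):
        key = (name, rtype)
        if self.expiry.get(key, -1) > now:
            return "hit"
        self.forwarded += 1
        answer, ttl = upstream(name, rtype)
        if answer is None:
            if self.negcache:
                self.expiry[key] = now + (ttl if ttl is not None else self.neg_ttl)
        else:
            self.expiry[key] = now + max(ttl, self.min_cache_ttl)
        return "miss"


def simulate(cache, scrapes=40, interval_s=15):
    """Ten minutes of scrapes, one A and one AAAA lookup each."""
    for i in range(scrapes):
        for rtype in ("A", "AAAA"):
            cache.lookup("node.example", rtype, i * interval_s, fake_upstream)
    return cache.forwarded


no_floor = simulate(ToyCache(negcache=False))
floor_only = simulate(ToyCache(min_cache_ttl=1800, negcache=False))
both_fixes = simulate(ToyCache(min_cache_ttl=1800, neg_ttl=1800))

print(no_floor, floor_only, both_fixes)  # 80 41 2
```

With no floor and no negative caching, all 80 lookups go upstream. The TTL floor alone fixes the A half and leaves every AAAA lookup forwarding. Only with both in place does the upstream see one query per record type.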
The result
The next time I checked the cache, it was returning 792 hits against 5 misses over a two-minute window, a 99.4 percent hit rate. Query volume from the monitoring host dropped from about 370 a minute to 87-130, a 65 to 76 percent cut. Together with giving the Pi-hole container more cores, that took it from pinned at 100 percent to sitting around 25-30 percent with capacity to spare.
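For anyone checking the percentages, the arithmetic is short enough to write out. The helper names in this Python sketch are mine, and the inputs are the figures quoted above.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups answered from cache."""
    return hits / (hits + misses)


def reduction(before: float, after: float) -> float:
    """Fractional drop from before to after."""
    return (before - after) / before


print(f"{hit_rate(792, 5):.1%}")     # 99.4%
print(f"{reduction(370, 130):.0%}")  # 65%
print(f"{reduction(370, 87):.0%}")   # 76%
```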
Lessons
- Look at who is actually querying before you scale the thing being queried. One host was 59 percent of my DNS load, and it was my own monitoring. The cheapest query is the one you never send.
- local-ttl only covers /etc/hosts, so it is the wrong knob for caching upstream answers. To force caching of upstream responses with short TTLs, use min-cache-ttl.
- Cache your negatives. NODATA-IPv6 is a valid answer. With no-negcache set, every AAAA lookup for an IPv4-only host forwards forever, and AAAA is half of your lookups.
- Cache at the noisy edge. Putting the resolver on the host doing the querying kept those queries off the network and off Pi-hole entirely, instead of just helping the central server survive them.