
Getting Nebula-Sync working with Pi-hole v6: stale passwords and a redirect trap

By Victor Da Luz
pihole nebula-sync dns homelab ansible

I run more than one Pi-hole so that DNS survives any single box dying: a master in a Proxmox LXC container, one replica in another LXC, and a second replica on the QNAP NAS. Keeping their configuration identical by hand does not work, which is what Nebula-Sync is for: it reads the master’s configuration and pushes it to the replicas.

Except it was not pushing anything. I noticed when I queried a hostname against the third Pi-hole and got nothing back. The master had 78 local DNS records. The replica had zero.

The symptom

The Nebula-Sync logs made it clear this was not a partial failure. Authentication against the Pi-holes was failing with 401 errors, so no sync had been completing at all. The replica was not slightly behind; it had never received the records. Every query that hit it for an internal hostname was quietly falling through to upstream DNS and failing.

That is the nasty part of a redundant DNS setup with broken replication: everything looks healthy. The replica answers queries, blocklists work, the dashboard is green. You only find out the local records are missing when a client happens to land on that instance and asks for an internal name.

Root cause one: the env file did not match the vault

Pi-hole v6 replaced the old single API token with app passwords: dedicated credentials for API clients, created in the web UI under Settings, separate from the password you log into the dashboard with. Nebula-Sync authenticates with those via Pi-hole’s /api/auth endpoint, and it reads them from an env file. Mine lives at /etc/nebula-sync/nebula-sync.env, deployed by Ansible, with one entry per instance:

PRIMARY=http://127.0.0.1:8080|<master app password>
REPLICAS=http://192.168.x.13:8080|<replica app password>,http://192.168.x.14:8080|<replica app password>
FULL_SYNC=true
RUN_GRAVITY=true

The app passwords in that file were stale. The correct ones were sitting in the Ansible vault, but the env file on disk had older values that no longer matched what the Pi-holes expected. Hence the 401s: Nebula-Sync was knocking with credentials that had drifted out of date.
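A suspect credential can be checked by hand before touching Nebula-Sync at all: post it to /api/auth and look at the status code. A minimal sketch, assuming the endpoint accepts a JSON body with a password field, as the Pi-hole v6 API documentation describes. The helper name check_app_password is hypothetical, and this is not a description of how Nebula-Sync itself authenticates internally.

```shell
# Print the HTTP status a Pi-hole returns for one credential.
# Usage: check_app_password <base-url> <app-password>
# Hypothetical helper; assumes /api/auth takes {"password": "..."} as JSON.
check_app_password() {
    # Caveat: a password containing a double quote or backslash
    # would need JSON escaping before being embedded here.
    curl --silent --output /dev/null --write-out '%{http_code}' \
        --request POST \
        --header 'Content-Type: application/json' \
        --data "{\"password\":\"$2\"}" \
        "$1/api/auth"
}
```

Called as check_app_password http://127.0.0.1:8080 "$APP_PASSWORD", it should print 200 for a credential the Pi-hole accepts and 401 for one it rejects.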

This is the classic two-sources-of-truth failure. The vault was right, the deployed file was wrong, and nothing compared them. Updating the env file from the vault fixed authentication immediately.
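Making something compare the two copies does not take much. A sketch, assuming the env file can be rendered from the vault into a temporary path (that rendering step is left out here); env_file_drift is a hypothetical helper name:

```shell
# Report whether the deployed env file still matches a freshly rendered copy.
# Usage: env_file_drift <rendered-file> <deployed-file>
# Hypothetical helper; producing <rendered-file> from the vault is not shown.
env_file_drift() {
    if cmp -s "$1" "$2"; then
        echo "in sync"
    else
        # Deliberately prints no diff: both files contain secrets.
        echo "DRIFT"
        return 1
    fi
}
```

The non-zero exit status on drift is what lets a cron job or CI step fail loudly instead of relying on someone reading the output.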

Root cause two: the HTTPS redirect

With auth fixed, the second problem surfaced. Plain HTTP calls to the Pi-hole API were not getting API responses; they were being answered with a redirect to HTTPS, and Nebula-Sync’s API client does not follow redirects. For a browser, a bounce to HTTPS is correct behavior. For a machine-to-machine API client it is a wall.
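The redirect is easy to see from the command line, because curl does not follow redirects unless asked to. A sketch, with probe_api as a hypothetical helper name: it requests /api/auth over plain HTTP and prints the status code plus the redirect target, if there is one.

```shell
# Show how a plain-HTTP API request is answered, without following redirects.
# Usage: probe_api <base-url>
# Prints "<status> <redirect target>"; the target is empty when the
# response is not a redirect. Hypothetical helper name.
probe_api() {
    curl --silent --output /dev/null \
        --write-out '%{http_code} %{redirect_url}\n' \
        "$1/api/auth"
}
```

A 3xx status with an https:// target means the request never reached the API. Any answer from the API itself, even a 401, means the port is serving requests directly.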

The mechanism is Pi-hole v6’s webserver port syntax. The port config is a comma-separated list where suffix flags change behavior: s marks a TLS port, and r redirects all of that port’s traffic to the first configured TLS port. My Pi-holes were configured with 8080r,8443s, so every plain-HTTP request to 8080, API calls included, got bounced to 8443.
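To make the flag syntax concrete, here is a small sketch that picks out the entries of a port string that carry the r flag. It assumes flags are the letters trailing each port number, as in the example above; redirecting_ports is a hypothetical helper name. On a v6 install the string itself should be readable with pihole-FTL --config webserver.port, though that is worth confirming against your own version.

```shell
# List the entries of a webserver port string that carry the r (redirect) flag.
# Usage: redirecting_ports "8080r,8443s"
# Hypothetical helper; assumes flags are the letters after the port number.
redirecting_ports() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        # Strip everything up to and including the last digit,
        # leaving only the trailing flag letters (e.g. "r" from 8080r).
        flags=$(printf '%s' "$entry" | sed 's/.*[0-9]//')
        case "$flags" in
            *r*) printf '%s\n' "$entry" ;;
        esac
    done
}
```

For 8080r,8443s this prints 8080r; for 8080,8443s it prints nothing, which is the state a port serving API clients needs to be in.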

I made three changes on the live boxes to clear the path:

  • Dropped the r flag from the HTTP port, so port 8080 serves API requests instead of redirecting them. This is the change that actually mattered.
  • Set the Pi-hole domain setting back to the pi.hole default. The dashboards had been using a custom domain that pointed at the HTTPS side; resetting it took that name out of the API path entirely.
  • Pointed Nebula-Sync at IP addresses instead of hostnames: 127.0.0.1 for the master talking to itself, LAN IPs for the replicas. That removed hostname resolution and TLS naming complications from sync traffic.

Repairing the replica

Fixing sync going forward did not, on its own, repopulate the replica’s empty record set, so I pushed the 78 records across with the Ansible playbook I already use for managing Pi-hole local DNS entries, then ran Nebula-Sync manually:

sudo systemctl start nebula-sync.service
journalctl -u nebula-sync.service -n 50 --no-pager

The journal ended with Sync completed, this time with all three instances authenticating. After that I verified the result the only way that counts, by asking the replicas directly:

dig +short internal-hostname @192.168.x.14

Records present on all three.
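The same query can be repeated across every replica and compared with the master's answer, which turns the spot check into something scriptable. A sketch, with check_record as a hypothetical helper name and the master treated as the reference:

```shell
# Compare each replica's answer for one name with the master's answer.
# Usage: check_record <name> <master-ip> <replica-ip>...
# Hypothetical helper. An empty answer from the master counts as a
# mismatch everywhere, so a record missing at the source is flagged too.
check_record() {
    name=$1
    master=$2
    shift 2
    expected=$(dig +short "$name" "@$master")
    rc=0
    for replica in "$@"; do
        actual=$(dig +short "$name" "@$replica")
        if [ -n "$expected" ] && [ "$actual" = "$expected" ]; then
            echo "ok $replica"
        else
            echo "MISMATCH $replica"
            rc=1
        fi
    done
    return "$rc"
}
```

It returns non-zero if any replica disagrees with the master, so it can be run on a schedule against a known internal name.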

The follow-through was only partial, and it is worth being honest about that. The app passwords (from vault) and the IP-based Nebula-Sync URLs made it back into the Ansible host vars, so a redeploy keeps those. The domain and port changes did not; they lived only on the boxes, while the repo kept the old values. That is the same two-sources-of-truth failure that caused root cause one, wearing different clothes, and it means a full redeploy from the repo would have quietly resurrected the redirect.

Lessons

  • A DNS replica with broken replication looks healthy. It answers queries and shows a green dashboard while silently missing every local record. Monitor replication directly; a liveness check cannot see this failure.
  • Pi-hole v6 app passwords are not the web UI password. API clients authenticate with dedicated app passwords; if you are still passing the dashboard password to an API client, that is a 401 waiting to happen.
  • Secrets drift when there are two copies. The vault and the deployed env file disagreed and nothing noticed. The deploy pipeline now owns the env file, so the vault is the only place a password is ever edited.
  • Redirects are for browsers. Pi-hole v6’s r port flag bounces everything on that port to TLS, and an API client that will not follow a redirect fails completely, not gracefully. A port serving plain-HTTP API clients must not carry the r flag.
  • A fix applied on the box but not in the repo is a time bomb. The stale passwords that started this incident and the live-only domain and port changes that ended it are the same failure: the repo has to own the config, or the next deploy undoes the fix.
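For the first lesson, even a crude replication check beats none. A sketch that succeeds only if the nebula-sync unit logged a completed sync within a given window. It assumes the log line contains the text Sync completed, as quoted from the journal earlier, and sync_completed_since is a hypothetical helper name.

```shell
# Succeed only if the nebula-sync unit logged a completed sync in the window.
# Usage: sync_completed_since "24 hours ago"
# Hypothetical helper; assumes the log text "Sync completed".
sync_completed_since() {
    journalctl -u nebula-sync.service --since "$1" --no-pager \
        | grep -q 'Sync completed'
}
```

This only shows that a sync ran to completion, not that every record arrived, so it is best paired with a direct query against each replica.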
