Automating Unbound Root Key Recovery After Power Failures

By Victor Da Luz
unbound pihole dns dnssec systemd homelab raspberry-pi

My homelab runs Pi-hole on two Raspberry Pi nodes with Unbound as the recursive resolver. After a power failure or hard reset, DNS would sometimes stay broken until I fixed it by hand. The failure mode was always the same: Unbound refused to start because /var/lib/unbound/root.key was corrupted, left with duplicate or conflicting DNSSEC trust anchor data by an unclean write.

This post is about automating recovery so I don’t have to SSH in and delete the file every time it happens.

What root.key is

When Unbound validates DNSSEC, it needs a trust anchor for the DNS root zone. That anchor is stored on disk as root.key (by default under /var/lib/unbound/root.key). It is the starting point for the chain of trust; without a valid trust anchor, Unbound has nothing to validate DNSSEC-signed answers against.

Unbound can fetch and refresh that anchor, but the file has to be consistent on disk. If the machine loses power while the file is being written, you can end up with a broken root.key and a resolver that refuses to start.

The problem

Unbound reads root.key on startup. If the file is bad, Unbound exits and the service never comes up. DNS is down until someone removes the file and restarts Unbound. It is usually a two-minute fix, but still a manual one.

I’d been doing the obvious thing:

sudo rm /var/lib/unbound/root.key
sudo systemctl restart unbound.service

That works, but only after I’d already noticed broken DNS and logged in. I wanted the box to fix itself.

Why it happens

After an unclean shutdown you can get duplicate anchors, conflicting versions, or a half-written file. Unbound’s strict validation is the right behavior for security; it just means recovery has to happen before the validator runs.

The approach

I run a small maintenance script in ExecStartPre. Before Unbound’s real ExecStart runs, the script removes any existing root.key, then regenerates a fresh anchor using unbound-helper or unbound-anchor so the file is present and consistent before Unbound reads it. It does not try to detect corruption, and it does not back up the old file first. The goal is a known-good anchor on every start, not forensic preservation of a bad file.

Tradeoffs I’m fine with:

  • The anchor is rebuilt on every service start, not only after corruption. That’s simpler than trying to detect “how bad” the file is.
  • Unbound already knows how to anchor itself; I’m just making sure it never reads a bad file first.

Reference implementation

The maintenance script and related files for this pattern live in homelab-tools / unbound-root-key-recovery on GitHub. That’s the portable reference; my Ansible role templates the same logic for Pi-hole nodes in the homelab repo.

Implementation

Three pieces: the script, a systemd override, and Ansible to deploy both.

1. Maintenance script

The script (validate-root-key.sh) runs before ExecStart. In short it:

  1. Logs via logger with the journal tag unbound-root-key-validator.
  2. If root.key exists at the configured path, removes it (rm -f). There is no backup step; the next step recreates the file.
  3. Regenerates the trust anchor immediately, using /usr/libexec/unbound-helper root_trust_anchor_update when available, otherwise unbound-anchor -a (and fixes ownership when needed). If neither tool exists, it logs a warning and exits 0 anyway, so systemd still attempts to start Unbound.

The script always exits 0 so systemd continues to ExecStart.
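
The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script from the repository: the function names log and refresh_root_key, the ROOT_KEY and HELPER environment overrides, and the unbound:unbound ownership are all made up for the example.

```shell
#!/bin/sh
# Hypothetical sketch of validate-root-key.sh; names and defaults are assumptions.
TAG="unbound-root-key-validator"

log() {
    # $1 = syslog level (notice, warning), $2 = message.
    # Logging failures are ignored so they can never block the service start.
    logger -t "$TAG" -p "daemon.$1" "$2" 2>/dev/null || true
}

refresh_root_key() {
    key="${ROOT_KEY:-/var/lib/unbound/root.key}"
    helper="${HELPER:-/usr/libexec/unbound-helper}"

    # Step 2: remove any existing file. No backup, no corruption check.
    if [ -e "$key" ]; then
        rm -f "$key" && log notice "removed existing $key"
    fi

    # Step 3: regenerate the anchor with whichever tool is present.
    if [ -x "$helper" ]; then
        # Assumption: the helper manages the default key location itself.
        "$helper" root_trust_anchor_update \
            || log warning "unbound-helper returned non-zero"
    elif command -v unbound-anchor >/dev/null 2>&1; then
        # unbound-anchor can exit non-zero after a successful update,
        # so its status is deliberately ignored.
        unbound-anchor -a "$key" || true
        # Assumption: the daemon runs as user and group "unbound".
        chown unbound:unbound "$key" 2>/dev/null || true
        log notice "regenerated $key with unbound-anchor"
    else
        log warning "no anchor tool found; $key was not regenerated"
    fi

    # Always succeed so systemd continues to ExecStart.
    return 0
}

# The last command returns 0, so the script as a whole exits 0.
refresh_root_key
```

Reading the paths inside the function means the sketch can be pointed at a scratch file (for example ROOT_KEY=/tmp/root.key) without touching the live anchor.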

2. Systemd override

Drop-in at /etc/systemd/system/unbound.service.d/override.conf:

[Service]
ExecStartPre=/usr/local/bin/validate-root-key.sh

That runs the script before Unbound’s main binary.

3. Ansible

I wired this into the existing Pi-hole role: template validate-root-key.sh.j2 to /usr/local/bin/validate-root-key.sh, create the unbound.service.d directory, install override.conf, reload systemd. Variables like unbound_root_key_path keep the path consistent with the rest of the role.
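
For anyone without Ansible, the same four steps can be approximated in plain shell. The sketch below is not the role itself: the function name deploy_root_key_hook and the optional staging-root argument are hypothetical, added so the result can be inspected in a scratch directory before touching the live system.

```shell
# Hypothetical shell equivalent of the role's four steps; not the Ansible role.
# $1 = path to the script source
# $2 = optional staging root (empty means the live filesystem)
deploy_root_key_hook() {
    src="$1"
    root="${2:-}"
    dropin_dir="$root/etc/systemd/system/unbound.service.d"

    # 1. Install the maintenance script and make it executable.
    mkdir -p "$root/usr/local/bin"
    cp "$src" "$root/usr/local/bin/validate-root-key.sh"
    chmod 0755 "$root/usr/local/bin/validate-root-key.sh"

    # 2 and 3. Create the drop-in directory and write override.conf.
    mkdir -p "$dropin_dir"
    printf '%s\n' '[Service]' \
        'ExecStartPre=/usr/local/bin/validate-root-key.sh' \
        > "$dropin_dir/override.conf"

    # 4. Reload systemd, but only when deploying to the live system.
    if [ -z "$root" ] && command -v systemctl >/dev/null 2>&1; then
        systemctl daemon-reload
    fi
}
```

Called as root with a single argument it writes to the real paths and reloads systemd; with a second argument such as /tmp/stage it only populates that directory, which makes a dry run cheap.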

Logging

Removal and regeneration events show up in the journal:

journalctl -t unbound-root-key-validator -p notice
journalctl -t unbound-root-key-validator -f

What I verified

I reproduced the old failure mode (bad root.key), restarted Unbound, watched ExecStartPre run, confirmed the service came up and a new anchor appeared, and checked the journal for the validator tag. Same outcome as the manual fix, without SSH.
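
The "a new anchor appeared" check can be scripted if you want to repeat it. The function below is a hypothetical post-restart sanity check, not part of the pre-start hook and not from the repository. It assumes the anchor file holds zone-file style records for the root zone: a line starting with "." that carries an IN DNSKEY or IN DS record.

```shell
# Hypothetical post-restart check; run it by hand, not from ExecStartPre.
# Returns 0 if the file exists, is non-empty, and contains a root
# DNSKEY or DS record; returns non-zero otherwise.
anchor_looks_sane() {
    key="${1:-/var/lib/unbound/root.key}"

    # A missing or zero-length file is never a usable anchor.
    [ -s "$key" ] || return 1

    # Assumption: anchor records begin with "." (the root zone) and name
    # their type as "IN DNSKEY" or "IN DS".
    grep -Eq '^\.[[:space:]].*IN[[:space:]]+(DNSKEY|DS)[[:space:]]' "$key"
}
```

Combined with systemctl is-active unbound.service, this gives a quick pass or fail after a restart. It only confirms that something anchor-shaped is on disk; Unbound remains the judge of whether the file is actually valid.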

Reflection

The lesson for me was to stop trying to be clever about detecting corruption. Remove the file before the validator runs, regenerate the anchor with the helper tools, log it, move on. That’s easier to reason about than heuristics on half-corrupt files, and it matches how I was already fixing it by hand.

Since this went in, power blips haven’t left me with dead DNS waiting for a human. The journal still tells me when the pre-start hook ran, so I’m not flying blind. I just don’t have to babysit the resolver anymore.
