Automating RouterOS updates with Ansible in the homelab
RouterOS updates are easy to postpone. The edge router and core switch are not like a container you can roll back in five minutes if something goes sideways. A failed hop mid-update can mean walking to the rack or driving home. At the same time, never updating is its own risk. I wanted a repeatable path that bakes in backup, separates package updates from RouterBOARD firmware, and proves SSH comes back before I declare victory.
This is how I wired that into Ansible for my MikroTik gear, what I still treat as manual judgment calls, and what I would tighten next.
The problem
Manual updates scale poorly across multiple RouterOS devices. Clicking or pasting commands in Winbox works, but it does not leave a clean audit trail in the same place as the rest of the lab. I already drive DHCP, leases, and large chunks of config from playbooks. Updates were the odd job that still lived outside that loop.
The other wrinkle is that RouterOS packages and RouterBOARD firmware are related but not identical. You can be current on the OS build and still owe a board firmware step, or the other way around. I wanted one playbook that walks the full sequence without skipping the reboot waits.
What I built
The playbook lives in my homelab repo at iac/ansible/playbooks/infrastructure/routeros-update.yml. It targets the routeros inventory group over SSH using the community.routeros collection with ansible_network_os: community.routeros.routeros and connection: ansible.netcommon.network_cli.
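For anyone who has not wired up the collection before, the connection settings above are ordinary inventory variables. Here is a minimal Python sketch that renders an INI-style inventory for the routeros group. The host names, the documentation-range IP addresses and the default ansible_user value are hypothetical placeholders, not my real inventory.

```python
def render_routeros_inventory(hosts: dict[str, str], user: str = "admin") -> str:
    """Render an Ansible INI inventory for the routeros group.

    hosts maps (hypothetical) inventory hostnames to management IPs.
    """
    lines = ["[routeros]"]
    for name, address in sorted(hosts.items()):
        lines.append(f"{name} ansible_host={address}")
    lines += [
        "",
        "[routeros:vars]",
        # Select the RouterOS platform plugins from community.routeros.
        "ansible_network_os=community.routeros.routeros",
        # Drive the device over a persistent SSH CLI session.
        "ansible_connection=ansible.netcommon.network_cli",
        f"ansible_user={user}",  # hypothetical account name
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_routeros_inventory({"router01": "192.0.2.1", "switch01": "192.0.2.2"}))
```

The same variables could just as well sit in a group_vars file for the routeros group. Rendering them as INI simply puts everything in one place.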
Rough flow:
- Snapshot state with /system package print and /system routerboard print so the run has context in the log.
- Backup before change: binary backup (/system backup save) plus a text export (/export file=…) so I have both a restore point and something I can diff in git-friendly form.
- Check for RouterOS packages with /system package update check-for-updates. If the output says a new version is available and the check did not error, run /system package update install, then wait for SSH on port 22 from the control host with wait_for, and hit /system identity print to confirm the session works.
- Re-read RouterBOARD info after the OS step. If current-firmware and upgrade-firmware differ, run /system routerboard upgrade, then /system reboot if needed, wait again, and verify.
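The firmware decision in the last step comes down to comparing two fields from /system routerboard print. Below is a minimal Python sketch of that comparison. It assumes the usual `key: value` layout RouterOS prints. The sample text is illustrative, not captured from my devices, and the helper names are hypothetical.

```python
def parse_print(output: str) -> dict[str, str]:
    """Turn RouterOS 'key: value' print output into a dict."""
    fields: dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:  # ignore lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def firmware_upgrade_needed(routerboard_output: str) -> bool:
    """True when current-firmware differs from upgrade-firmware."""
    fields = parse_print(routerboard_output)
    current = fields.get("current-firmware")
    upgrade = fields.get("upgrade-firmware")
    if current is None or upgrade is None:
        # Fail loudly rather than guess when the fields are not reported.
        raise ValueError("routerboard print output is missing firmware fields")
    return current != upgrade


SAMPLE = """\
       routerboard: yes
  current-firmware: 7.20.5
  upgrade-firmware: 7.20.6
"""

if __name__ == "__main__":
    print(firmware_upgrade_needed(SAMPLE))  # True
```

Raising on missing fields follows the same logic as the update check: no data is not the same as "nothing to do".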
The playbook is defensive about connectivity. Update checks can fail when the device cannot reach MikroTik’s download infrastructure. In that case it warns and skips the install rather than assuming silence means “up to date.”
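The "warn and skip" rule is easy to get wrong if the status text is matched loosely. Here is a minimal sketch of the decision. It assumes that check-for-updates reports a `status:` field and that failures show up there starting with `ERROR`. Both are assumptions about the output format that are worth confirming on your RouterOS version. The function name and its return labels are hypothetical.

```python
def classify_update_check(output: str) -> str:
    """Return 'install', 'up-to-date' or 'skip' for check-for-updates output."""
    status = ""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            status = value.strip()
    lowered = status.lower()
    if not status or lowered.startswith("error"):
        # No status or an explicit error: the check did not complete,
        # so silence must not be read as "up to date".
        return "skip"
    if "new version is available" in lowered:
        return "install"
    if "already up to date" in lowered:
        return "up-to-date"
    # Unrecognised wording: stop rather than install blindly.
    return "skip"
```

Mapping unknown wording to "skip" is deliberate. A surprise in the output halts the run instead of triggering an install.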
You can aim at one box with extra vars, for example -e device=switch01, or rely on the default host pattern. For gear that routes the whole house, I still treat one device at a time as the sane default. Running parallel forks against every router and switch at once is a great way to learn how much you rely on the network being up.
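In Ansible itself, one device at a time maps to the play-level `serial: 1` keyword or to narrowing the host pattern. The Python below only sketches that policy. A hypothetical `update_host` callback stands in for the per-device steps. The sketch picks targets from an optional device name, then updates hosts strictly one after another and stops at the first failure. Apart from switch01, the host names are placeholders.

```python
from typing import Callable, Optional


def select_targets(inventory: list[str], device: Optional[str] = None) -> list[str]:
    """Mimic '-e device=NAME': one named host, or the whole group."""
    if device is None:
        return list(inventory)
    if device not in inventory:
        raise KeyError(f"{device} is not in the routeros group")
    return [device]


def run_serially(targets: list[str], update_host: Callable[[str], bool]) -> list[str]:
    """Update hosts one at a time and stop at the first failure.

    Returns the hosts that completed successfully, in order.
    """
    done: list[str] = []
    for host in targets:
        if not update_host(host):
            # Leave the remaining devices untouched so the network keeps
            # at least one known-good path while someone investigates.
            break
        done.append(host)
    return done
```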
Risks I keep in my head
Automation does not remove outage windows. A RouterOS upgrade still reboots. A firmware step can reboot again. Anything that depends on the device being reachable (VPN, DNS path, management VLAN) will wobble during the window. That is why backups run first and why I watch the wait steps instead of walking away immediately.
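The wait steps are the part worth watching. In the playbook, Ansible's wait_for module does the polling. The Python sketch below illustrates the same idea from the control host. It assumes that a completed TCP connect to port 22 is a good enough "SSH is back" signal. The function name and the default timeout, interval and delay values are hypothetical.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 600.0,
                  interval: float = 5.0, delay: float = 0.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    # An initial delay avoids probing the old session before the reboot starts.
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake means something is listening again.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, unreachable or timed out: still rebooting
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The `delay` matters for reboots. Without it, the first probe can land while the device is still up and shutting down, and the wait reports success too early.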
Monitoring noise is the other side. If Uptime Kuma or anything else pings those IPs, you will get alerts during every reboot. Silencing alerts during maintenance windows is still on my list. The playbook is not a substitute for telling humans “I am touching the router.”
Plane and the ticket loop
I use self-hosted Plane for lab work. The longer-term story is to connect “updates available” to a ticket so the work is visible and scheduled instead of living only in Ansible logs. That is separate from the playbook itself. The playbook answers how to apply updates once I decide to run it. Plane answers when and why it hit the top of the queue. Related lab tracking for this work ties back to LAB-75 in my workspace.
What I verified
After a full run, devices landed on RouterOS 7.20.6 with matching RouterBOARD firmware 7.20.6 on the hardware I exercised. The important part was not the exact numbers. It was that package print, routerboard print, and SSH all agreed after each wait block.
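The "all agreed" check can be written down as a small predicate. Here is a sketch of it. It assumes, as on the hardware I exercised, that RouterBOARD firmware should end up on the same version as the installed RouterOS package. The function name and its argument shapes are hypothetical.

```python
def post_update_problems(package_version: str, current_firmware: str,
                         upgrade_firmware: str, ssh_ok: bool) -> list[str]:
    """Return a list of problems; an empty list means the run can be declared done."""
    problems: list[str] = []
    if not ssh_ok:
        problems.append("SSH did not come back")
    if current_firmware != upgrade_firmware:
        problems.append(f"firmware {current_firmware} still owes upgrade to {upgrade_firmware}")
    if current_firmware != package_version:
        problems.append(f"firmware {current_firmware} does not match RouterOS {package_version}")
    return problems
```

Returning every mismatch, rather than only the first, makes the log say exactly which of the three signals disagreed.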
Lessons learned
- Treat OS packages and board firmware as one pipeline, not two unrelated chores. The playbook mirrors the real order: OS first, re-check board, then firmware if needed.
- Backups before install are non-negotiable. Binary plus export gives you restore and readability.
- Parallelism is a policy choice. Ansible runs tasks across several hosts in parallel by default. For this class of device, serial execution (or an explicit -e device=) matches how much downtime I can stand.
- Failed update checks are signal. Skipping the install when check-for-updates errors avoids a false sense of security.
- Alerts need a maintenance story if I do not want every reboot to page me.