
Researching monitoring solutions for a homelab

By Victor Da Luz
Tags: monitoring, uptime-kuma, homelab, infrastructure, self-hosted

After setting up the Proxmox cluster, I needed a monitoring solution to keep track of services and notify me when things go down. I had specific requirements: it needed to be self-hosted, support notifications through Apprise, work well in an LXC container, and handle both internal services and external websites.

I’d used Uptime Kuma in my previous Docker Swarm setup, but I wanted to evaluate other options. Maybe there was something better suited to the new Proxmox architecture, or something I hadn’t considered. I researched several solutions to make sure I was choosing the right tool for the job.

The evaluation process taught me something about how different tools approach monitoring, and why the choice matters for homelab infrastructure.

What I was looking for

Self-hosted capability was non-negotiable. I wanted full control over the monitoring data and no dependency on external services. Everything needed to run on my Proxmox cluster.

Notification support was critical, especially Apprise integration. Apprise lets me route notifications through multiple channels, which is useful for ensuring I see alerts regardless of where I am. Native Apprise support would be ideal, but webhook integration would work too.

The solution needed to work well in an LXC container. Following my architecture decision to use one LXC container per service, the monitoring tool had to be lightweight and efficient. Heavy resource requirements would defeat the purpose.

I needed to monitor different types of services. HTTP and HTTPS endpoints for web services, TCP for services that don’t have web interfaces, ping for basic connectivity, and DNS queries. Certificate expiration monitoring would be useful too.

Historical uptime data mattered. I wanted to track uptime trends over time, not just current status. This helps identify patterns and understand service reliability.

A web-based interface was essential. I’d be checking status regularly, and a clean, user-friendly interface makes that easier.

Evaluating the options

I looked at several different solutions to understand what was available. Each had different strengths and approaches to monitoring.

Some solutions I evaluated were less actively developed or had limited notification options. I wanted something with clear maintenance, active development, and good notification support. Resource efficiency was also important since everything would run in LXC containers on the Proxmox cluster.

Statping is a Go-based solution that's resource-efficient, but the project's maintenance status was unclear and development seemed to have slowed. The notification options were limited, and the interface wasn't as polished as I wanted.

Cachet is a PHP-based status page with similar maintenance concerns and slow development. Its monitoring capabilities are limited, the setup is more complex than necessary, and it lacks Apprise integration. It felt like a tool from a different era.

Uptime Kuma was the solution I'd used before, and it still seemed like the best fit. It's a Node.js application that's lightweight and actively maintained. It supports all the monitor types I needed, integrates with Apprise via webhook, and includes certificate expiration monitoring. The interface is clean and user-friendly.

The resource requirements are reasonable: about 512 MB of RAM and minimal CPU usage, which makes it a good fit for an LXC container deployment. It uses SQLite by default, which is simple and sufficient at homelab scale, though an external database is supported if needed.

There's no separate Prometheus exporter, but Uptime Kuma exposes a metrics endpoint protected by an API key, so Prometheus integration is possible with some scrape configuration. For my current needs, this wasn't a concern.
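As a sketch of what pulling those metrics could look like: the endpoint accepts the API key as the password in HTTP Basic auth with an empty username (worth double-checking against your version's docs). The base URL and key below are placeholders.

```python
import base64
import urllib.request

def metrics_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a request for Uptime Kuma's Prometheus metrics endpoint.

    Assumption: the endpoint uses HTTP Basic auth with an empty
    username and the API key as the password.
    """
    token = base64.b64encode(f":{api_key}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/metrics",
        headers={"Authorization": f"Basic {token}"},
    )

# Placeholder hostname and key:
# req = metrics_request("https://status.example.lan", "uk1_...")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```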

Why Uptime Kuma made sense

I ended up choosing Uptime Kuma for several reasons. It met all the requirements, had active development, and I already knew it worked well from previous experience.

Active development matters for homelab tools. When you’re running infrastructure that needs to keep working, you want tools that are maintained and updated. Uptime Kuma has regular updates and new features, which suggests it’s not going to be abandoned.

The Apprise integration is straightforward. Uptime Kuma supports Apprise via webhook, which means I can route notifications through Apprise to all my notification channels. This gives me flexibility in how I receive alerts.
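One way to wire this up, assuming an apprise-api instance is reachable on the network: a small relay shapes the alert and posts it to apprise-api's notify endpoint, which fans it out to the configured channels. The hostname, port, and config key here are all placeholders, not my actual setup.

```python
import json
import urllib.request

# Hypothetical apprise-api endpoint; adjust host, port, and config key.
APPRISE_URL = "http://apprise.lan:8000/notify/homelab"

def build_notification(monitor: str, status: str) -> dict:
    """Shape an alert as a JSON body with 'title' and 'body' fields,
    the form apprise-api's notify endpoint accepts."""
    return {
        "title": f"[Uptime Kuma] {monitor} is {status}",
        "body": f"Monitor {monitor} changed state to {status}.",
    }

def send(monitor: str, status: str) -> None:
    data = json.dumps(build_notification(monitor, status)).encode()
    req = urllib.request.Request(
        APPRISE_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # add error handling in practice
```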

Comprehensive monitoring capabilities cover everything I need. HTTP and HTTPS monitoring with keyword detection, TCP connectivity checks, ping monitoring, DNS queries, and certificate expiration tracking. It handles both internal services and external websites.
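As an illustration of how those monitor types might be declared in version-controlled definitions (this YAML schema is a sketch, with placeholder names and addresses, not Uptime Kuma's own format):

```yaml
# Hypothetical monitor definitions; field names loosely mirror
# Uptime Kuma's monitor types.
monitors:
  - name: traefik-dashboard
    type: http            # keyword detection is also available for HTTP checks
    url: https://traefik.example.lan
  - name: pihole-dns
    type: dns
    hostname: pi.hole
    dns_resolve_server: 192.168.1.2
  - name: proxmox-node01
    type: ping
    hostname: 192.168.1.10
  - name: personal-site
    type: http
    url: https://example.com   # certificate expiry is tracked for HTTPS monitors
```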

Resource efficiency makes it perfect for LXC deployment. At around 512MB of RAM and minimal CPU usage, it fits well in the one-container-per-service architecture. The SQLite database keeps things simple and efficient.

Familiarity counts too. Having used it before meant I knew it worked reliably and understood how to configure it. Starting with something proven reduces the learning curve and potential issues.

Setting it up in Proxmox

The deployment followed my architecture principles. One LXC container running Uptime Kuma, native Node.js installation without Docker, and straightforward resource allocation.

I created an unprivileged LXC container with Debian 13. Static IP assignment, SSH key authentication, and basic resource limits. The installation runs as a systemd service, which integrates well with the container lifecycle.

Traefik handles reverse proxy duties with HTTPS. Using a wildcard certificate via Cloudflare’s DNS-01 challenge, and WebSocket support for real-time updates. I had to fix the ping endpoint routing to work over HTTPS, but that was straightforward.

Monitor configuration is automated via Python script. The script uses the Socket.IO API to create monitors from YAML definitions stored in my infrastructure-as-code repository. This keeps monitor configuration version-controlled and repeatable.
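A minimal sketch of how such a script can work, using the third-party uptime-kuma-api library (which wraps the Socket.IO API). The definition schema and names are illustrative, not my exact implementation.

```python
# Sketch of a YAML-driven monitor sync (pip install uptime-kuma-api).
# The definition schema below is illustrative.

def to_kwargs(defn: dict) -> dict:
    """Map one YAML monitor definition onto add_monitor() keyword args."""
    kwargs = {"name": defn["name"], "type": defn["type"]}
    for key in ("url", "hostname", "port", "interval"):
        if key in defn:
            kwargs[key] = defn[key]
    return kwargs

def sync(defs: list[dict], url: str, user: str, password: str) -> None:
    # Imported here so to_kwargs stays usable without the library installed.
    from uptime_kuma_api import UptimeKumaApi

    api = UptimeKumaApi(url)
    api.login(user, password)  # Socket.IO auth; API keys won't work here
    existing = {m["name"] for m in api.get_monitors()}
    for defn in defs:
        if defn["name"] not in existing:
            api.add_monitor(**to_kwargs(defn))
    api.disconnect()
```

Creating monitors only when the name is missing keeps the sync idempotent, so re-running it against the same YAML is safe.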

Proxmox replication provides data backup. The container is replicated from node01 to node02 every 15 minutes via ZFS replication. This creates a copy of the container data on the secondary node, but the service won’t automatically start there if the primary node fails. For automatic failover, you’d need to configure Proxmox HA separately. The replication ensures the data is preserved, which is useful for recovery, but it’s not automatic high availability.

The challenges I encountered

A few things required troubleshooting during setup.

API authentication was more complex than expected. Uptime Kuma’s API keys are only for the Prometheus metrics endpoint. Monitor management requires Socket.IO authentication with username and password. I updated my automation script to use the uptime-kuma-api Python library for proper authentication.

The database schema had an unexpected requirement. The library didn’t include a required conditions field in monitor data, which caused SQLite constraint violations. I implemented a monkey-patch to add an empty conditions array to the monitor data before creation.
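The shape of that workaround looks roughly like this; which attribute actually needs patching depends on the library version, so treat it as a sketch rather than the exact fix.

```python
# Sketch of the workaround: wrap the monitor-creation call so every
# payload carries an empty `conditions` list, satisfying the NOT NULL
# constraint newer Uptime Kuma schemas enforce.
import functools

def with_default_conditions(add_monitor):
    """Decorator that injects conditions=[] when the caller omits it."""
    @functools.wraps(add_monitor)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("conditions", [])
        return add_monitor(*args, **kwargs)
    return wrapper

# Applied at import time, before any monitors are created:
# from uptime_kuma_api import UptimeKumaApi
# UptimeKumaApi.add_monitor = with_default_conditions(UptimeKumaApi.add_monitor)
```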

Traefik routing needed adjustment. The ping endpoint was only accessible on the HTTP entrypoint, but health checks needed HTTPS. I added an explicit router for the /ping path on the HTTPS entrypoint to make health checks work correctly.
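The fix amounts to an extra router in Traefik's dynamic configuration; the hostname, certificate resolver, and backend address below are placeholders for this setup:

```yaml
http:
  routers:
    uptime-kuma:
      rule: "Host(`status.example.lan`)"
      entryPoints:
        - websecure
      service: uptime-kuma
      tls:
        certResolver: cloudflare
    uptime-kuma-ping:
      # Explicit router so /ping health checks also work over HTTPS.
      rule: "Host(`status.example.lan`) && Path(`/ping`)"
      entryPoints:
        - websecure
      service: uptime-kuma
      tls:
        certResolver: cloudflare
  services:
    uptime-kuma:
      loadBalancer:
        servers:
          - url: "http://192.168.1.20:3001"
```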

These were minor issues that were straightforward to solve. The overall setup process was smooth, and the automation makes it easy to manage monitors going forward.

What’s next

The monitoring solution is up and running, tracking internal services like Traefik, Pi-hole, Proxmox nodes, and Home Assistant. The next steps are configuring Apprise notifications and adding external website monitoring for my personal sites.

I’ll add more services as they’re deployed to the Proxmox cluster. The automated configuration makes it easy to add new monitors, and the YAML-based approach keeps everything version-controlled and documented.

The choice of Uptime Kuma has worked well so far. It’s lightweight, reliable, and fits the architecture. For a homelab monitoring solution, it strikes a good balance between features and resource efficiency.
