Building a two-node Proxmox storage architecture with ZFS
After migrating from Docker Swarm to Proxmox, I needed to make a critical decision about storage architecture. The cluster has two nodes with 8GB RAM each and newly installed 1TB SATA SSDs. I had to choose between shared storage solutions and local storage with replication.
This is the story of why I chose ZFS with local replication, how I set it up, and what I learned along the way.
Why ZFS over shared storage?
When planning Proxmox storage, I faced three main options. Each had different trade-offs for a two-node homelab cluster.
Ceph distributed storage was attractive for its fault tolerance. But Ceph requires a minimum of three nodes for proper redundancy and quorum. With only two nodes, I’d lack proper redundancy. Ceph also introduces significant operational complexity for a homelab environment. It’s powerful, but more than I need for this setup.
NFS shared storage would be simpler to set up and enable live migration. But it creates a single point of failure at the NAS. More importantly, it doesn’t match my infrastructure philosophy of using the cluster nodes as the primary compute and storage resource. The NAS should handle backups, not active storage.
ZFS with Proxmox replication was the option that made sense. Each node maintains its own ZFS pool. Proxmox handles replication through snapshots, syncing data between nodes automatically. No shared storage means no single point of failure for container data. This aligns with my principle that the cluster manages its own resources, while the NAS handles backups only.
I chose ZFS replication. It gives me redundancy without adding infrastructure dependencies, and it’s simpler than Ceph for a two-node cluster. The trade-off is that I can’t do live migration until both pools are ready, but that’s acceptable for my use case.
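Under the hood, Proxmox replication is built on ZFS snapshots and incremental send/receive. A rough manual equivalent, with an illustrative dataset name and snapshot names, looks like this:

```shell
# Snapshot the container dataset on node01 (dataset and snapshot names are illustrative)
zfs snapshot pve-containers/subvol-100-disk-0@rep_1

# Send the snapshot to node02 over SSH and receive it into the matching pool
zfs send pve-containers/subvol-100-disk-0@rep_1 | \
  ssh node02 zfs receive -F pve-containers/subvol-100-disk-0

# Subsequent runs send only the delta between snapshots
zfs send -i @rep_1 pve-containers/subvol-100-disk-0@rep_2 | \
  ssh node02 zfs receive pve-containers/subvol-100-disk-0
```

Proxmox automates this snapshot-and-send cycle, which is why matching pool names on both nodes matter: the receive side needs the same dataset path to exist.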
The storage architecture
Both nodes have identical configuration. The NVMe drives host the Proxmox operating system using ext4, with about 70GB used. The SATA SSDs form ZFS pools named pve-containers, with about 888GB available per node.
Effective capacity is 1TB total, not 2TB. Since data replicates rather than stripes, I get redundancy but not additional capacity. Both nodes store the same data, so if one fails, the other has everything.
8GB RAM per node is adequate for ZFS with container workloads. ZFS uses RAM aggressively for its ARC cache, but a homelab container workload doesn't need much of it. I'm not running large databases or heavy I/O workloads that would demand more memory.
My services baseline is around 30GB. That includes Traefik, Dashy, Uptime Kuma, Prometheus and Grafana, Plane, and Gitea. This leaves about 950GB headroom for application growth, temporary snapshots during replication, and future services.
The NAS role is backups only, not active storage. This keeps concerns separated. The cluster manages its own storage and replication, while the NAS serves as a backup destination. No active dependencies on the NAS means no single point of failure.
Setting up ZFS
Creating the ZFS pool was straightforward. On node01, I created the pool with a single command:
```shell
zpool create pve-containers /dev/sda
```
The pool name must be identical on both nodes for replication to work. Proxmox replication relies on matching pool names, so I made sure to use the same name on both nodes.
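If I were redoing this, I'd consider two optional refinements: pinning the sector alignment with ashift=12 for 4K-sector SSDs, and referencing the disk by its stable by-id path rather than /dev/sda, which can change between boots. The device ID below is hypothetical:

```shell
# Create the pool with explicit 4K sector alignment (ashift=12),
# using the stable by-id path instead of /dev/sda
# (the device ID shown is a placeholder; list yours with `ls /dev/disk/by-id/`)
zpool create -o ashift=12 pve-containers \
  /dev/disk/by-id/ata-EXAMPLE_SSD_1TB_SERIAL
```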
I verified the creation by checking pool and filesystem status:
```shell
zpool list
zfs list
```
The output showed 888GB available out of 894GB on the SATA SSD. The difference accounts for ZFS metadata and overhead, which is normal and expected.
Adding the pool to Proxmox required going through the web interface. In Datacenter → Storage → Add, I configured it with these settings: the ID matches the pool name pve-containers, the ZFS pool is pve-containers, content types are disk image and container (no need for ISO or snippets in a container-only setup), and I left nodes blank so it’s available to all cluster nodes.
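The same storage definition can be added from the shell with pvesm, which writes the entry to /etc/pve/storage.cfg. This should be equivalent to the web-interface steps above:

```shell
# Register the ZFS pool as Proxmox storage for container root disks and VM images
pvesm add zfspool pve-containers --pool pve-containers --content rootdir,images

# Verify the storage is active and visible
pvesm status
```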
The storage appeared in Proxmox immediately. Both nodes could see it, and I was ready to create containers that would use ZFS for their filesystems.
Testing with a container
I deployed a minimal Alpine Linux LXC container to verify storage worked. The container had VMID 100, hostname test-nginx, storage on pve-containers, and resources set to 1 core, 256MB RAM, and 5GB disk.
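The same container can be created from the CLI with pct. The Alpine template filename below is an example and depends on what's downloaded locally:

```shell
# Create a minimal Alpine container on the ZFS pool
# (template filename is an example; list available ones with `pveam list local`)
pct create 100 local:vztmpl/alpine-3.19-default_20240207_amd64.tar.xz \
  --hostname test-nginx \
  --cores 1 --memory 256 \
  --rootfs pve-containers:5 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp

pct start 100
```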
The container started successfully and stored its filesystem on the ZFS pool. This confirmed that ZFS integration was working correctly and that containers could be created and run from the new storage.
But then I hit a network issue that taught me something about VLAN configuration.
The VLAN tag gotcha
The container had network configured but couldn’t reach other nodes. Network debugging showed RX packets at zero and TX packets at 27. The container was transmitting but receiving nothing.
The issue was VLAN configuration. The container was configured with VLAN tag 20, which is the server VLAN in my network. When you set a VLAN tag on a container in Proxmox, it tags the traffic before it reaches the bridge.
The problem was that my network infrastructure expects untagged traffic on the physical port. The switch port is configured as an access port for the main subnet, so it accepts untagged frames on its default PVID rather than frames tagged for VLAN 20. By setting tag=20 on the container, Proxmox was sending tagged traffic that the switch dropped.
The fix was to remove the VLAN tag from the container’s network interface. This made the container send untagged traffic, which matches what the physical port expects. Containers don’t need explicit VLAN tags when the network expects untagged traffic on the main bridge.
After removing the VLAN tag, packets flowed immediately and the container reached other hosts on the network.
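The fix itself was a one-liner: redefining net0 without a tag= option makes the container send untagged frames. The bridge name and IP configuration shown are illustrative:

```shell
# Redefine net0 without a VLAN tag so the container sends untagged frames
# (bridge and IP settings are illustrative; match them to your existing config)
pct set 100 --net0 name=eth0,bridge=vmbr0,ip=dhcp
```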
The takeaway is that VLAN configuration needs to match what the physical network expects. If the port is configured for untagged traffic, containers should send untagged traffic. VLAN tags on containers are for scenarios where the physical network is set up to handle tagged traffic from guests, not when the port expects untagged frames.
Completing the setup: node02 repair and rejoin
After setting up node01 with ZFS storage, node02 had a hardware issue. The SATA ribbon cable connecting the SSD was broken. The node remained in the cluster but couldn’t use its storage pool until the cable was replaced.
Once the physical repair was complete, bringing node02 back to full functionality was straightforward. I verified that node02 was still a cluster member, just offline. The cluster was running with reduced quorum using node01 and the qdevice witness.
On node02, I created the identical pool:
```shell
zpool create pve-containers /dev/sda
```
Both nodes now show pve-containers storage available with about 860GB each. The storage was already configured in Proxmox from when node02 was originally part of the cluster, so it appeared automatically once the pool was created.
With both pools active, replication can now be configured for containers as they’re created. Proxmox replication jobs will sync data from node01 to node02 automatically. The cluster is fully operational with redundant storage. Both nodes can run containers independently, and replication ensures data redundancy without requiring shared storage.
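Replication jobs can be defined in the web interface or with pvesr. The job ID combines the VMID with a per-guest index, and the schedule uses Proxmox calendar-event syntax; the 15-minute interval below is just a starting point:

```shell
# Replicate container 100 from its current node to node02 every 15 minutes
pvesr create-local-job 100-0 node02 --schedule "*/15"

# Inspect job state and last sync time
pvesr status
```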
Storage sizing rationale
With 1TB per node and about 30GB baseline for services, I have plenty of headroom. Traefik uses about 1GB, Dashy about 500MB, Uptime Kuma about 2GB with history, Prometheus and Grafana about 8GB depending on retention policy, Plane about 5GB for database and app, and Gitea about 3GB depending on repository size.
The remaining 950GB provides room for application growth and data accumulation. Temporary snapshots during replication need space, and I’ll be adding more services over time.
Prometheus storage growth is the key variable to monitor. Metrics databases grow over time, and retention policy directly affects storage requirements. I plan to start with a conservative retention policy and adjust based on actual usage patterns.
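Retention is the main knob for that growth: it's controlled by flags on the Prometheus process (or the equivalent arguments in a container's startup configuration). A conservative starting point might look like this; the paths and limits are assumptions to tune against real usage:

```shell
# Cap metrics retention by both time and size; adjust after observing real growth
# (config path and limits are illustrative)
prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=10GB
```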
Storage monitoring will be important. I’ll need to track pool usage, snapshot sizes, and replication performance to ensure the architecture scales as services grow.
What’s next
For replication, I’ll configure replication jobs for containers as they’re created. The direction will be node01 to node02, keeping it simple with one-way replication. I’ll test container replication and failover behavior to understand how the system behaves during node failures.
I’ll monitor replication performance and adjust schedules if needed. Replication happens via snapshots, so the frequency and timing affect both resource usage and recovery point objectives.
For backups, I’ll configure NAS storage as a backup destination only. Automated snapshot backups to the NAS will provide an additional layer of protection beyond replication. The NAS becomes a backup target rather than active storage, maintaining the separation of concerns.
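Those backups can be driven by vzdump, either ad hoc or through a Datacenter → Backup schedule. The storage ID nas-backup below is an assumed name for the NAS-backed storage target:

```shell
# Snapshot-mode backup of container 100 to NAS-backed storage
# ("nas-backup" is an assumed storage ID for the NAS target)
vzdump 100 --storage nas-backup --mode snapshot --compress zstd
```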
I’ll document backup procedures in runbooks so the process is repeatable and maintainable. Good documentation makes it easier to recover from problems and understand the system’s behavior.
The trade-offs
This architecture trades some simplicity for resilience. I can’t do live migration until both pools are ready, which means containers need to be stopped and started on the other node rather than migrated live. But this is acceptable for my use case.
The benefits are significant. No shared storage means no single point of failure. The cluster is self-sufficient for storage, with the NAS handling backups only. ZFS provides snapshots and replication capabilities that are essential for this architecture.
The separation of OS and container storage improves performance. NVMe handles the operating system and Proxmox overhead, while SATA SSDs handle container storage. This keeps OS performance fast while providing adequate storage capacity.
Building storage architecture is about understanding trade-offs. There’s no perfect solution, only solutions that fit your constraints and requirements. For a two-node homelab cluster, ZFS with local replication provides the right balance of simplicity, redundancy, and capability.