f3s: Kubernetes with FreeBSD - Part 6: Storage

Published at 2025-07-13T16:44:29+03:00, last updated Wed 19 Mar 2026

This is the sixth blog post about the f3s series for self-hosting demands in a home lab. f3s? The "f" stands for FreeBSD, and the "3s" stands for k3s, the Kubernetes distribution used on FreeBSD-based physical machines.

2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage

2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation

2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts

2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs

2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network

2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)

2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

f3s logo


Introduction

In the previous posts, we set up a WireGuard mesh network. In an upcoming post, we will set up the Kubernetes cluster itself. Kubernetes workloads often require persistent storage for databases, configuration files, and application data. Local storage on each node has significant limitations:

This post implements a robust storage solution using:

The result is a highly available, encrypted storage system that survives node failures while providing shared storage to all Kubernetes pods.

Contrary to what was mentioned in the first post of this blog series, we aren't using HAST but `zrepl` for data replication. Read more about it later in this blog post.

Additional storage capacity

We add 1 TB of additional storage to each of the nodes (`f0`, `f1`, `f2`) in the form of an SSD drive. The Beelink mini PCs have enough space in the chassis for the extra drive.

./f3s-kubernetes-with-freebsd-part-6/drives.jpg

Upgrading the storage was as easy as unscrewing, plugging the drive in, and then screwing it back together again. The procedure was uneventful! We're using two different SSD models (Samsung 870 EVO and Crucial BX500) to avoid simultaneous failures from the same manufacturing batch.

We then create the `zdata` ZFS pool on all three nodes:
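A sketch of the pool creation, assuming the new SSD shows up as `ada1` (the device name will differ depending on the hardware):

```
doas zpool create zdata ada1
zpool status zdata
```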

To verify that we have a different SSD on the second node (the third node has the same drive as the first):

ZFS encryption keys

ZFS native encryption requires encryption keys to unlock datasets. We need a secure method to store these keys that balances security with operational needs:

Using USB flash drives as hardware key storage provides a convenient and elegant solution. The encrypted data is unreadable without physical access to the USB key, protecting against disk theft or improper disposal. In production environments, you may use enterprise key management systems; however, for a home lab, USB keys offer good security with minimal complexity.

UFS on USB keys

We'll format the USB drives with UFS (the Unix File System) rather than ZFS for simplicity; for a drive that only holds a few small key files, there is no need for ZFS.

Let's see the USB keys:

USB keys

To verify that the USB key (flash disk) is there:

Let's create the UFS file system and mount it (done on all three nodes `f0`, `f1` and `f2`):
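Roughly along these lines, assuming the USB key shows up as `da0`:

```
doas gpart destroy -F da0                # wipe whatever is on the stick
doas gpart create -s gpt da0
doas gpart add -t freebsd-ufs da0
doas newfs -U /dev/da0p1                 # UFS with soft updates
doas mkdir -p /keys
doas mount /dev/da0p1 /keys
# mount it on every boot, before the ZFS keys are needed
echo '/dev/da0p1 /keys ufs rw 0 2' | doas tee -a /etc/fstab
```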

USB keys stuck in

Generating encryption keys

The following keys will later be used to encrypt the ZFS file systems. They will be stored on all three nodes, serving as a backup in case one of the keys is lost or corrupted. When we later replicate encrypted ZFS volumes from one node to another, the keys must also be available on the destination node.
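For example, one 32-byte raw key per purpose, stored on the USB stick (the file names are assumptions):

```
doas sh -c 'openssl rand -out /keys/zdata-enc.key 32'    # for zdata/enc
doas sh -c 'openssl rand -out /keys/bhyve-enc.key 32'    # for the Bhyve datasets
doas chmod 600 /keys/*.key
```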

After creation, these are copied to the other two nodes, `f1` and `f2`, into the `/keys` partition (I won't provide the commands here; create a tarball, copy it over, and extract it on the destination nodes).

Configuring `zdata` ZFS pool encryption

Let's encrypt our `zdata` ZFS pool. We are not encrypting the whole pool, but everything within the `zdata/enc` data set:
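A minimal sketch, assuming the raw key file generated above:

```
doas zfs create -o encryption=aes-256-gcm \
                -o keyformat=raw \
                -o keylocation=file:///keys/zdata-enc.key \
                zdata/enc
zfs get encryption,keystatus zdata/enc
```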

All future data sets within `zdata/enc` will inherit the same encryption key.

Migrating Bhyve VMs to an encrypted `bhyve` ZFS volume

We set up Bhyve VMs in a previous blog post. Their ZFS data sets rely on `zroot`, which is the default ZFS pool on the internal 512 GB NVMe drive. They aren't encrypted yet, so we encrypt the VM data sets as well now. To do so, we first shut down the VMs on all three nodes:
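With `vm-bhyve`, shutting down and checking the VMs looks roughly like this (the VM names are assumptions):

```
doas vm stop rocky       # the Rocky Linux k3s VM on each node
doas vm stop freebsd     # the FreeBSD development VM on f0
doas vm list             # wait until the state shows "Stopped"
```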

After this, we rename the unencrypted data set to `_old`, create a new encrypted data set, and also snapshot it as `@hamburger`.
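Sketched out, and assuming the Bhyve datasets reuse a key file from the USB stick:

```
doas zfs rename zroot/bhyve zroot/bhyve_old
doas zfs create -o encryption=aes-256-gcm \
                -o keyformat=raw \
                -o keylocation=file:///keys/bhyve-enc.key \
                zroot/bhyve
doas zfs snapshot -r zroot/bhyve_old@hamburger
```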

Once done, we import the snapshot into the encrypted dataset and also copy some other metadata files from `vm-bhyve` back over.
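Something like the following per VM dataset; a non-raw send received below the encrypted `zroot/bhyve` picks up the parent's encryption (the paths follow `vm-bhyve`'s usual layout and are assumptions):

```
doas sh -c 'zfs send zroot/bhyve_old/rocky@hamburger | zfs recv zroot/bhyve/rocky'
# vm-bhyve keeps its global config and templates next to the disk images
doas cp -a /zroot/bhyve_old/.config    /zroot/bhyve/
doas cp -a /zroot/bhyve_old/.templates /zroot/bhyve/
```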

We also have to make encrypted ZFS data sets mount automatically on boot:
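FreeBSD ships an `rc.d/zfskeys` service that loads keys for datasets with a `file://` key location at boot, so something like this should be enough (followed by a reboot to prove it):

```
doas sysrc zfskeys_enable="YES"
doas reboot
# after the reboot:
doas vm list
```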

As you can see, the VM is running. This means the encrypted `zroot/bhyve` was mounted successfully after the reboot! Now we can destroy the old, unencrypted, and now unused bhyve dataset:

To verify once again that `zroot/bhyve` and `zroot/bhyve/rocky` are now both encrypted, we run:
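For example:

```
zfs get -r encryption,keystatus,keylocation zroot/bhyve
```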

ZFS Replication with `zrepl`

Data replication is the cornerstone of high availability. While CARP handles IP failover (see later in this post), we need continuous data replication to ensure the backup server has current data when it becomes active. Without replication, failover would result in data loss or require shared storage (like iSCSI), which introduces a single point of failure.

Understanding Replication Requirements

Our storage system has different replication needs:

The 1-minute replication window is perfectly acceptable for my personal use cases. This isn't a high-frequency trading system or a real-time database—it's storage for personal projects, development work, and home lab experiments. Losing at most 1 minute of work in a disaster scenario is a reasonable trade-off for the reliability and simplicity of snapshot-based replication. Additionally, in the case of a "1 minute of data loss," I would likely still have the data available on the client side.

Why use `zrepl` instead of HAST? While HAST (Highly Available Storage) is FreeBSD's native solution for high-availability storage and supports synchronous replication—thus eliminating the mentioned 1-minute window—I've chosen `zrepl` for several important reasons:

FreeBSD HAST

Installing `zrepl`

First, install `zrepl` on both hosts involved (we will replicate data from `f0` to `f1`):
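On FreeBSD this is a package install (on both `f0` and `f1`):

```
doas pkg install zrepl
```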

Then, we verify the pools and datasets on both hosts:

Since we have a WireGuard tunnel between `f0` and `f1`, we'll use TCP transport over the secure tunnel instead of SSH. First, check the WireGuard IP addresses:

Let's create a dedicated dataset for NFS data that will be replicated:
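For example:

```
doas zfs create zdata/enc/nfsdata     # inherits encryption from zdata/enc
```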

Afterwards, we create the `zrepl` configuration on `f0`:
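A sketch of `/usr/local/etc/zrepl/zrepl.yml` on `f0`; the WireGuard addresses (`10.0.0.x`) and the port are assumptions, and the 1-minute/10-minute intervals follow the description below:

```
jobs:
  - name: nfsdata_to_f1
    type: push
    connect:
      type: tcp
      address: "10.0.0.2:8888"      # f1's WireGuard IP (assumption)
    filesystems:
      "zdata/enc/nfsdata": true
    send:
      encrypted: true               # raw send: data stays encrypted on the wire and on f1
    snapshotting:
      type: periodic
      interval: 1m
      prefix: zrepl_
    pruning:
      keep_sender:
        - type: last_n
          count: 10
      keep_receiver:
        - type: last_n
          count: 10

  - name: freebsd_vm_to_f1
    type: push
    connect:
      type: tcp
      address: "10.0.0.2:8888"
    filesystems:
      "zroot/bhyve/freebsd": true
    send:
      encrypted: true
    snapshotting:
      type: periodic
      interval: 10m
      prefix: zrepl_
    pruning:
      keep_sender:
        - type: last_n
          count: 10
      keep_receiver:
        - type: last_n
          count: 10
```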

We're using two separate replication jobs with different intervals:

The FreeBSD VM is only used for development purposes, so it doesn't require as frequent replication as the NFS data. It's off-topic for this blog series, but it showcases `zrepl`'s flexibility in handling different datasets with varying replication needs.

Furthermore:

Configuring `zrepl` on `f1` (sink)

On `f1` (the sink, meaning it's the node receiving the replication data), we configure `zrepl` to receive the data as follows:
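A corresponding sketch of `/usr/local/etc/zrepl/zrepl.yml` on `f1` (again, addresses and port are assumptions):

```
jobs:
  - name: sink
    type: sink
    serve:
      type: tcp
      listen: "10.0.0.2:8888"       # listen on f1's WireGuard IP only
      clients:
        "10.0.0.1": "f0"            # map f0's WireGuard IP to the identity "f0"
    root_fs: "zdata/sink"           # everything lands below zdata/sink/f0/...
```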

Enabling and starting `zrepl` services

We then enable and start `zrepl` on both hosts via:
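Via the port's rc script:

```
doas sysrc zrepl_enable="YES"
doas service zrepl start
```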

To check the replication status, we run:
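zrepl ships an interactive status view:

```
doas zrepl status
```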

Monitoring replication

You can monitor the replication progress with:

zrepl status

With this setup, both `zdata/enc/nfsdata` and `zroot/bhyve/freebsd` on `f0` will be automatically replicated to `f1` every 1 minute (or 10 minutes in the case of the FreeBSD VM), with encrypted snapshots preserved on both sides. The pruning policy ensures that we keep the last 10 snapshots while managing disk space efficiently.

The replicated data appears on `f1` under `zdata/sink/` with the source host and dataset hierarchy preserved:

This is by design - `zrepl` preserves the complete path from the source to ensure there are no conflicts when replicating from multiple sources.

Verifying replication after reboot

The `zrepl` service is configured to start automatically at boot. After rebooting both hosts:

The timestamps confirm that replication resumed automatically after the reboot, ensuring continuous data protection. We can also write a test file to the NFS data directory on `f0` and verify whether it appears on `f1` after a minute.

Understanding Failover Limitations and Design Decisions

Our system intentionally fails over to a read-only copy of the replica in the event of the primary's failure. This is due to the nature of `zrepl`, which only replicates data in one direction. If we mounted the dataset on the sink node in read-write mode, it would diverge from the original and the replication would break. It can still be mounted read-write on the sink node in case of a genuine issue on the primary node, but that step is left intentionally manual. This way, we don't end up with a diverged dataset whose replication we would have to repair by hand later.

So in summary:

Mounting the NFS datasets

To make the NFS data accessible on both nodes, we need to mount it. On `f0`, this is straightforward:
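For example:

```
doas zfs set mountpoint=/data/nfs zdata/enc/nfsdata
zfs get mounted,mountpoint zdata/enc/nfsdata
```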

On `f1`, we need to handle the encryption key and mount the standby copy:
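A sketch for `f1`, using the key file that was copied to its `/keys` partition earlier:

```
doas zfs load-key zdata/sink/f0/zdata/enc/nfsdata
doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
doas zfs set mountpoint=/data/nfs zdata/sink/f0/zdata/enc/nfsdata
doas zfs mount zdata/sink/f0/zdata/enc/nfsdata
```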

Note: The dataset is mounted at the same path (`/data/nfs`) on both hosts to simplify failover procedures. The dataset on `f1` is set to `readonly=on` to prevent accidental modifications, which, as mentioned earlier, would break replication. If we did modify it, replication from `f0` to `f1` would fail like this:

cannot receive incremental stream: destination zdata/sink/f0/zdata/enc/nfsdata has been modified since most recent snapshot

To fix a broken replication after accidental writes, we can do:
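Essentially, roll the sink back to the newest replicated snapshot and make it read-only again (the snapshot name is a placeholder):

```
zfs list -t snapshot -o name zdata/sink/f0/zdata/enc/nfsdata | tail -1
doas zfs rollback -r zdata/sink/f0/zdata/enc/nfsdata@zrepl_XXXXXX   # newest snapshot from above
doas zfs set readonly=on zdata/sink/f0/zdata/enc/nfsdata
```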

And replication should work again!

Troubleshooting: Files not appearing in replication

If you write files to `/data/nfs/` on `f0` but they don't appear on `f1`, first check whether the dataset is actually mounted on `f0`:
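For example:

```
zfs get mounted zdata/enc/nfsdata
```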

If it shows `no`, the dataset isn't mounted! This means files are being written to the root filesystem, not ZFS. Next, we should check whether the encryption key is loaded:
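For example:

```
zfs get keystatus zdata/enc/nfsdata
# if it shows "unavailable":
doas zfs load-key zdata/enc/nfsdata
doas zfs mount zdata/enc/nfsdata
```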

You can also verify that files are in the snapshot (not just the directory):
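The snapshots are browsable through the hidden `.zfs` directory (the snapshot name is a placeholder):

```
ls /data/nfs/.zfs/snapshot/
ls /data/nfs/.zfs/snapshot/zrepl_XXXXXX/
```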

This issue commonly occurs after a reboot if the encryption keys aren't configured to load automatically.

Configuring automatic key loading on boot

To ensure all additional encrypted datasets are mounted automatically after reboot as well, we do:
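With the key files on the USB stick mounted at `/keys` via `/etc/fstab`, enabling `rc.d/zfskeys` and pointing every encrypted dataset at its key file should be sufficient (a sketch):

```
doas sysrc zfskeys_enable="YES"
# on f0:
doas zfs set keylocation=file:///keys/zdata-enc.key zdata/enc
# on f1 (the raw-received dataset keeps its own encryption root):
doas zfs set keylocation=file:///keys/zdata-enc.key zdata/sink/f0/zdata/enc/nfsdata
```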

Important notes:

Troubleshooting: zrepl Replication Not Working

If `zrepl` replication is not working, here's a systematic approach to diagnose and fix common issues:

Check if zrepl Services are Running

First, verify that `zrepl` is running on both nodes:

Check zrepl Status for Errors

Use the status command to see detailed error information:

Fixing "No Common Snapshot" Errors

This is the most common replication issue, typically occurring when:

**Error message example:**

**Solution: Clean up conflicting snapshots on receiver**
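A sketch of the cleanup on the receiver (`f1`); the snapshot name is a placeholder:

```
# see what the receiver has that the sender no longer knows about
zfs list -t snapshot -r zdata/sink/f0/zdata/enc/nfsdata
# delete the conflicting receiver-only snapshot(s) ...
doas zfs destroy zdata/sink/f0/zdata/enc/nfsdata@zrepl_XXXXXX
# ... or, if nothing on the receiver is worth keeping, start from scratch
doas zfs destroy -r zdata/sink/f0/zdata/enc/nfsdata
```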

**Verification that replication is working:**

Network Connectivity Issues

If replication fails to connect:
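A few checks over the WireGuard tunnel (addresses and port as assumed above):

```
ping -c 3 10.0.0.2                  # is f1 reachable over WireGuard?
nc -vz 10.0.0.2 8888                # does the zrepl sink port answer?
sockstat -l4 | grep 8888            # on f1: is zrepl actually listening?
doas wg show                        # is the tunnel up at all?
```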

Encryption Key Issues

If encrypted replication fails:

Monitoring Ongoing Replication

After fixing issues, monitor replication health:

This troubleshooting process resolves the most common `zrepl` issues and ensures continuous data replication between your storage nodes.

CARP (Common Address Redundancy Protocol)

High availability is crucial for storage systems. If the storage server goes down, all NFS clients (which will also be Kubernetes pods later on in this series) lose access to their persistent data. CARP provides a solution by creating a virtual IP address that automatically migrates to a different server during failures. This means that clients point to that VIP for NFS mounts and are always contacting the current primary node.

How CARP Works

In our case, CARP allows two hosts (`f0` and `f1`) to share a virtual IP address (VIP). The hosts communicate using multicast to elect a MASTER, while the other remains BACKUP. When the MASTER fails, the BACKUP automatically promotes itself, and the VIP is reassigned to the new MASTER. This happens within seconds.

Key benefits for our storage system:

FreeBSD CARP

Stunnel

Configuring CARP

First, we add the CARP configuration to `/etc/rc.conf` on both `f0` and `f1`:
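A sketch of the relevant `rc.conf` lines; the interface name (`igb0`), the VHID, and the password are assumptions, while the VIP `192.168.1.138` is the one used for stunnel later on:

```
# f0 (preferred MASTER):
ifconfig_igb0_alias0="inet vhid 1 advskew 0 pass CHANGEME alias 192.168.1.138/32"

# f1 (BACKUP, higher advskew so f0 wins the election):
ifconfig_igb0_alias0="inet vhid 1 advskew 100 pass CHANGEME alias 192.168.1.138/32"
```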

Update: Sun 4 Jan 00:17:00 EET 2026 - Added `advskew 100` to f1 so f0 always wins CARP elections when it comes back online after a reboot.

Whereas:

Next, update `/etc/hosts` on all nodes (`f0`, `f1`, `f2`, `r0`, `r1`, `r2`) to resolve the VIP hostname:
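For example:

```
192.168.1.138   f3s-storage-ha
```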

This allows clients to connect to `f3s-storage-ha` regardless of which physical server is currently the MASTER.

CARP State Change Notifications

To correctly manage services during failover, we need to detect CARP state changes. FreeBSD's devd system can notify us when CARP transitions between MASTER and BACKUP states.

Add this to `/etc/devd.conf` on both `f0` and `f1`:
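Along the lines of the FreeBSD handbook's CARP example (the script path is an assumption):

```
notify 30 {
    match "system"    "CARP";
    match "subsystem" "[0-9]+@[a-z]+[0-9]+";
    match "type"      "(MASTER|BACKUP)";
    action "/usr/local/sbin/carpcontrol.sh $subsystem $type";
};
```

Followed by `doas service devd restart` to pick up the change.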

Next, we create the CARP control script that will restart stunnel when the CARP state changes:
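A minimal skeleton (`/usr/local/sbin/carpcontrol.sh` is an assumed path); the actual service handling is filled in further down:

```
#!/bin/sh
# Called by devd as: carpcontrol.sh $subsystem $type
# $1 = subsystem (e.g. "1@igb0"), $2 = new CARP state (MASTER or BACKUP)
STATE="$2"

logger -t carpcontrol "CARP state change on $1: now ${STATE}"

case "${STATE}" in
    MASTER) ;;   # later: start NFS and stunnel here
    BACKUP) ;;   # later: stop NFS and stunnel here
esac
```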

Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed `$1` to `$2` because devd passes `$subsystem $type`, so the state is in the second argument.

Note that `carpcontrol.sh` doesn't do anything useful yet. We will provide more details (including starting and stopping services upon failover) later in this blog post.

To enable CARP in `/boot/loader.conf`, run:
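For example:

```
doas sysrc -f /boot/loader.conf carp_load="YES"
```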

Then reboot both hosts or run `doas kldload carp` to load the module immediately.

NFS Server Configuration

With ZFS replication in place, we can now set up NFS servers on both `f0` and `f1` to export the replicated data. Since native NFS over TLS (RFC 9289) has compatibility issues between Linux and FreeBSD (not digging into the details here, but I couldn't get it to work), we'll use stunnel to provide encryption.

Setting up NFS on `f0` (Primary)

First, enable the NFS services in rc.conf:
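A sketch of the relevant knobs, including the NFSv4 domain mentioned in the update below:

```
doas sysrc nfs_server_enable="YES"
doas sysrc nfsv4_server_enable="YES"
doas sysrc nfsuserd_enable="YES"
doas sysrc nfsuserd_flags="-domain lan.buetow.org"
doas sysrc mountd_enable="YES"
doas sysrc rpcbind_enable="YES"
doas sysrc rpc_lockd_enable="YES"
doas sysrc rpc_statd_enable="YES"
```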

Update: 08.08.2025: I've added the domain to `nfsuserd_flags`

And we also create a dedicated directory for Kubernetes volumes:

We also create the `/etc/exports` file. Since we're using stunnel for encryption, ALL clients must connect through stunnel, which appears as localhost (`127.0.0.1`) to the NFS server:
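A sketch of `/etc/exports`; the `-maproot` choice is an assumption:

```
V4: /data/nfs -sec=sys 127.0.0.1
/data/nfs -maproot=root 127.0.0.1
```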

The exports configuration restricts all exports to `127.0.0.1`, so only connections arriving through the local stunnel endpoint are accepted.

To start the NFS services, we run:
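For example:

```
doas service rpcbind start
doas service nfsd start
doas service mountd start
doas service nfsuserd start
```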

Configuring Stunnel for NFS Encryption with CARP Failover

Using stunnel with client certificate authentication for NFS encryption provides several advantages:

Stunnel integrates seamlessly with our CARP setup:

The key insight is that stunnel binds to the CARP VIP. When CARP fails over, the VIP is moved to the new master, and stunnel starts there automatically. Clients maintain their connection to the same IP throughout.

Creating a Certificate Authority for Client Authentication

First, create a CA to sign both server and client certificates:
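A sketch with `openssl`; subject names and lifetimes are assumptions, and the client steps are repeated for `r1`, `r2`, and `earth`:

```
# CA
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -sha256 -days 3650 -subj "/CN=f3s-stunnel-ca" -out ca.crt

# server certificate (used by stunnel on f0 and f1)
openssl genrsa -out server.key 4096
openssl req -new -key server.key -subj "/CN=f3s-storage-ha" -out server.csr
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -sha256 -days 3650 -out server.crt
cat server.key server.crt > server-stunnel.pem

# client certificate for r0
openssl genrsa -out r0.key 4096
openssl req -new -key r0.key -subj "/CN=r0" -out r0.csr
openssl x509 -req -in r0.csr -CA ca.crt -CAkey ca.key -CAcreateserial -sha256 -days 3650 -out r0.crt
cat r0.key r0.crt > r0-stunnel.pem      # stunnel expects key and cert in one PEM
```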

Install and Configure Stunnel on `f0`

The configuration binds stunnel to the CARP VIP, requires client certificates signed by our CA, and forwards the decrypted traffic to the local NFS server.
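A sketch of `/usr/local/etc/stunnel/stunnel.conf` on `f0`; the external port 2323 and the file paths are assumptions:

```
setuid = stunnel
setgid = stunnel
pid    = /var/run/stunnel/stunnel.pid

; accept TLS on the CARP VIP, forward plain NFS to localhost only
[nfs]
accept  = 192.168.1.138:2323
connect = 127.0.0.1:2049
cert    = /usr/local/etc/stunnel/server-stunnel.pem
CAfile  = /usr/local/etc/stunnel/ca.crt
; verify = 2 requires a client certificate signed by our CA
verify  = 2
```

It is enabled with `doas sysrc stunnel_enable="YES"` and started via `doas service stunnel start`.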

Setting up NFS on `f1` (Standby)

Repeat the same configuration on `f1`:

And to configure stunnel on `f1`, we run:

CARP Control Script for Clean Failover

With stunnel configured to bind to the CARP VIP (192.168.1.138), only the server that is currently the CARP MASTER will accept stunnel connections. This provides automatic failover for encrypted NFS:

This ensures that clients always connect to the active NFS server through the CARP VIP. To ensure clean failover behaviour and prevent stale file handles, we'll update our `carpcontrol.sh` script so that:

This approach ensures clients can only connect to the active server, eliminating stale handles from the inactive server:
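A sketch of the updated script; service names and ordering are assumptions:

```
#!/bin/sh
# /usr/local/sbin/carpcontrol.sh - devd passes "$subsystem $type", so the state is in $2
STATE="$2"

logger -t carpcontrol "CARP transition on $1: ${STATE}"

case "${STATE}" in
    MASTER)
        # become the active storage node: NFS first, then stunnel on the VIP
        service nfsd start
        service mountd start
        service nfsuserd start
        service stunnel restart
        ;;
    BACKUP)
        # stop serving so no client can talk to a stale copy
        service stunnel stop
        service nfsd stop
        service mountd stop
        service nfsuserd stop
        ;;
esac
```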

Update: Fixed the script at Sat 3 Jan 23:55:11 EET 2026 - changed `$1` to `$2` because devd passes `$subsystem $type`, so the state is in the second argument.

CARP Management Script

To simplify CARP state management and failover testing, create this helper script on both `f0` and `f1`:
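A sketch of such a helper (`/usr/local/sbin/carp.sh` is a hypothetical name, as are the interface, VHID, and flag file):

```
#!/bin/sh
# Usage: carp.sh status | master | backup | failback {on|off|status}
IF=igb0
VHID=1
FLAG=/var/db/carp-auto-failback-disabled

case "$1" in
    status) ifconfig ${IF} | grep carp ;;
    master) ifconfig ${IF} vhid ${VHID} state master ;;
    backup) ifconfig ${IF} vhid ${VHID} state backup ;;
    failback)
        case "$2" in
            off)    touch ${FLAG} ;;
            on)     rm -f ${FLAG} ;;
            status) [ -e ${FLAG} ] && echo "auto-failback: disabled" \
                                   || echo "auto-failback: enabled" ;;
        esac ;;
    *) echo "usage: $0 status|master|backup|failback on|off|status" >&2; exit 1 ;;
esac
```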

Now you can easily manage CARP states and auto-failback:
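For instance:

```
doas /usr/local/sbin/carp.sh status
doas /usr/local/sbin/carp.sh backup      # hand the VIP over to the other node
doas /usr/local/sbin/carp.sh master      # take it back
```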

Automatic Failback After Reboot

When `f0` reboots (planned or unplanned), `f1` takes over as CARP MASTER. To ensure `f0` automatically reclaims its primary role once it's fully operational, we'll implement an automatic failback mechanism:
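A sketch of the failback check; paths and names are hypothetical, and the log file matches the one mentioned below:

```
#!/bin/sh
# /usr/local/sbin/carp-auto-failback.sh - run from cron on f0 every minute.
IF=igb0
VHID=1
MARKER=/data/nfs/.marker                           # proves the encrypted dataset is mounted
FLAG=/var/db/carp-auto-failback-disabled           # created by "carp.sh failback off"
LOG=/var/log/carp-auto-failback.log

[ -e ${FLAG} ] && exit 0                           # failback temporarily disabled
ifconfig ${IF} | grep -q "carp: BACKUP" || exit 0  # only act while we are BACKUP
[ -f ${MARKER} ] || exit 0                         # storage not ready yet

echo "$(date): healthy again, promoting to MASTER" >> ${LOG}
ifconfig ${IF} vhid ${VHID} state master
```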

Update: Fixed the script at Sun 4 Jan 00:04:28 EET 2026 - removed the NFS service check because when f0 is BACKUP, NFS services are intentionally stopped by carpcontrol.sh, which would prevent auto-failback from ever triggering.

The marker file identifies that the ZFS data set is mounted correctly. We create it with:
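For example (the file name is an assumption):

```
doas touch /data/nfs/.marker
```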

We add a cron job to check every minute:
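For example, via `/etc/crontab` on `f0`:

```
echo '* * * * * root /usr/local/sbin/carp-auto-failback.sh' | doas tee -a /etc/crontab
```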

The enhanced CARP script provides integrated control over auto-failback. To temporarily turn off automatic failback (e.g., for `f0` maintenance), we run:

And to re-enable it:

To check whether auto-failback is enabled, we run:
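With the hypothetical helper from above, these three operations could look like:

```
doas /usr/local/sbin/carp.sh failback off     # disable before f0 maintenance
doas /usr/local/sbin/carp.sh failback on      # re-enable afterwards
doas /usr/local/sbin/carp.sh failback status  # show the current setting
```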

The failback attempts are logged to `/var/log/carp-auto-failback.log`!

So, in summary:

This ensures `f0` automatically resumes its role as primary storage server after any reboot, while providing administrative control when needed.

Client Configuration for NFS via Stunnel

To mount NFS shares with stunnel encryption, clients must install and configure stunnel using their client certificates.

Configuring Rocky Linux Clients (`r0`, `r1`, `r2`)

On the Rocky Linux VMs, we run:
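A sketch of the client side; the paths and port must match the server-side assumptions from above, and the systemd unit name may differ depending on how the stunnel package is shipped:

```
sudo dnf install -y stunnel

sudo tee /etc/stunnel/stunnel.conf <<'EOF'
; client mode: accept plain NFS locally, forward it TLS-encrypted to the CARP VIP
[nfs]
client  = yes
accept  = 127.0.0.1:2049
connect = f3s-storage-ha:2323
cert    = /etc/stunnel/r0-stunnel.pem
CAfile  = /etc/stunnel/ca.crt
verify  = 2
EOF

sudo systemctl enable --now stunnel   # unit name may vary with the packaging
```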

Note: Each client must use its certificate file (`r0-stunnel.pem`, `r1-stunnel.pem`, `r2-stunnel.pem`, or `earth-stunnel.pem` - the latter is for my Laptop, which can also mount the NFS shares).

NFSv4 user mapping config on Rocky

Update: This section was added 08.08.2025!

For this, we need to set the `Domain` in `/etc/idmapd.conf` on all 3 Rocky hosts to `lan.buetow.org` (remember, earlier in this blog post we set the `nfsuserd` domain on the NFS server side to `lan.buetow.org` as well!)
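For example:

```
# /etc/idmapd.conf
[General]
Domain = lan.buetow.org
```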

We also need to increase the inotify limit, otherwise nfs-idmapd may fail to start with "Too many open files":
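A sketch; the exact limit value is an assumption:

```
echo 'fs.inotify.max_user_instances = 1024' | sudo tee /etc/sysctl.d/90-nfs-idmapd.conf
sudo sysctl --system
```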

And afterwards, we need to run the following on all 3 Rocky hosts:
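For example:

```
sudo systemctl enable --now nfs-idmapd
sudo nfsidmap -c        # clear the cached ID mappings
```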

and then, to be safe, reboot them.

Testing NFS Mount with Stunnel

To mount NFS through the stunnel encrypted tunnel, we run:
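Given the client-side stunnel listening on `127.0.0.1:2049`, an NFSv4 mount could look like this (the pseudo-root `/` maps to `/data/nfs` on the server):

```
sudo mkdir -p /data/nfs
sudo mount -t nfs -o vers=4.2,proto=tcp 127.0.0.1:/ /data/nfs
df -h /data/nfs
```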

Note: The mount uses localhost (`127.0.0.1`) because stunnel is listening locally and forwarding the encrypted traffic to the remote server.

Testing CARP Failover with mounted clients and stale file handles

To test the failover process:
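A simple way to exercise it (interface name as assumed earlier):

```
# on f0: step down (or simply reboot the box)
doas /usr/local/sbin/carp.sh backup
# on f1: confirm it took over and stunnel answers on the VIP
ifconfig igb0 | grep carp
sockstat -l4 | grep 2323
# on a client: I/O should resume after a short pause
ls /data/nfs
```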

After a CARP failover, NFS clients may experience "Stale file handle" errors because they cached file handles from the previous server. To resolve this manually, we can run:
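For example, on the affected client:

```
sudo umount -f -l /data/nfs
sudo mount -t nfs -o vers=4.2,proto=tcp 127.0.0.1:/ /data/nfs
```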

For the automatic recovery, we create a script:
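A sketch of such a script (`/usr/local/sbin/nfs-remount.sh` is a hypothetical path):

```
#!/bin/sh
# Remount the NFS share if the handle went stale or the server stopped responding.
MOUNTPOINT=/data/nfs

# stat fails immediately on a stale handle; the timeout guards against hangs
if ! timeout 5 stat "${MOUNTPOINT}" >/dev/null 2>&1; then
    logger -t nfs-remount "stale or dead NFS mount detected, remounting ${MOUNTPOINT}"
    umount -f -l "${MOUNTPOINT}" 2>/dev/null
    mount -t nfs -o vers=4.2,proto=tcp 127.0.0.1:/ "${MOUNTPOINT}"
fi
```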

And we create the systemd service as follows:
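For example, `/etc/systemd/system/nfs-remount.service`:

```
[Unit]
Description=Remount the NFS share if the handle went stale

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/nfs-remount.sh
```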

And we also create the systemd timer (runs every 10 seconds):
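And `/etc/systemd/system/nfs-remount.timer`:

```
[Unit]
Description=Check the NFS mount every 10 seconds

[Timer]
OnBootSec=30s
OnUnitActiveSec=10s
AccuracySec=1s

[Install]
WantedBy=timers.target
```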

To enable and start the timer, we run:
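For example:

```
sudo systemctl daemon-reload
sudo systemctl enable --now nfs-remount.timer
systemctl list-timers nfs-remount.timer
```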

Note: Stale file handles are inherent to NFS failover because file handles are server-specific. The best approach depends on your application's tolerance for brief disruptions. Of course, all the changes made to `r0` above must also be applied to `r1` and `r2`.

Updated Wed 19 Mar 2026: Added automatic pod restart after NFS remount

The script now also tracks whether a mount was fixed via the `MOUNT_FIXED` variable. After a successful remount, it queries kubectl for pods on the local node that are stuck in `Unknown`, `Pending`, or `ContainerCreating` state and force-deletes them. Kubernetes then automatically reschedules these pods, which will now succeed because the NFS mount is healthy again. Without this, pods that hit a stale mount would remain broken until manually deleted, even after the underlying NFS issue was resolved.
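A sketch of that addition to the remount script; it assumes `kubectl` on the node is configured (e.g. against the k3s kubeconfig) and that the Kubernetes node name equals the hostname:

```
# inside nfs-remount.sh, in the branch that just remounted successfully:
MOUNT_FIXED=1

if [ "${MOUNT_FIXED}" = "1" ]; then
    NODE=$(hostname)
    kubectl get pods --all-namespaces --no-headers \
        --field-selector spec.nodeName="${NODE}" |
    awk '$4 ~ /Unknown|Pending|ContainerCreating/ {print $1, $2}' |
    while read -r ns pod; do
        logger -t nfs-remount "force-deleting stuck pod ${ns}/${pod}"
        kubectl delete pod -n "${ns}" "${pod}" --force --grace-period=0
    done
fi
```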

Complete Failover Test

Here's a comprehensive test of the failover behaviour with all optimisations in place:

Failover Timeline:

Benefits of the Optimised Setup:

Important Considerations:

Update: Upgrade to 4TB drives

Update: 27.01.2026 I have since replaced the 1TB drives with 4TB drives for more storage capacity. The upgrade procedure was different for each node!

Upgrading f1 (simpler approach)

Since f1 is the replication sink, the upgrade was straightforward:
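Presumably something along these lines, since everything below `zdata` on `f1` is just a replica that `zrepl` can re-create (a sketch, with assumed device names):

```
doas service zrepl stop
doas zpool export zdata          # power off, swap the 1TB SSD for the 4TB one
doas zpool create -f zdata ada1  # after booting with the new drive
doas zfs create zdata/sink
doas service zrepl start         # the next run performs a full re-replication
```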

Upgrading f0 (using ZFS resilvering)

For f0, which is the primary storage node, I used ZFS resilvering to avoid data loss:
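The resilvering approach boils down to temporarily mirroring the old and the new drive (device names are assumptions):

```
doas zpool set autoexpand=on zdata
doas zpool attach zdata ada1 ada2     # old 1TB + new 4TB form a temporary mirror
zpool status zdata                    # wait for the resilver to finish
doas zpool detach zdata ada1          # drop the old 1TB drive
doas zpool online -e zdata ada2       # make sure the extra capacity is picked up
```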

This was a one-time effort on both nodes - after a reboot, everything was remembered and came up normally. Here are the updated outputs:

We're still using a different SSD model on f1 (a WD Blue SA510 4TB) to avoid simultaneous failures:

Conclusion

We've built a robust, encrypted storage system for our FreeBSD-based Kubernetes cluster that provides:

Some key lessons learned are:

Future Storage Explorations

While `zrepl` provides excellent snapshot-based replication for disaster recovery, there are other storage technologies worth exploring for the f3s project:

MinIO for S3-Compatible Object Storage

MinIO is a high-performance, S3-compatible object storage system that could complement our ZFS-based storage. Some potential use cases:

MooseFS for Distributed High Availability

MooseFS is a fault-tolerant, distributed file system that could provide proper high-availability storage:

Both technologies could run on top of our encrypted ZFS volumes, combining ZFS's data integrity and encryption features with distributed storage capabilities. This would be particularly interesting for workloads that need either S3-compatible APIs (MinIO) or transparent distributed POSIX storage (MooseFS). What about Ceph and GlusterFS? Unfortunately, there doesn't seem to be great native FreeBSD support for them. However, other alternatives also appear suitable for my use case.

Read the next post of this series:

f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

Other *BSD-related posts:

2026-04-02 f3s: Kubernetes with FreeBSD - Part 9: GitOps with ArgoCD

2025-12-14 f3s: Kubernetes with FreeBSD - Part 8b: Distributed Tracing with Tempo

2025-12-07 f3s: Kubernetes with FreeBSD - Part 8: Observability

2025-10-02 f3s: Kubernetes with FreeBSD - Part 7: k3s and first pod deployments

2025-07-14 f3s: Kubernetes with FreeBSD - Part 6: Storage (You are currently reading this)

2025-05-11 f3s: Kubernetes with FreeBSD - Part 5: WireGuard mesh network

2025-04-05 f3s: Kubernetes with FreeBSD - Part 4: Rocky Linux Bhyve VMs

2025-02-01 f3s: Kubernetes with FreeBSD - Part 3: Protecting from power cuts

2024-12-03 f3s: Kubernetes with FreeBSD - Part 2: Hardware and base installation

2024-11-17 f3s: Kubernetes with FreeBSD - Part 1: Setting the stage

2024-04-01 KISS high-availability with OpenBSD

2024-01-13 One reason why I love OpenBSD

2022-10-30 Installing DTail on OpenBSD

2022-07-30 Let's Encrypt with OpenBSD and Rex

2016-04-09 Jails and ZFS with Puppet on FreeBSD

E-Mail your comments to `paul@nospam.buetow.org`

Back to the main site