Disaggregated filesystem for multi-kernel and multi-host shared memory.
DAXFS operates directly on DAX-capable memory (persistent memory, CXL memory, or DMA buffers) via direct load/store access. Multiple independent kernels or CXL-connected hosts sharing memory get a unified storage layer: shared namespace, cooperative page cache, and zero-copy CPU/GPU access over aggregated distributed storage.
Not for traditional disks. DAXFS requires byte-addressable memory with DAX support.
It cannot run on block devices; the entire design assumes direct memory pointer access
and synchronization with cmpxchg.
- Zero-copy reads - Direct memory access, no page cache overhead
- Lock-free writes - CAS-based hash overlay, no locks between participants in one cache-coherence domain (see Coherence model)
- Shared page cache - Demand-paged cache in DAX memory, visible to all participants sharing a cache-coherent domain
- Multi-kernel namespace - Each kernel instance exports local storage into a shared filesystem (multikernel, single coherence domain); cross-host CXL is gated future work (see Coherence model)
- Flexible backing - Physical address, DAX device, or dma-buf
- Security by simplicity - Flat directory format, bounded validation, no pointer chasing
- LLM inference serving - Multiple GPU kernels share model weights through DAXFS; one copy in shared memory serves all instances, and cold start goes from minutes to seconds
- Multikernel/multi-host - Shared rootfs across kernel instances or CXL-connected hosts with cooperative caching
- CXL memory pooling - Common filesystem across CXL-connected hosts with lock-free concurrent access
- GPU/accelerator - Zero-copy access to data via dma-buf
- Container rootfs - Shared base image with writable overlay per container
| Filesystem | Limitation for this use case |
|---|---|
| tmpfs/ramfs | Per-instance, N containers = N copies in memory |
| overlayfs | No multi-kernel/multi-host support, copy-up on write, page cache overhead |
| erofs | Read-only, fscache is per-kernel so N kernels = N cache copies |
| famfs | Single-writer metadata, no shared caching, no CAS coordination (see below) |
| cramfs | Block I/O + page cache, no direct memory mapping |
Both DAXFS and FamFS target CXL shared memory, but they differ fundamentally in architecture:
| | DAXFS | FamFS |
|---|---|---|
| Coordination model | Peer-to-peer via cmpxchg | Single master, clients replay metadata log |
| Writes | Lock-free CAS overlay, any host can write concurrently | Master creates files; clients default read-only, user manages coherency if writable |
| Shared caching | Cooperative page cache (pcache) across all hosts, clock-based eviction | None; each node manages its own access |
| Allocation | Self-contained image with internal bump allocator | Per-file extent lists allocated by master |
| File operations | Create, read, write (COW), delete (tombstone) | Pre-allocate only (no append, truncate, or delete) |
| Image model | Self-contained: superblock + base image + overlay + pcache in one region | No images; files are individually mapped extents |
| Coherence model | Lock-free cmpxchg within one hardware cache-coherence domain (multikernel); cross-host CXL requires CXL 3.0 hardware coherence and is gated/unvalidated (see Coherence model) | Single-writer log; user manages coherency |
| Layered storage | Base image + overlay (shared base with per-instance COW) | No layering concept |
FamFS is a thin mapping layer that exposes pre-allocated files on shared memory. DAXFS is a general-purpose shared in-memory filesystem that uses shared-memory atomics for lock-free coordination within a cache-coherence domain: concurrent writes, cooperative caching, and layered storage without a central coordinator. Cross-host CXL operation requires hardware coherence and is gated future work; see Coherence model.
make # build kernel module + tools
make clean

Requires Linux 5.11+ and CONFIG_FS_DAX enabled in the target kernel.
# Create a static read-only image
mkdaxfs -d /path/to/rootfs -o image.daxfs
# Create and mount from DMA heap (read-only)
mkdaxfs -d /path/to/rootfs -H /dev/dma_heap/system -s 256M -m /mnt
# Split mode: metadata+overlay+cache in DAX, file data in backing file (writable)
mkdaxfs -d /path/to/rootfs -H /dev/dma_heap/mk -m /mnt -o /data/rootfs.img
# Empty mode: writable filesystem with no base image
mkdaxfs --empty -H /dev/dma_heap/mk -m /mnt -s 256M
# Custom overlay sizing
mkdaxfs -d /path/to/rootfs -o image.daxfs -O 128M -B 131072
# Create at physical address, then mount separately
mkdaxfs -d /path/to/rootfs -p 0x100000000 -s 256M
mount -t daxfs -o phys=0x100000000,size=0x10000000 none /mnt
# Split mode mount with backing file
mount -t daxfs -o phys=ADDR,size=SIZE,backing=/data/rootfs.img none /mnt

| Option | Description |
|---|---|
| `-d, --directory DIR` | Source directory |
| `-o, --output FILE` | Output file (backing file in split mode) |
| `-H, --heap PATH` | Allocate from DMA heap |
| `-m, --mountpoint DIR` | Mount after creating (required with -H) |
| `-p, --phys ADDR` | Write to physical address via /dev/mem |
| `-s, --size SIZE` | Override allocation size |
| `-O, --overlay SIZE` | Overlay pool size (enables writes; default 64M in split/empty) |
| `-B, --buckets N` | Overlay bucket count (power of 2; default 65536) |
| `-C, --pcache-slots N` | Page cache slot count (power of 2; auto in split mode) |
| `-E, --empty` | Empty mode: overlay + pcache only, no base image |
| `-V, --validate` | Validate image on mount |
Mount options: `phys=ADDR`, `size=SIZE`, `validate` (check untrusted data),
`backing=PATH` (backing file for split mode).
For dma-buf backing, use the new mount API (fsopen/fsconfig/fsmount) with
FSCONFIG_SET_FD to pass the dma-buf fd.
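
As a rough sketch of the dma-buf path, the function below drives the new mount API through raw syscalls. The fsconfig key under which DAXFS accepts the dma-buf fd is not stated here, so it is passed in as `fd_key`; that parameter and the function name `daxfs_mount_dmabuf` are assumptions for illustration only. It needs kernel headers that ship `<linux/mount.h>` with the new mount API definitions (Linux 5.2+).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>          /* AT_FDCWD */
#include <linux/mount.h>    /* FSOPEN_CLOEXEC, FSCONFIG_*, FSMOUNT_CLOEXEC, MOVE_MOUNT_* */
#include <stddef.h>
#include <sys/syscall.h>    /* SYS_fsopen, SYS_fsconfig, SYS_fsmount, SYS_move_mount */
#include <unistd.h>         /* syscall, close */

/*
 * Mount DAXFS backed by a dma-buf. Returns 0 on success, -1 on failure.
 * fd_key is the option name for the dma-buf fd; the real name is not given
 * in this README, so the caller supplies it (assumption).
 */
int daxfs_mount_dmabuf(int dmabuf_fd, const char *fd_key, const char *target)
{
    assert(fd_key != NULL && target != NULL);

    int fsfd = (int)syscall(SYS_fsopen, "daxfs", FSOPEN_CLOEXEC);
    if (fsfd < 0)
        return -1;

    int mntfd = -1;
    int rc = -1;

    /* FSCONFIG_SET_FD: key names the parameter, value is NULL, aux carries the fd. */
    if (syscall(SYS_fsconfig, fsfd, FSCONFIG_SET_FD, fd_key, NULL, dmabuf_fd) < 0)
        goto out;

    /* Create the superblock from the parameters set so far. */
    if (syscall(SYS_fsconfig, fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
        goto out;

    /* Obtain a detached mount, then attach it at target. */
    mntfd = (int)syscall(SYS_fsmount, fsfd, FSMOUNT_CLOEXEC, 0);
    if (mntfd < 0)
        goto out;

    if (syscall(SYS_move_mount, mntfd, "", AT_FDCWD, target,
                MOVE_MOUNT_F_EMPTY_PATH) == 0)
        rc = 0;

out:
    if (mntfd >= 0)
        close(mntfd);
    close(fsfd);
    return rc;
}
```

Raw syscalls are used instead of libc wrappers because the wrappers only exist in recent glibc releases.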
# Show memory layout and status
daxfs-inspect status -m /mnt/daxfs
# Show overlay hash table details (bucket utilization, entry types, pool usage)
daxfs-inspect overlay -m /mnt/daxfs
# Inspect via physical address
daxfs-inspect status -p 0x100000000 -s 256M

| Mode | Layout | Description |
|---|---|---|
| Static | `[Super][Base Image]` | Read-only, base image embedded in DAX |
| Split | `[Super][Base Image][Overlay][PCache]` | Writable, metadata+overlay in DAX, file data in backing file |
| Empty | `[Super][Overlay][PCache]` | Writable, no base image, all content via overlay |
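The overlay replaces traditional journaling or log-structured writes with a CAS-based hash table on DAX memory. Multiple kernels in one cache-coherence domain can write concurrently with no locks; concurrent writes from separate CXL hosts depend on hardware coherence (see Coherence model).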
- Open addressing with linear probing, 16-byte buckets
- Atomic insert via `cmpxchg` on bucket's `state_key` field (FREE→USED)
- Bump allocator for pool entries (atomic fetch-and-add on `pool_alloc`)
- Entry types: inode metadata, data pages (4KB COW), directory entries with tombstone deletion
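
The insert and allocation steps above can be sketched with C11 atomics. This is a user-space illustration, not the module's code: the split of the 16-byte bucket into `state_key` plus a `pool_off` word, the use of the top bit as the USED flag, the convention that pool offset 0 means "not yet published", and indexing buckets directly by key are all assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define KEY_MASK   ((UINT64_C(1) << 63) - 1)  /* low 63 bits hold the key */
#define STATE_USED (UINT64_C(1) << 63)        /* assumed: top bit set means USED */
#define NO_SPACE   UINT64_MAX

struct bucket {
    _Atomic uint64_t state_key;   /* 0 = FREE, otherwise STATE_USED | key */
    _Atomic uint64_t pool_off;    /* assumed: pool offset, 0 = not yet published */
};
_Static_assert(sizeof(struct bucket) == 16, "bucket must be 16 bytes");

struct overlay {
    struct bucket *buckets;
    uint64_t bucket_count;        /* power of 2 */
    _Atomic uint64_t pool_alloc;  /* bump pointer into the pool */
    uint64_t pool_size;
};

/* Reserve size bytes from the pool with one atomic fetch-and-add. */
uint64_t pool_bump(struct overlay *ov, uint64_t size)
{
    uint64_t off = atomic_fetch_add(&ov->pool_alloc, size);
    if (size > ov->pool_size || off > ov->pool_size - size)
        return NO_SPACE;          /* exhausted; the bump is not rolled back here */
    return off;
}

/* Returns 0 if inserted, 1 if the key was already present, -1 if the table is full. */
int overlay_insert(struct overlay *ov, uint64_t key, uint64_t pool_off)
{
    assert(pool_off != 0);
    uint64_t mask = ov->bucket_count - 1;
    uint64_t want = STATE_USED | (key & KEY_MASK);

    for (uint64_t i = 0; i < ov->bucket_count; i++) {
        struct bucket *b = &ov->buckets[(key + i) & mask];  /* linear probing */
        uint64_t expected = 0;                              /* FREE */
        if (atomic_compare_exchange_strong(&b->state_key, &expected, want)) {
            atomic_store(&b->pool_off, pool_off);           /* publish the entry */
            return 0;
        }
        if (expected == want)
            return 1;             /* another writer already claimed this key */
    }
    return -1;
}

/* Returns the pool offset for key, or 0 if absent or not yet published. */
uint64_t overlay_lookup(struct overlay *ov, uint64_t key)
{
    uint64_t mask = ov->bucket_count - 1;
    uint64_t want = STATE_USED | (key & KEY_MASK);

    for (uint64_t i = 0; i < ov->bucket_count; i++) {
        struct bucket *b = &ov->buckets[(key + i) & mask];
        uint64_t sk = atomic_load(&b->state_key);
        if (sk == 0)
            return 0;             /* a FREE bucket ends the probe chain */
        if (sk == want)
            return atomic_load(&b->pool_off);
    }
    return 0;
}
```

Because buckets are never returned to FREE in this sketch (deletion is by tombstone entry), a FREE bucket reliably terminates a probe.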
Key encoding (63 bits):
- Data: `(ino << 20) | pgoff` (up to 1M pages per file)
- Inode: `(ino << 20) | 0xFFFFF` (sentinel pgoff)
- Dirent: `FNV-1a(parent_ino, name)` (63-bit hash)
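
A minimal sketch of the three encodings follows. The data and inode formulas are taken directly from the list above; for the dirent hash, feeding the raw bytes of `parent_ino` followed by the name into standard 64-bit FNV-1a and masking to 63 bits is an assumption, as are all function and macro names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGOFF_BITS   20
#define PGOFF_INODE  UINT64_C(0xFFFFF)                        /* sentinel pgoff for inode keys */
#define KEY_MASK     ((UINT64_C(1) << 63) - 1)
#define INO_MAX      ((UINT64_C(1) << (63 - PGOFF_BITS)) - 1) /* 43 bits remain for ino */
#define FNV64_OFFSET UINT64_C(0xcbf29ce484222325)             /* FNV-1a 64-bit offset basis */
#define FNV64_PRIME  UINT64_C(0x100000001b3)                  /* FNV 64-bit prime */

uint64_t key_data(uint64_t ino, uint64_t pgoff)
{
    assert(ino <= INO_MAX);
    assert(pgoff < PGOFF_INODE);      /* 0xFFFFF itself is reserved for the inode key */
    return (ino << PGOFF_BITS) | pgoff;
}

uint64_t key_inode(uint64_t ino)
{
    assert(ino <= INO_MAX);
    return (ino << PGOFF_BITS) | PGOFF_INODE;
}

/* Standard 64-bit FNV-1a over a byte buffer, continuing from hash h. */
uint64_t fnv1a64(uint64_t h, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV64_PRIME;
    }
    return h;
}

uint64_t key_dirent(uint64_t parent_ino, const char *name)
{
    /* Assumed input order: bytes of parent_ino (host byte order), then the name. */
    uint64_t h = fnv1a64(FNV64_OFFSET, &parent_ino, sizeof parent_ino);
    h = fnv1a64(h, name, strlen(name));
    return h & KEY_MASK;              /* truncate to 63 bits */
}
```

Since pgoff `0xFFFFF` is taken by the inode sentinel, the usable data page offsets in this sketch run from 0 to `0xFFFFE`, which is consistent with the "~1M pages" limit.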
Read path: overlay → base image → pcache (backing store). Write path: COW from base image into overlay data page.
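
The lookup order and the copy-on-write step can be illustrated with a toy in-memory model. It replaces the overlay hash table with a plain per-page pointer array and leaves out the pcache fallback, so it only shows the overlay-before-base ordering and the first-write copy; every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 4096u
#define NPAGES  4u                 /* toy file size, in pages */

struct toy_file {
    const uint8_t *base;           /* read-only base image data (NPAGES pages), may be NULL */
    uint8_t *overlay[NPAGES];      /* per-page COW copies, NULL until first write */
};

/* Read path: an overlay page wins; otherwise fall back to the base image. */
const uint8_t *toy_read_page(const struct toy_file *f, uint32_t pgoff)
{
    assert(pgoff < NPAGES);
    if (f->overlay[pgoff])
        return f->overlay[pgoff];
    return f->base ? f->base + (size_t)pgoff * PAGE_SZ : NULL;
}

/* Write path: copy the base page into a fresh overlay page on first write,
 * then modify only the copy. Returns 0 on success, -1 on allocation failure. */
int toy_write(struct toy_file *f, uint32_t pgoff, uint32_t off,
              const void *src, uint32_t len)
{
    assert(pgoff < NPAGES && off <= PAGE_SZ && len <= PAGE_SZ - off);
    if (!f->overlay[pgoff]) {
        uint8_t *pg = calloc(1, PAGE_SZ);
        if (!pg)
            return -1;
        if (f->base)
            memcpy(pg, f->base + (size_t)pgoff * PAGE_SZ, PAGE_SZ);
        f->overlay[pgoff] = pg;
    }
    memcpy(f->overlay[pgoff] + off, src, len);
    return 0;
}
```

The base image is never modified, which is what lets several instances share one base while each keeps its own written pages.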
Direct-mapped cache in DAX memory for backing store mode. Within a single hardware cache-coherence domain (e.g. multiple kernel instances on one coherent machine), the cache is visible to all participants via hardware coherence with no software coherency protocol. Cross-host CXL sharing requires hardware coherence (CXL 3.0) and is gated future work; see Coherence model.
- 3-state machine: FREE → PENDING → VALID, all transitions via `cmpxchg`
- Multi-file tags: `tag = (ino << 20) | pgoff`, multiple backing files share one cache
- Host fills, spawns wait: host kernel reads backing file into PENDING slots; spawn kernels busy-poll until VALID
- Pre-warming: `mkdaxfs` pre-populates cache slots at image creation time
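
The slot state machine above might look roughly like this with C11 atomics. Packing the state into the top two bits of `state_tag`, mapping a tag to a slot with a power-of-two mask, and bounding the poll loop with `max_spins` are assumptions made so the sketch is self-contained and testable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { SLOT_FREE = 0, SLOT_PENDING = 1, SLOT_VALID = 2 };

#define STATE_SHIFT 62                                /* assumed: state in the top 2 bits */
#define TAG_MASK    ((UINT64_C(1) << STATE_SHIFT) - 1)

struct pcache_slot {                /* 16 bytes of slot metadata */
    _Atomic uint64_t state_tag;
    _Atomic uint64_t ref_bit;       /* not used in this sketch */
};

static uint64_t pack(uint64_t state, uint64_t tag)
{
    assert((tag & ~TAG_MASK) == 0);
    return (state << STATE_SHIFT) | tag;
}

/* Direct-mapped: each tag has exactly one candidate slot. */
uint64_t pcache_slot_index(uint64_t tag, uint64_t slot_count)
{
    assert(slot_count != 0 && (slot_count & (slot_count - 1)) == 0);
    return tag & (slot_count - 1);
}

/* Filler side, FREE -> PENDING. Returns 1 if this caller now owns the fill. */
int pcache_try_claim(struct pcache_slot *s, uint64_t tag)
{
    uint64_t cur = atomic_load(&s->state_tag);
    if ((cur >> STATE_SHIFT) != SLOT_FREE)
        return 0;
    return atomic_compare_exchange_strong(&s->state_tag, &cur,
                                          pack(SLOT_PENDING, tag));
}

/* Filler side, PENDING -> VALID, called after the page data is written. */
int pcache_publish(struct pcache_slot *s, uint64_t tag)
{
    uint64_t expected = pack(SLOT_PENDING, tag);
    return atomic_compare_exchange_strong(&s->state_tag, &expected,
                                          pack(SLOT_VALID, tag));
}

/* Waiter side: poll until the slot is VALID for tag or max_spins is reached. */
int pcache_poll_valid(struct pcache_slot *s, uint64_t tag, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (atomic_load(&s->state_tag) == pack(SLOT_VALID, tag))
            return 1;
    }
    return 0;
}
```

Only the caller whose compare-and-swap moves the slot from FREE to PENDING performs the fill; everyone else observes PENDING and waits, so a page is read from the backing file once.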
Defined in include/daxfs_format.h (version 7).
| Region | Content |
|---|---|
| Superblock | Magic, version, region offsets (4KB) |
| Base image | Read-only snapshot: inode table + data (optional) |
| Overlay | CAS hash table + bump-allocated pool (optional, enables writes) |
| Page cache | Shared cache slots for backing store mode (optional) |
Base image (flat format):
- Inode table: fixed 64-byte entries
- Data area: file contents + directory entry arrays
- Directories store `daxfs_dirent` arrays (271 bytes each, 255-char max name)
Overlay (hash table):
- Header (4KB): magic, bucket count, pool offsets, atomic counters
- Bucket array: `bucket_count × 16` bytes, open addressing
- Pool: variable-size entries (inodes 32B, data pages 4104B, dirents ~280B)
Page cache:
- Header (4KB): magic, slot count, offsets, pending counter
- Slot metadata: `slot_count × 16` bytes (state_tag + ref_bit)
- Slot data: `slot_count × 4KB` pages
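
The region sizes implied by the overlay and page cache layouts can be worked out with a few lines of arithmetic. The helper names are hypothetical, and the sketch assumes no padding or alignment between the header, metadata array, and data area, which the format header may handle differently.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_SIZE  UINT64_C(4096)   /* 4KB header per region */
#define BUCKET_SIZE  UINT64_C(16)     /* bytes per overlay bucket */
#define SLOT_META    UINT64_C(16)     /* bytes of metadata per pcache slot */
#define PAGE_SIZE_4K UINT64_C(4096)   /* bytes of data per pcache slot */

int is_pow2(uint64_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Overlay region: header + bucket array + pool. */
uint64_t overlay_region_size(uint64_t bucket_count, uint64_t pool_size)
{
    assert(is_pow2(bucket_count));
    return HEADER_SIZE + bucket_count * BUCKET_SIZE + pool_size;
}

/* Page cache region: header + slot metadata + slot data pages. */
uint64_t pcache_region_size(uint64_t slot_count)
{
    assert(is_pow2(slot_count));
    return HEADER_SIZE + slot_count * SLOT_META + slot_count * PAGE_SIZE_4K;
}
```

With the default 65536 buckets the bucket array alone takes 1 MiB; if the 64M overlay default refers to the pool only (an assumption), the overlay region comes to 4 KiB + 1 MiB + 64 MiB.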
DAXFS uses a flat directory format designed for safe handling of untrusted images:
| Property | Benefit |
|---|---|
| Flat directories | No linked lists, no cycle attacks |
| Fixed-size dirents | Bounded iteration, trivial validation |
| Inline names | No string table indirection |
| Mount-time validation | Optional validate mount option |
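
To show why fixed-size, inline-name dirents make validation bounded, here is a sketch of a directory check. The 271-byte entry size and 255-byte name limit come from the format description; the field offsets, the rule that inode 0 is invalid, and the function name are assumptions, since the actual `daxfs_dirent` layout is defined in `include/daxfs_format.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIRENT_SIZE  271u
#define NAME_MAX_LEN 255u
/* Hypothetical field offsets inside one 271-byte dirent: */
#define OFF_INO      0u     /* u64 inode number */
#define OFF_NAME_LEN 8u     /* u16 name length */
#define OFF_NAME     16u    /* 255 bytes of inline name */
_Static_assert(OFF_NAME + NAME_MAX_LEN == DIRENT_SIZE, "layout must fill the entry");

/* Returns the number of entries, or -1 if the directory data is malformed. */
long validate_dir(const uint8_t *buf, size_t len, uint64_t inode_count)
{
    assert(buf != NULL || len == 0);

    if (len % DIRENT_SIZE != 0)
        return -1;                        /* must be a whole number of entries */

    size_t n = len / DIRENT_SIZE;         /* loop bound comes from the size alone */
    for (size_t i = 0; i < n; i++) {
        const uint8_t *d = buf + i * DIRENT_SIZE;
        uint64_t ino;
        uint16_t name_len;

        memcpy(&ino, d + OFF_INO, sizeof ino);
        memcpy(&name_len, d + OFF_NAME_LEN, sizeof name_len);

        if (ino == 0 || ino > inode_count)
            return -1;                    /* must reference an existing inode */
        if (name_len == 0 || name_len > NAME_MAX_LEN)
            return -1;                    /* name must fit inside the entry */
        if (memchr(d + OFF_NAME, '/', name_len) ||
            memchr(d + OFF_NAME, '\0', name_len))
            return -1;                    /* no path separators or embedded NULs */
    }
    return (long)n;
}
```

Nothing in the loop follows an offset read from the image, so a malicious image cannot steer the validator outside the directory's own byte range.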
- No mknod support (device nodes, FIFOs, sockets not supported)
- Filename max 255 characters (matches VFS NAME_MAX)
- Overlay pool entries are recycled via per-type free lists, but the pool itself is not compacted
- Multi-file pcache tag supports up to ~1M pages per file (4GB with 4KB pages)
- Overlay hash table size is fixed at creation time