125 changes: 125 additions & 0 deletions notes/custom_pointer.md
@@ -0,0 +1,125 @@
# Research Notes: Custom Pointers for Oscars

This document answers the questions from issue #86 about adding custom pointers to `oscars`.

Our main goal is to find a reliable way to point to memory on the heap. Unlike normal pointers, we want these pointers to work even if the operating system loads the program at a different memory address next time. This makes it possible to "pin" specific objects and easily serialize/deserialize the heap.

We can build a custom pointer by combining our `MempoolAllocator` design with ideas from modern GC research.

Reference: https://kyju.org/blog/tokioconf-2026/#a-sketch-of-a-real-raw-pointer-based-gc

## 1. What is the most optimal representation for that pointer?

To be able to serialize and deserialize the heap, we cannot use regular memory addresses (`*mut T` or `NonNull<T>`). Regular addresses change every time we run the program. Instead, we need a stable ID.

Since we use a `MempoolAllocator`, which organizes memory into blocks called pools, the best choice is a **Segmented ID**.

### Segmented ID Representation
A custom pointer should be a single 32-bit number. Backing it with `NonZeroU32` rather than a plain `u32` keeps `Option<CustomPtr>` at 4 bytes, because `None` can reuse the zero bit pattern.

This 32-bit number is split into two parts:
- **`pool_id`:** Tells us which pool the object is in.
- **`slot_idx`:** Tells us the exact slot within that pool.

**Why this is the best choice:**
1. **Fits Mempool Perfectly:** `MempoolAllocator` already organizes memory into pools and slots, so this ID maps directly onto that layout.
2. **Saves Memory:** On 64-bit targets, a 32-bit number is half the size of a pointer, so every GC reference shrinks by half. More references then fit in the CPU cache, which should help performance.
3. **Easy to Serialize/Deserialize:** The ID is just a logical coordinate (`pool_id`, `slot_idx`), not a physical memory address. When we deserialize a serialized heap, these coordinates still point to the correct objects, even if the OS puts the pools in a different physical location.
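
The packing described above can be sketched as follows. This is only an illustration: the name `SegmentedId`, its constructor, and the 12/20-bit split are assumptions made for the example, not the actual `oscars` types.

```rust
use core::num::NonZeroU32;

/// Hypothetical split: 12 bits of pool ID, 20 bits of slot index.
const SLOT_BITS: u32 = 20;
const SLOT_MASK: u32 = (1 << SLOT_BITS) - 1;
const MAX_POOLS: u32 = 1 << (32 - SLOT_BITS);

/// Illustrative segmented ID (not the real `CustomPtr`).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct SegmentedId(NonZeroU32);

impl SegmentedId {
    /// Packs a coordinate. Returns `None` if a field does not fit,
    /// or if the packed value would be zero (pool 0, slot 0).
    pub fn new(pool_id: u32, slot_idx: u32) -> Option<Self> {
        if pool_id >= MAX_POOLS || slot_idx > SLOT_MASK {
            return None;
        }
        NonZeroU32::new((pool_id << SLOT_BITS) | slot_idx).map(SegmentedId)
    }

    /// Top 12 bits.
    pub fn pool_id(self) -> u32 {
        self.0.get() >> SLOT_BITS
    }

    /// Bottom 20 bits.
    pub fn slot_idx(self) -> u32 {
        self.0.get() & SLOT_MASK
    }
}
```

One consequence of the `NonZeroU32` backing is that the coordinate (pool 0, slot 0) cannot be represented, so an allocator built this way would need to reserve that slot or offset one of the fields by one.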

## 2. What is the API for a custom pointer?

Because a custom pointer is just an index and not a real pointer, it cannot safely implement the `core::ops::Deref` trait: we can't turn a number into a reference without knowing where the memory is actually stored.

Instead, we use branding: we wrap the 32-bit number in a `Gc<'gc, T>` type.

### The `Gc` Wrapper
```rust
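use core::marker::PhantomData;
use core::num::NonZeroU32;

/// 32-bit segmented ID: (pool_id, slot_idx) packed into one number.
#[derive(Copy, Clone, PartialEq, Eq)]
#[repr(transparent)]
pub struct CustomPtr(NonZeroU32);

/// GC pointer branded with the `'gc` lifetime.
#[repr(transparent)]
pub struct Gc<'gc, T: ?Sized> {
    ptr: CustomPtr,
    _marker: PhantomData<(&'gc (), *const T)>,
}

// Implemented by hand: `#[derive(Copy, Clone)]` would add `T: Copy` and
// `T: Clone` bounds, but the handle should be copyable for every `T`.
impl<'gc, T: ?Sized> Clone for Gc<'gc, T> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<'gc, T: ?Sized> Copy for Gc<'gc, T> {}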
```

### The `Deref` Problem
Right now, the new `mark_sweep_branded` and `null_collector_branded` APIs wrap real physical pointers (`NonNull`) and implement `Deref`. This makes them easy to use.

If we change our pointers to be 32-bit Segmented IDs, **we will lose the ability to use `Deref`**.

Instead, developers will have to pass the pointer back to the GC context (`MutationContext` or `ArenaCtx`) to read the data:

```rust
impl<'gc> MutationContext<'gc> {
    /// Turns the custom pointer into a real Rust reference.
    pub fn get<T: Trace>(&self, gc: Gc<'gc, T>) -> &T {
        // Looks up the memory using the pool_id and slot_idx.
        todo!("resolve gc.ptr through the allocator")
    }
}
```

If we really want to keep `Deref` for ease of use, there are two workarounds:
1. **Thread-Local Storage (TLS):** Put the `MempoolAllocator` in a `thread_local!` variable. This lets `Deref` look up the memory secretly behind the scenes. This is easy to use but makes it harder to move the heap between threads.
2. **Hybrid Approach:** Keep using real pointers (`NonNull`) for `Gc` during normal code execution so `Deref` works. But, create a new `HeapPtr` type that uses the Segmented ID only when we need to pin or serialize the object to disk.
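
To make the Hybrid Approach more concrete, here is a rough sketch of the conversion step between a real pointer and a `(pool_id, slot_idx)` coordinate. Everything in it is hypothetical: the `Pool` and `Heap` types, the fixed `u64` slot type, and the `to_coords` / `to_ptr` names are assumptions for illustration and do not reflect the actual `MempoolAllocator` layout.

```rust
use core::mem::size_of;
use core::ptr::NonNull;

/// Hypothetical pool: fixed-size `u64` slots in one contiguous buffer.
pub struct Pool {
    slots: Box<[u64]>,
}

/// Hypothetical heap: an ordered table of pools.
pub struct Heap {
    pools: Vec<Pool>,
}

impl Heap {
    pub fn new(pool_count: usize, slots_per_pool: usize) -> Self {
        let pools = (0..pool_count)
            .map(|_| Pool { slots: vec![0u64; slots_per_pool].into_boxed_slice() })
            .collect();
        Heap { pools }
    }

    /// Real pointer -> stable coordinate, by finding the pool whose
    /// address range contains the pointer.
    pub fn to_coords(&self, ptr: NonNull<u64>) -> Option<(usize, usize)> {
        let addr = ptr.as_ptr() as usize;
        for (pool_id, pool) in self.pools.iter().enumerate() {
            let base = pool.slots.as_ptr() as usize;
            let end = base + pool.slots.len() * size_of::<u64>();
            if addr >= base && addr < end {
                return Some((pool_id, (addr - base) / size_of::<u64>()));
            }
        }
        None
    }

    /// Stable coordinate -> real pointer, valid only for the current run.
    pub fn to_ptr(&self, pool_id: usize, slot_idx: usize) -> Option<NonNull<u64>> {
        let slot = self.pools.get(pool_id)?.slots.get(slot_idx)?;
        Some(NonNull::from(slot))
    }
}
```

With a conversion like this, `Gc` could keep wrapping `NonNull` for everyday use, and only the serialize and pin paths would pay for the address-range search.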

### Thread-Safety Benefits
A big benefit of using a 32-bit index is that it is harmless on its own: we can't read the memory without the `MutationContext`.

Because of this, `Gc<'gc, T>` can implement `Send` and `Sync`. We can pass these pointers between threads even if the data they point to uses interior mutability, like `Cell`. This holds only as long as the context that resolves them is not itself shared across threads, since `ctx.get` is what hands out the actual `&T`.

```rust
// Sound only because Gc is a plain 32-bit index: it cannot be
// dereferenced without going through the context.
unsafe impl<'gc, T: ?Sized> Send for Gc<'gc, T> {}
unsafe impl<'gc, T: ?Sized> Sync for Gc<'gc, T> {}
```

## 3. How should memory stores and loads work?

To read or write memory, we must always use the context that owns the `MempoolAllocator`.

### Looking up the Memory
When we call `ctx.get(gc)` or `ctx.get_mut(gc)`, the context breaks the 32-bit number into its two parts and finds the memory:

```rust
impl CustomPtr {
    #[inline(always)]
    pub fn pool_id(&self) -> usize {
        (self.0.get() >> 20) as usize // top 12 bits
    }

    #[inline(always)]
    pub fn slot_idx(&self) -> usize {
        (self.0.get() & 0x000F_FFFF) as usize // bottom 20 bits
    }
}

// Inside MutationContext::get:
let pool_id = gc.ptr.pool_id();
let slot_idx = gc.ptr.slot_idx();

// find the pool
let pool = &self.heap.pools[pool_id];

// find the exact slot
let value_ref = pool.get_slot(slot_idx);
```
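
For a version of this lookup that compiles on its own, the following sketch uses a simplified, single-type heap. The `Heap<T>` type, its `alloc` policy, and the use of a plain `u32` (rather than `NonZeroU32`) are assumptions made to keep the example short; the real allocator is type-erased and more involved.

```rust
/// Hypothetical typed heap: each pool is a vector of slots of one type.
pub struct Heap<T> {
    pools: Vec<Vec<T>>,
    pool_capacity: usize,
}

impl<T> Heap<T> {
    pub fn new(pool_capacity: usize) -> Self {
        Heap { pools: Vec::new(), pool_capacity }
    }

    /// Appends `value` to the last pool, opening a new pool when the
    /// last one is full, and returns the packed 32-bit coordinate.
    pub fn alloc(&mut self, value: T) -> u32 {
        let full = self.pools.last().map_or(true, |p| p.len() >= self.pool_capacity);
        if full {
            self.pools.push(Vec::new());
        }
        let pool_id = self.pools.len() - 1;
        let pool = &mut self.pools[pool_id];
        pool.push(value);
        let slot_idx = pool.len() - 1;
        ((pool_id as u32) << 20) | slot_idx as u32
    }

    /// Two lookups: first the pool table, then the slot inside the pool.
    /// Both are bounds-checked, so a stale or forged ID yields `None`.
    pub fn get(&self, raw: u32) -> Option<&T> {
        let pool_id = (raw >> 20) as usize; // top 12 bits
        let slot_idx = (raw & 0x000F_FFFF) as usize; // bottom 20 bits
        self.pools.get(pool_id)?.get(slot_idx)
    }
}
```

Returning `Option` instead of indexing directly trades a branch for safety against out-of-range IDs; whether the real `get` should panic or return `Option` is a separate design decision.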

### Performance
Even though this requires two lookups, finding the pool and then finding the slot, the overhead should be small. The table of pools is small and is likely to stay in the CPU cache. We expect the savings from 32-bit pointers to offset this cost, but that still needs to be confirmed with benchmarks.

### Serializing and Deserializing the Heap
With this design, serializing the heap to disk is very easy:

1. **Pause Changes:** Make sure no code is currently modifying the heap.
2. **Serialize Pools:** Loop through all the pools in the `MempoolAllocator`. Serialize their metadata (like the ID) and write the raw bytes of all used slots to disk.
3. **Serialize Roots:** Serialize the 32-bit IDs of any root objects.
4. **Deserialize the Heap:** Recreate the `MempoolAllocator` with the exact same pool IDs and deserialize the raw bytes back in. Because all `Gc` pointers are just `(pool_id, slot_idx)` numbers, they will automatically point to the right places. We do not need to rewrite or fix any pointers.
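
The steps above can be sketched with a toy heap. The `Heap` type, the `u32` slot type, and the little-endian byte layout here are all assumptions chosen for brevity; they are not the format used by the real allocator.

```rust
/// Hypothetical heap whose slots are plain `u32` values.
#[derive(Debug, PartialEq)]
pub struct Heap {
    pub pools: Vec<Vec<u32>>,
}

/// Layout (assumed): pool count, then for each pool its slot count
/// followed by the slots, all as little-endian `u32`.
pub fn serialize(heap: &Heap) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(heap.pools.len() as u32).to_le_bytes());
    for pool in &heap.pools {
        out.extend_from_slice(&(pool.len() as u32).to_le_bytes());
        for slot in pool {
            out.extend_from_slice(&slot.to_le_bytes());
        }
    }
    out
}

/// Reads one little-endian `u32`, or `None` if the input is truncated.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let chunk = bytes.get(*pos..*pos + 4)?;
    *pos += 4;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Rebuilds the pools in the same order, so pool IDs are preserved.
pub fn deserialize(bytes: &[u8]) -> Option<Heap> {
    let mut pos = 0;
    let pool_count = read_u32(bytes, &mut pos)?;
    let mut pools = Vec::new();
    for _ in 0..pool_count {
        let len = read_u32(bytes, &mut pos)?;
        let mut pool = Vec::new();
        for _ in 0..len {
            pool.push(read_u32(bytes, &mut pos)?);
        }
        pools.push(pool);
    }
    Some(Heap { pools })
}
```

Because pools are written and read back in the same order, a coordinate such as `(1, 0)` taken before serialization indexes the same value afterwards, with no fixup pass.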
46 changes: 46 additions & 0 deletions notes/custom_ptr_integration_blockers.md
@@ -0,0 +1,46 @@
# Custom Pointer Integration Blockers


This is a follow-up to our initial research on adding custom pointers to `oscars`. We've built the `mempool4` prototype to test out the `(pool_id, slot_idx)` custom pointer idea, so let's get into what we found.

The primary goal of this exercise was to see if a 32-bit stable coordinate could actually work for allocations, resolutions, and heap serialization.

## General notes

The implementation itself does appear to function correctly. We can allocate, we can safely resolve using a `'gc` branded context, and the serialization story is incredibly clean since the coordinates don't need any fixup passes after restarting.

However, there are a few caveats. If we want to make this custom pointer approach work with the existing `mark_sweep_branded` API, we run into some serious integration blockers.

### Major blocker: Loss of `Deref`

Right now, the existing `Gc<'gc, T>` uses raw physical pointers under the hood and implements the `Deref` trait. This makes it really nice to use: `obj.properties()` just works.

Because our custom pointer is just a 32-bit number, it can't safely implement `Deref`. The compiler has no idea where the memory actually is without asking the allocator. So every read has to become `cx.resolve(obj).properties()`.

Why is this a major blocker? We have hundreds of call sites across `builtins/`, `object/`, `vm/`, and `environments/` that rely on `Deref`. Migrating all of those introduces a lot of API friction.

There are a few ways around this:
1. We just bite the bullet and migrate all the code.
2. The Hybrid Approach: we keep using real pointers for `Gc` at runtime (so `Deref` still works), and we only convert them into Custom Pointers when we need to serialize or pin something.
3. We put the allocator in Thread Local Storage (TLS) so `Deref` can look it up behind the scenes.

### Major blocker: The `Trace` trait

Currently, the `Trace` trait passes a real memory address to the `Tracer`.

With `CustomPtr`, the tracer only sees a `NonZeroU32` and does nothing with it. It can't follow the coordinate because it doesn't have access to the `PoolAllocator4`. An object the tracer never reaches is treated as dead and freed, which causes a use-after-free (UAF).

We'd have to either pass the allocator into the tracer (an additive change) or change the signature of `Trace` entirely (a massive breaking change). Note that the Hybrid Approach mentioned above also neatly sidesteps this issue, since the tracer would only ever see real pointers.
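
As a rough illustration of the first option (giving the tracer access to the allocator), the sketch below marks reachable objects by following coordinates through a borrowed heap. The `Node`, `Heap`, and `Tracer` types are hypothetical stand-ins and do not match the existing `Trace` / `Tracer` signatures.

```rust
/// Hypothetical object: a node holding coordinates of other nodes.
pub struct Node {
    /// Each child is a (pool_id, slot_idx) pair.
    pub children: Vec<(usize, usize)>,
}

pub struct Heap {
    pub pools: Vec<Vec<Node>>,
}

/// A tracer that borrows the heap so it can follow coordinates.
pub struct Tracer<'h> {
    heap: &'h Heap,
    marked: Vec<Vec<bool>>,
}

impl<'h> Tracer<'h> {
    pub fn new(heap: &'h Heap) -> Self {
        let marked = heap.pools.iter().map(|p| vec![false; p.len()]).collect();
        Tracer { heap, marked }
    }

    /// Marks everything reachable from `root` using an explicit worklist.
    pub fn mark_from(&mut self, root: (usize, usize)) {
        let heap = self.heap;
        let mut worklist = vec![root];
        while let Some((pool_id, slot_idx)) = worklist.pop() {
            // Skip coordinates that do not name a live slot.
            let node = match heap.pools.get(pool_id).and_then(|p| p.get(slot_idx)) {
                Some(node) => node,
                None => continue,
            };
            // Already visited: this is what terminates cycles.
            if self.marked[pool_id][slot_idx] {
                continue;
            }
            self.marked[pool_id][slot_idx] = true;
            worklist.extend(node.children.iter().copied());
        }
    }

    pub fn is_marked(&self, pool_id: usize, slot_idx: usize) -> bool {
        self.marked[pool_id][slot_idx]
    }
}
```

The key point is only that the tracer holds a reference to the heap; how that reference would be threaded through the real `Trace` trait is exactly the open question described above.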

### Room for improvement

There are a couple of other open questions around the integration:

1. **Write Barriers:** When we assign a new GC pointer (like `node.next = other_gc`), the GC needs to know. With `Deref`, we could intercept this. With a raw `u32`, we can't. We'd need an explicit write API on the context.
2. **Pinning:** We built custom pointers to make pinning easy, but we haven't actually specced out what a "pinned object" looks like in the allocator.
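
For the write-barrier question, an explicit write API on the context might look roughly like the sketch below. It is a guess at the shape only: `Context`, `set_next`, and the `HashSet`-based dirty set are hypothetical, and a flat `u32` index stands in for the full `(pool_id, slot_idx)` coordinate.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Hypothetical list node whose link is a raw 32-bit ID.
pub struct Node {
    pub next: Option<u32>,
}

/// Hypothetical context that owns the slots and records writes.
pub struct Context {
    slots: RefCell<Vec<Node>>,
    /// IDs of objects written since the last collection.
    dirty: RefCell<HashSet<u32>>,
}

impl Context {
    pub fn new() -> Self {
        Context {
            slots: RefCell::new(Vec::new()),
            dirty: RefCell::new(HashSet::new()),
        }
    }

    /// Allocates a node and returns its flat index as the ID.
    pub fn alloc(&self, node: Node) -> u32 {
        let mut slots = self.slots.borrow_mut();
        slots.push(node);
        (slots.len() - 1) as u32
    }

    /// Explicit write API: performs the store and records the barrier
    /// in one step. Returns `false` if `target` is not a valid ID.
    pub fn set_next(&self, target: u32, next: Option<u32>) -> bool {
        let mut slots = self.slots.borrow_mut();
        match slots.get_mut(target as usize) {
            Some(node) => {
                node.next = next;
                self.dirty.borrow_mut().insert(target);
                true
            }
            None => false,
        }
    }

    pub fn is_dirty(&self, id: u32) -> bool {
        self.dirty.borrow().contains(&id)
    }
}
```

Since every store has to go through the context anyway, the barrier cannot be forgotten at a call site, which is one small upside of losing `Deref`.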

## Conclusion

The core custom pointer concept may very well be a valid path forward, but it will be dependent on how we want to handle the loss of `Deref`.

If we choose the Hybrid Approach, we solve both the `Deref` ergonomics issue and the `Trace` issue, though we pay a small cost in runtime conversions. Otherwise, we have to commit to a massive API migration. We need to make this decision before moving ahead.
109 changes: 109 additions & 0 deletions oscars/examples/mempool4_demo.rs
@@ -0,0 +1,109 @@
//! mempool4 demo: allocate via CustomPtr, serialize, deserialize, verify
//!
//! What this demo proves:
//! 1. CustomPtr coordinates (pool_id, slot_idx) survive serialization and deserialization
//!    without requiring any pointer fixup passes. A linked list serialized to bytes
//!    can be traversed using the exact same head coordinate after being restored.
//! 2. The allocator's internal state (pool IDs, bump pointers) is correctly restored,
//!    allowing safe incremental allocations after deserialization without colliding
//!    with existing data.
//!
//! Run: `cargo run --example mempool4_demo --features std`

use oscars::alloc::mempool4::{AllocCtx, CustomPtr, Gc, PoolAllocator4, deserialize, serialize};

#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    key: u32,
    value: i64,
    /// Raw CustomPtr of the next entry or 0 for end of list
    next_raw: u32,
}

fn push_front(cx: &AllocCtx<'_>, head_raw: u32, key: u32, value: i64) -> u32 {
    cx.try_alloc(Entry {
        key,
        value,
        next_raw: head_raw,
    })
    .expect("allocation failed")
    .as_custom_ptr()
    .to_raw()
}

fn print_list(cx: &AllocCtx<'_>, head_raw: u32) {
    let mut raw = head_raw;
    while let Some(ptr) = CustomPtr::from_raw(raw) {
        // SAFETY: ptr came from a live allocation or a valid deserialized snapshot.
        let e: &Entry = cx.resolve(unsafe { Gc::from_custom_ptr(ptr) });
        println!(" -> key={} value={}", e.key, e.value);
        raw = e.next_raw;
    }
}

fn collect_list(cx: &AllocCtx<'_>, head_raw: u32) -> Vec<(u32, i64)> {
    let mut out = Vec::new();
    let mut raw = head_raw;
    while let Some(ptr) = CustomPtr::from_raw(raw) {
        // SAFETY: same invariant as in `print_list`: ptr came from a live
        // allocation or a valid deserialized snapshot.
        let e: &Entry = cx.resolve(unsafe { Gc::from_custom_ptr(ptr) });
        out.push((e.key, e.value));
        raw = e.next_raw;
    }
    out
}

fn main() {
    println!("Phase 1: allocating entries");
    let mut alloc = PoolAllocator4::new().with_page_size(4096);

    let head_raw = alloc.mutate(|cx: AllocCtx<'_>| {
        let mut head = 0u32;
        head = push_front(&cx, head, 30, 3000);
        head = push_front(&cx, head, 20, 2000);
        head = push_front(&cx, head, 10, 1000);
        println!("before serialization:");
        print_list(&cx, head);
        println!(
            "live slots: {} pool count: {}",
            cx.live_slot_count(),
            cx.pool_count()
        );
        head
    });

    println!("\nPhase 2: Serializing");
    let snapshot = serialize(&alloc);
    println!("snapshot: {} bytes", snapshot.len());

    println!("\nPhase 3: deserializing");
    let mut alloc2 = deserialize(&snapshot).expect("deserialization failed");
    let entries_after = alloc2.mutate(|cx: AllocCtx<'_>| {
        println!("after deserialization:");
        print_list(&cx, head_raw);
        collect_list(&cx, head_raw)
    });

    println!("\nPhase 4: verifying");
    let entries_before = alloc.mutate(|cx: AllocCtx<'_>| collect_list(&cx, head_raw));
    assert_eq!(entries_before, entries_after);
    println!("{} entries match after round trip", entries_before.len());

    println!("\nPhase 5: mutating and re-serializing");
    let new_head = alloc2.mutate(|cx: AllocCtx<'_>| {
        let h = push_front(&cx, head_raw, 5, 500);
        println!("after mutation:");
        print_list(&cx, h);
        h
    });

    let snapshot2 = serialize(&alloc2);
    println!("snapshot 2: {} bytes", snapshot2.len());

    let mut alloc3 = deserialize(&snapshot2).unwrap();
    alloc3.mutate(|cx: AllocCtx<'_>| {
        println!("round trip 2:");
        print_list(&cx, new_head);
    });

    println!("\ndone.");
}