Design notes for Commonplace Storage and overall architecture updates
Overview and rationale
- Several attempts of increasing sophistication have been made at an in-memory storage layer.
- The current version shows promise but is getting difficult to maintain and has no real future, since we need to move to a two-layer (cache + disk) system anyway.
Problems with current versions
- Global card storage
- Global reference storage
- Collections were "thin" concepts?
- More clone() calls and copies than we wanted, probably hitting cache lines pretty hard.
Good things to preserve
- Copy on write/append-only structures.
- Minimal locking (at least in theory).
- Limited contention on reads OR writes.
- Storage design mirrors system concepts (fit for purpose).
- Reasonable performance (server able to maintain 40,000 ops/sec in simple tests).
- Async IO in main process for managing network; threads for VM, storage and storage maintenance.
New Design: Unified Immutable Object Database
- Design for in-memory cache layer backed by disk storage
- Container-based isolation boundaries for privacy and security guarantees
- All Commonplace concepts (Card Instances, Logical Cards, Collections, References) represented as immutable, versioned Objects
- Single storage model handles all object types with runtime-determined schemas
- Append-only semantics with efficient access to current versions
- Three-tier processing pipeline: Network (async) → Commonplace Workers (threads) → Storage (threads)
Core Architecture
Container Model
trait StorageEngine {
    fn create_container(&self, config: ContainerConfig) -> ContainerId;
    fn delete_container(&self, id: ContainerId) -> Result<()>;
    fn get_container(&self, id: ContainerId) -> Option<Container>;
    fn list_containers(&self) -> Vec<ContainerId>;
}
struct Container {
    id: ContainerId,
    created_at: Timestamp,
    // Dynamic object store registry
    object_stores: HashMap<ObjectStoreId, Box<dyn ObjectStore>>,
}
// Per-container operations (named ContainerOps so it doesn't clash with the Container struct above)
trait ContainerOps {
    fn create_object_store(&mut self, store_type: ObjectStoreType) -> ObjectStoreId;
    fn get_object_store(&self, id: ObjectStoreId) -> Option<&dyn ObjectStore>;
    fn delete_object_store(&mut self, id: ObjectStoreId) -> Result<()>;
    fn list_object_stores(&self) -> Vec<ObjectStoreId>;
}
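A hypothetical usage sketch against the traits above (ContainerConfig::default() and ObjectStoreType::Generic are illustrative assumptions, not part of the design; ContainerOps is the per-container trait):
// Sketch only: assumes the Container struct implements ContainerOps, and that
// ContainerConfig::default() and ObjectStoreType::Generic exist (illustrative names).
fn bootstrap_tenant(engine: &dyn StorageEngine) -> Option<(ContainerId, ObjectStoreId)> {
    // One container per tenant: this is the privacy/isolation boundary.
    let container_id = engine.create_container(ContainerConfig::default());
    // Containers start empty; give this one a multi-schema ObjectStore to hold
    // Cards, Collections, References, and Schemas alike.
    let mut container = engine.get_container(container_id)?;
    let store_id = container.create_object_store(ObjectStoreType::Generic);
    Some((container_id, store_id))
}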
Schema-Based Object Model
struct Object {
    id: ObjectId,
    schema_id: SchemaId,
    version: Version,
    created_at: Timestamp,
    sequence_number: u64,
    fields: HashMap<String, FieldValue>,
    references: Vec<ObjectId>,
}
struct Schema {
    id: SchemaId,
    name: String, // e.g., "CardInstance", "LogicalCard", "note-v1", "task-v1"
    version: u32,
    fields: HashMap<String, FieldDefinition>,
    indexes: Vec<IndexDefinition>,
}
struct FieldDefinition {
    name: String,
    field_type: FieldType,
    required: bool,
    indexed: bool,
}
enum FieldType {
    Text,
    Number,
    Boolean,
    Date,
    Reference(SchemaId),
    Array(Box<FieldType>),
}
enum FieldValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Date(DateTime),
    Reference(ObjectId),
    Array(Vec<FieldValue>),
}
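Because schemas are runtime data, field type checks also happen at runtime. A minimal, self-contained sketch of the shape check (the enums are repeated here with DateTime stubbed as i64 and the id types as u64; Reference checking stays shallow, since verifying the target's schema would need a store lookup):
// Mirrors the FieldType/FieldValue enums above, reduced so the snippet stands alone.
type SchemaId = u64;
type ObjectId = u64;
enum FieldType { Text, Number, Boolean, Date, Reference(SchemaId), Array(Box<FieldType>) }
enum FieldValue { Text(String), Number(f64), Boolean(bool), Date(i64), Reference(ObjectId), Array(Vec<FieldValue>) }
// Shallow shape check: does this value fit this declared type?
fn value_matches(value: &FieldValue, ty: &FieldType) -> bool {
    match (value, ty) {
        (FieldValue::Text(_), FieldType::Text) => true,
        (FieldValue::Number(_), FieldType::Number) => true,
        (FieldValue::Boolean(_), FieldType::Boolean) => true,
        (FieldValue::Date(_), FieldType::Date) => true,
        // A full check would load the referenced Object and compare its schema_id.
        (FieldValue::Reference(_), FieldType::Reference(_)) => true,
        (FieldValue::Array(items), FieldType::Array(inner)) => {
            items.iter().all(|item| value_matches(item, inner))
        }
        _ => false,
    }
}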
Storage Hierarchy
Three-level hierarchy provides clean abstraction boundaries:
- StorageEngine: Manages containers and provides isolation
- Container: Manages ObjectStores for one isolated tenant
- ObjectStore: Manages Objects within container (can contain multiple schemas)
trait ObjectStore {
    fn create(&self, object: Object) -> ObjectId;
    fn get_current(&self, id: ObjectId) -> Option<Object>;
    fn get_version(&self, id: ObjectId, version: Version) -> Option<Object>;
    fn get_version_history(&self, id: ObjectId) -> Vec<Version>;
    fn query(&self, criteria: QueryCriteria) -> ObjectIterator;
    fn query_by_schema(&self, schema_id: SchemaId, criteria: QueryCriteria) -> ObjectIterator;
}
Container Isolation
- Privacy Guarantees: Complete data separation between containers
- Resource Management: Independent memory/disk quotas per container
- Operational Boundaries: Backup/restore/migration per container
- Access Control: Storage layer enforces container boundaries
- Cross-Container Sharing: Results in data duplication (handled by higher-layer federation)
Schema Management
Each Container includes a Schema registry:
struct Container {
    object_stores: HashMap<ObjectStoreId, Box<dyn ObjectStore>>,
    schema_registry: Box<dyn ObjectStore>, // holds all Schema objects for this container
}
- Schema Evolution: Versioned schemas support backward compatibility
- Multi-Schema Stores: Single ObjectStore can contain Objects with different schemas
- Query Flexibility: Can query across schemas or filter by specific schema
- Validation: Objects validated against their referenced schema
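A sketch of the validation step itself, reusing value_matches and the stub types from the earlier shape-check sketch (so this snippet is not fully standalone); whether undeclared fields are rejected or ignored is an open policy choice, here they are rejected:
use std::collections::HashMap;
// Reduced stand-ins for the Schema/FieldDefinition/Object structs above; the caller
// has already looked the Schema up by the Object's schema_id.
struct FieldDefinition { field_type: FieldType, required: bool }
struct Schema { fields: HashMap<String, FieldDefinition> }
struct Object { fields: HashMap<String, FieldValue> }
#[derive(Debug)]
enum ValidationError {
    MissingRequiredField(String),
    UndeclaredField(String),
    WrongFieldType(String),
}
fn validate(object: &Object, schema: &Schema) -> Result<(), ValidationError> {
    // Every required field must be present.
    for (name, def) in &schema.fields {
        if def.required && !object.fields.contains_key(name) {
            return Err(ValidationError::MissingRequiredField(name.clone()));
        }
    }
    // Every present field must be declared and have the declared shape.
    for (name, value) in &object.fields {
        match schema.fields.get(name) {
            None => return Err(ValidationError::UndeclaredField(name.clone())),
            Some(def) if !value_matches(value, &def.field_type) => {
                return Err(ValidationError::WrongFieldType(name.clone()));
            }
            Some(_) => {}
        }
    }
    Ok(())
}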
Storage vs Application Schemas
- Storage Schema (storage layer): field definitions for structured storage and querying
- CardType (Commonplace layer): combines a storage schema with behavior, scripts, and UI templates
// Higher-level Commonplace concept
struct CardType {
    id: CardTypeId,
    name: String,
    storage_schema_id: SchemaId, // References storage schema
    commonscript: Option<String>, // Behavior/validation scripts
    ui_template: Option<String>, // Rendering information
    metadata: CardTypeMetadata,
}
Update Semantics
All object types use append-only semantics - "updates" create new Object versions:
- Objects never modified in place
- New versions reference previous versions through sequence numbering
- Complete audit trail preserved
- Schema changes handled through schema versioning
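A sketch of what an "update" produces under these semantics, with Object reduced to the fields involved (version and sequence numbers as plain integers, field values as strings):
use std::collections::HashMap;
// Reduced Object stand-in; the real struct also carries schema_id, references, etc.
#[derive(Clone)]
struct Object {
    id: u64,
    version: u64,
    sequence_number: u64,
    created_at: u64, // epoch millis in this sketch
    fields: HashMap<String, String>, // FieldValue reduced to String here
}
// An "update" never mutates `current`; it derives a new version that supersedes it.
fn update_object(current: &Object, changes: HashMap<String, String>, now: u64, next_sequence: u64) -> Object {
    let mut next = current.clone();
    next.version = current.version + 1;
    next.sequence_number = next_sequence; // store-wide ordering links the versions
    next.created_at = now;
    next.fields.extend(changes); // changed fields overwrite, unchanged fields carry over
    next // caller appends this to the store; `current` stays readable for history
}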
Three-Tier Threading Architecture
Domain-Separated Processing Pipeline
The storage system uses a three-tier processing pipeline:
- (A) Network Domain: Async I/O for transport - parse/serialize protocol messages, fast handoff
- (B) Commonplace Worker Domain: OS threads for business logic - decompose protocol operations, execute Commonscript, coordinate storage operations
- (C) Storage Domain: OS threads for data operations - handle Container/ObjectStore/Object primitives
- Background Domain: OS threads for maintenance tasks - compaction, eviction, cleanup
Processing Flow
Network (Async) → Commonplace Workers (Threads) → Storage Workers (Threads)
Protocol Message → Business Logic Decomposition → Storage Primitives
Per-Thread StorageEngine Container Access
struct StorageWorker {
    thread_id: ThreadId,
    storage_engines: HashMap<ContainerId, Box<dyn StorageEngine>>, // Per Container, not Collection
    work_receiver: Receiver<WorkItem>,
}
struct StorageDispatcher {
    // Track which thread owns which container
    container_assignments: DashMap<ContainerId, ThreadId>,
    // Per-thread work channels
    thread_channels: HashMap<ThreadId, Sender<WorkItem>>,
}
struct CommonplaceWorker {
    worker_id: WorkerId,
    script_vm: ScriptVM, // Integrated Commonscript VM
    storage_dispatcher: StorageDispatcher, // Access to storage domain
    work_receiver: Receiver<CommonplaceWorkItem>,
}
Container-to-Thread Affinity
- Container Ownership: Each Container exclusively owned by one storage thread
- Collections as Objects: Collections are Objects with specific schema stored within Containers
- Routing Logic: Dispatcher routes work items to thread that owns the target Container
- State Isolation: Each thread's StorageEngine state completely independent
- Simple Concurrency: No coordination needed within threads for Container operations
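A simplified stand-in for the dispatcher's routing logic, using std collections instead of DashMap and a Vec indexed by thread id; the assignment policy (a stable hash of the ContainerId) is illustrative:
use std::collections::HashMap;
use std::sync::mpsc::Sender;
type ContainerId = u64;
type ThreadId = usize;
struct WorkItem { container_id: ContainerId /* plus the StorageOperation payload */ }
struct StorageDispatcher {
    container_assignments: HashMap<ContainerId, ThreadId>, // DashMap in the real design
    thread_channels: Vec<Sender<WorkItem>>,                // index doubles as ThreadId
}
impl StorageDispatcher {
    // Route every item to the thread that owns its container; first touch assigns
    // ownership via a stable hash so a container always lands on the same thread.
    fn dispatch(&mut self, item: WorkItem) {
        let n = self.thread_channels.len(); // assumes at least one storage thread
        let thread_id = *self
            .container_assignments
            .entry(item.container_id)
            .or_insert_with(|| (item.container_id as usize) % n);
        // Exclusive ownership: no locking needed inside the storage thread itself.
        self.thread_channels[thread_id]
            .send(item)
            .expect("storage thread hung up");
    }
}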
Cross-Domain Communication
// Network domain (async) → Commonplace Workers (threads)
NetworkMessage::MoveCard { card_id, from_collection, to_collection }
    ↓ (fast handoff)
// Commonplace Worker: Business logic decomposition
CommonplaceWorker::handle_move_card() {
    // Validate permissions, run scripts
    // Decompose into storage primitives:
    storage.delete_object(container_id, reference_id).await?;
    storage.create_object(container_id, new_reference_object).await?;
}
    ↓ (storage operations)
// Storage domain: Container/Object primitives
StorageWorker::create_object(container_id, object) { /* ... */ }
StorageWorker::delete_object(container_id, object_id) { /* ... */ }
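A runnable sketch of the same flow using plain OS threads and std channels (the real network tier is async, and the work-item shapes and placeholder ids here are illustrative):
use std::sync::mpsc;
use std::thread;
// Illustrative work-item shapes; the real ones carry full protocol/storage payloads.
enum CommonplaceWorkItem {
    MoveCard { card_id: u64, from_container: u64, to_container: u64 },
}
enum StorageWorkItem {
    CreateObject { container_id: u64 },
    DeleteObject { container_id: u64, object_id: u64 },
}
fn main() {
    let (to_workers, worker_rx) = mpsc::channel::<CommonplaceWorkItem>();
    let (to_storage, storage_rx) = mpsc::channel::<StorageWorkItem>();
    // Storage domain: executes Container/Object primitives for the containers it owns.
    let storage = thread::spawn(move || {
        for op in storage_rx {
            match op {
                StorageWorkItem::CreateObject { container_id } => {
                    println!("storage: create object in container {container_id}");
                }
                StorageWorkItem::DeleteObject { container_id, object_id } => {
                    println!("storage: delete object {object_id} in container {container_id}");
                }
            }
        }
    });
    // Commonplace worker domain: decomposes protocol operations into storage primitives.
    let worker = thread::spawn(move || {
        for item in worker_rx {
            match item {
                CommonplaceWorkItem::MoveCard { card_id, from_container, to_container } => {
                    println!("worker: move card {card_id}");
                    // Placeholder object_id; the real worker would look up the Reference object.
                    to_storage
                        .send(StorageWorkItem::DeleteObject { container_id: from_container, object_id: 0 })
                        .unwrap();
                    to_storage
                        .send(StorageWorkItem::CreateObject { container_id: to_container })
                        .unwrap();
                }
            }
        }
    });
    // Network domain (async in the real design): parse the message, fast handoff, return.
    to_workers
        .send(CommonplaceWorkItem::MoveCard { card_id: 1, from_container: 10, to_container: 20 })
        .unwrap();
    drop(to_workers); // closing the channel lets the worker thread exit
    worker.join().unwrap();
    storage.join().unwrap();
}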
Storage Primitive Operations
Storage domain operations are Container/ObjectStore/Object primitives:
enum StorageOperation {
    // Container management
    OpenContainer(ContainerId),
    CloseContainer(ContainerId),
    // Object operations within containers
    CreateObject { container_id: ContainerId, object: Object },
    GetObject { container_id: ContainerId, object_id: ObjectId },
    QueryObjects { container_id: ContainerId, schema_id: SchemaId, criteria: QueryCriteria },
    // ObjectStore management
    CreateObjectStore { container_id: ContainerId, store_type: ObjectStoreType },
    ListObjectStores { container_id: ContainerId },
}
Collections and References are higher-level concepts handled by Commonplace Workers:
// Collections are Objects with "Collection" schema
CommonplaceOp::CreateCollection { container_id, name }
    → StorageOp::CreateObject { container_id, collection_object }
// References are Objects with "Reference" schema
CommonplaceOp::AddCardToCollection { container_id, collection_id, card_id }
    → StorageOp::CreateObject { container_id, reference_object }
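A sketch of that decomposition for AddCardToCollection; the Reference schema id and field names are illustrative stand-ins, as are the reduced types:
use std::collections::HashMap;
// Reduced stand-ins for the real Object / FieldValue / StorageOperation types.
type ObjectId = u64;
type SchemaId = u64;
type ContainerId = u64;
enum FieldValue { Reference(ObjectId) }
struct Object { schema_id: SchemaId, fields: HashMap<String, FieldValue> }
enum StorageOperation { CreateObject { container_id: ContainerId, object: Object } }
const REFERENCE_SCHEMA: SchemaId = 2; // hypothetical id of the "Reference" schema
// Commonplace worker: turn "add card to collection" into a storage primitive.
fn add_card_to_collection(container_id: ContainerId, collection_id: ObjectId, card_id: ObjectId) -> StorageOperation {
    let mut fields = HashMap::new();
    // A Reference object simply points from a Collection to a Card.
    fields.insert("collection".to_string(), FieldValue::Reference(collection_id));
    fields.insert("card".to_string(), FieldValue::Reference(card_id));
    StorageOperation::CreateObject {
        container_id,
        object: Object { schema_id: REFERENCE_SCHEMA, fields },
    }
}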
Cache-Persistent Architecture
Memory Layer (Cache)
- Multi-Schema ObjectStores: DashMap-based concurrent access supporting multiple schemas
- Layered Resolution: Base views + recent changes for performance
- Cache Behavior: Memory layer acts as eviction-based cache over disk storage
- Memory Management: Active eviction prevents unbounded memory growth
- Schema-Aware Indexing: Indexes created based on schema field definitions
- Thread-Local State: Each storage thread maintains independent cache state
struct MemoryObjectStore {
    // Base materialized view (stable, efficient access)
    base_objects: Arc<DashMap<ObjectId, Object>>,
    // Recent changes (append-only, checked first)
    recent_changes: DashMap<ObjectId, Vec<Object>>,
    // Schema-based indexes for efficient querying
    field_indexes: HashMap<SchemaId, HashMap<String, FieldIndex>>,
    // Memory management
    access_tracker: LRUTracker<ObjectId>,
    memory_budget: AtomicUsize,
    eviction_policy: EvictionPolicy,
    change_count: AtomicUsize,
}
enum EvictionPolicy {
    LRU(usize), // Keep N most recently used objects
    TTL(Duration), // Evict after time threshold
    MemoryPressure(usize), // Evict when over memory limit
}
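A sketch of the layered read path, with std HashMap standing in for DashMap and Object reduced to what the lookup needs; a miss in both layers is the cache-miss case handed to the disk layer:
use std::collections::HashMap;
type ObjectId = u64;
#[derive(Clone)]
struct Object { version: u64 } // the real Object carries schema, fields, etc.
struct MemoryObjectStore {
    base_objects: HashMap<ObjectId, Object>,        // stable materialized view
    recent_changes: HashMap<ObjectId, Vec<Object>>, // append-only newer versions
}
impl MemoryObjectStore {
    // Current version = newest recent change if any, else the base view entry.
    // A miss in both layers means "go to disk" (cache miss), handled by the caller.
    fn get_current(&self, id: ObjectId) -> Option<Object> {
        if let Some(versions) = self.recent_changes.get(&id) {
            if let Some(latest) = versions.last() {
                return Some(latest.clone());
            }
        }
        self.base_objects.get(&id).cloned()
    }
}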
Disk Layer (Persistence)
- Append-Only Log: All Object creates written sequentially
- Index Files: Efficient lookup by ObjectId, schema-based queries
- Compaction: Periodic cleanup of old versions
- Crash Recovery: Log replay on startup
- Cache Miss Handling: Serves objects evicted from memory layer
- Thread-Local Files: Each storage thread manages the disk files for its assigned containers
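A sketch of the disk append path with a length-prefixed record format and an in-memory offset index (the record layout is illustrative; on crash recovery the index would be rebuilt by replaying this log):
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Seek, SeekFrom, Write};
type ObjectId = u64;
struct AppendLog {
    file: File,                    // opened with append(true): writes only ever add bytes
    index: HashMap<ObjectId, u64>, // ObjectId -> byte offset of its latest record
}
impl AppendLog {
    fn open(path: &str) -> io::Result<AppendLog> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(AppendLog { file, index: HashMap::new() })
    }
    // Append one serialized object record; earlier bytes are never overwritten.
    fn append(&mut self, id: ObjectId, payload: &[u8]) -> io::Result<u64> {
        let offset = self.file.seek(SeekFrom::End(0))?; // current end = record offset
        self.file.write_all(&id.to_le_bytes())?;
        self.file.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.file.write_all(payload)?;
        self.index.insert(id, offset); // index always points at the newest version
        Ok(offset)
    }
}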
Cache Coherency
- Write-Through: All writes go to both memory and disk layers
- Read Path: Memory first (cache hit), fallback to disk (cache miss)
- Eviction: Objects removed from memory when cache is full
- Cache Loading: Disk objects loaded into memory on access
- Compaction: Promote recent changes to base views, may trigger eviction
- Work Routing: Storage operations routed to the thread owning the target container
Performance Characteristics
Read Operations
- Hot Objects: O(1) from memory base views
- Recently Changed: O(k) where k = recent versions
- Cold Objects: Disk read with index lookup
- Schema Queries: Efficient field-based queries using schema-aware indexes
- Cross-Schema Queries: Can query across multiple schemas in same ObjectStore
- Cache Friendly: No massive allocations, localized performance impact
Write Operations
- Memory: O(1) append to recent changes + write-through to disk
- Disk: Sequential append to log
- No Blocking: Writers don't block readers
- Memory Pressure: May trigger eviction or aggressive compaction
- Graceful Degradation: Performance degrades locally per object
Memory Management Strategy
- Eviction Policies: LRU, TTL, or memory pressure-based eviction
- Version Caching: Keep current versions in memory, limit historical versions
- Memory Budgets: Per-container and system-wide memory limits
- Adaptive Compaction: More aggressive compaction under memory pressure
- Cache Replacement: Objects evicted from memory remain accessible on disk
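A sketch of memory-pressure eviction; LRU order is approximated here with a per-entry last-access counter, and eviction only touches the cache since evicted objects remain on disk:
use std::collections::HashMap;
type ObjectId = u64;
struct CachedObject { approx_size: usize, last_access: u64 }
// Evict least-recently-used entries until the cache fits the budget again.
// Returns the evicted ids so the caller can drop any secondary index entries.
fn evict_until_under_budget(
    cache: &mut HashMap<ObjectId, CachedObject>,
    used_bytes: &mut usize,
    budget_bytes: usize,
) -> Vec<ObjectId> {
    let mut evicted = Vec::new();
    while *used_bytes > budget_bytes {
        // Pick the entry with the oldest last_access (O(n); a real LRU keeps a list).
        let victim = match cache.iter().min_by_key(|(_, v)| v.last_access) {
            Some((id, _)) => *id,
            None => break, // cache already empty: nothing left to evict
        };
        if let Some(obj) = cache.remove(&victim) {
            *used_bytes = used_bytes.saturating_sub(obj.approx_size);
            evicted.push(victim);
        }
    }
    evicted
}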
Compaction Strategy
- Threshold-Based: Compact when change_count exceeds limit
- Memory Pressure: Lower thresholds when approaching memory limits
- Per-Thread Compaction: Each storage thread manages compaction for its containers
- Background Coordination: Background domain handles cross-thread compaction coordination
- Incremental: Process subsets of objects to avoid latency spikes
- Disk Compaction: Background domain handles log file management
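A sketch of the in-memory promote step: once enough changes accumulate, the newest version of each changed object moves into the base view and the recent-changes layer is cleared (older in-memory versions drop out here but stay recoverable from the disk log). Types mirror the earlier read-path sketch.
use std::collections::HashMap;
type ObjectId = u64;
struct Object { version: u64 } // reduced stand-in, as in the read-path sketch
struct MemoryObjectStore {
    base_objects: HashMap<ObjectId, Object>,
    recent_changes: HashMap<ObjectId, Vec<Object>>,
    change_count: usize,
    compaction_threshold: usize,
}
impl MemoryObjectStore {
    // Promote recent changes into the base view once enough changes accumulate.
    fn maybe_compact(&mut self) {
        if self.change_count < self.compaction_threshold {
            return;
        }
        for (id, mut versions) in self.recent_changes.drain() {
            // Versions were appended in order, so the last one is current.
            if let Some(latest) = versions.pop() {
                self.base_objects.insert(id, latest);
            }
        }
        self.change_count = 0; // counter resets; eviction may follow if over budget
    }
}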
Scaling Properties
- Three-Tier Pipeline: Network (async) → Commonplace Workers (threads) → Storage (threads)
- Per-Thread Storage Engines: Independent scaling per storage worker thread
- Container Affinity: Containers distributed across storage threads for load balancing
- No Inter-Thread Coordination: Storage threads operate independently on their containers
- Domain Isolation: Complex protocol operations don't block async network operations
- Business Logic Separation: Commonplace workers handle complex operations and coordination
- Memory Efficiency: Linear growth with actual changes, per-thread cache management
- Exclusive Access: Simplified concurrency through thread-level container ownership
Implementation Phases
- Container Management: Implement container creation/deletion and isolation
- Three-Tier Threading: Integrate Network → Commonplace Workers → Storage pipeline
- Commonplace Workers: Implement business logic decomposition and Commonscript integration
- Per-Thread StorageEngines: Implement container-to-thread affinity and exclusive access
- Storage Primitives: Implement Container/ObjectStore/Object operations
- Memory Layer: Implement layered resolution for all object types within containers
- Disk Layer: Add append-only log with basic indexing per container
- Cross-Domain Communication: Connect all three domains with work dispatchers
- Background Tasks: Move compaction and maintenance to background domain
- Optimization: Add advanced indexing, query support, and performance tuning
- Production: Monitoring, operational tooling, and deployment readiness