Design notes for Commonplace Storage and overall architecture updates

Overview and rationale

  • Several attempts of increasing sophistication have been made at the in-memory storage layer.
  • The current version shows promise but is getting difficult to maintain, and it has no real future since we need to move to a two-layer (cache + disk) system anyway.

Problems with current versions

  • Global card storage
  • Global reference storage
  • Collections were "thin" concepts?
  • More clone() calls etc. than we wanted, probably hitting cache lines pretty hard.

Good things to preserve

  • Copy on write/append-only structures.
  • Minimal locking (at least in theory).
  • Limited contention on reads OR writes.
  • Storage design mirrors system concepts (fit for purpose).
  • Reasonable performance (the server was able to maintain 40,000 ops/sec in simple tests).
  • Async IO in main process for managing network; threads for VM, storage and storage maintenance.

New Design: Unified Immutable Object Database

  • Design for in-memory cache layer backed by disk storage
  • Container-based isolation boundaries for privacy and security guarantees
  • All Commonplace concepts (Card Instances, Logical Cards, Collections, References) represented as immutable, versioned Objects
  • Single storage model handles all object types with runtime-determined schemas
  • Append-only semantics with efficient access to current versions
  • Three-tier processing pipeline: Network (async) → Commonplace Workers (threads) → Storage (threads)

Core Architecture

Container Model

trait StorageEngine {
    fn create_container(&self, config: ContainerConfig) -> ContainerId;
    fn delete_container(&self, id: ContainerId) -> Result<()>;
    fn get_container(&self, id: ContainerId) -> Option<Container>;
    fn list_containers(&self) -> Vec<ContainerId>;
}

struct Container {
    id: ContainerId,
    created_at: Timestamp,
    
    // Dynamic object store registry
    object_stores: HashMap<ObjectStoreId, Box<dyn ObjectStore>>,
}

impl Container {
    fn create_object_store(&mut self, store_type: ObjectStoreType) -> ObjectStoreId { /* ... */ }
    fn get_object_store(&self, id: ObjectStoreId) -> Option<&dyn ObjectStore> { /* ... */ }
    fn delete_object_store(&mut self, id: ObjectStoreId) -> Result<()> { /* ... */ }
    fn list_object_stores(&self) -> Vec<ObjectStoreId> { /* ... */ }
}
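
A minimal usage sketch of these interfaces; ContainerConfig::default() and the ObjectStoreType::InMemory variant are illustrative assumptions, not defined here:

fn bootstrap_tenant(engine: &dyn StorageEngine) -> Option<(ContainerId, ObjectStoreId)> {
    // One container per isolated tenant
    let container_id = engine.create_container(ContainerConfig::default()); // assumed Default impl
    let mut container = engine.get_container(container_id)?;
    // A container can hold several object stores; the store type is an assumed variant
    let store_id = container.create_object_store(ObjectStoreType::InMemory);
    Some((container_id, store_id))
}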

Schema-Based Object Model

struct Object {
    id: ObjectId,
    schema_id: SchemaId,
    version: Version,
    created_at: Timestamp,
    sequence_number: u64,
    fields: HashMap<String, FieldValue>,
    references: Vec<ObjectId>,
}

struct Schema {
    id: SchemaId,
    name: String,           // e.g., "CardInstance", "LogicalCard", "note-v1", "task-v1"
    version: u32,
    fields: HashMap<String, FieldDefinition>,
    indexes: Vec<IndexDefinition>,
}

struct FieldDefinition {
    name: String,
    field_type: FieldType,
    required: bool,
    indexed: bool,
}

enum FieldType {
    Text,
    Number,
    Boolean,
    Date,
    Reference(SchemaId),
    Array(Box<FieldType>),
}

enum FieldValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Date(DateTime),
    Reference(ObjectId),
    Array(Vec<FieldValue>),
}
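
As an example of a runtime-determined schema, a hypothetical "note-v1" schema could be built like this (field names are illustrative only):

use std::collections::HashMap;

fn note_schema(id: SchemaId) -> Schema {
    let mut fields = HashMap::new();
    fields.insert("title".to_string(), FieldDefinition {
        name: "title".to_string(),
        field_type: FieldType::Text,
        required: true,
        indexed: true,            // title gets a schema-driven index
    });
    fields.insert("body".to_string(), FieldDefinition {
        name: "body".to_string(),
        field_type: FieldType::Text,
        required: false,
        indexed: false,
    });
    Schema {
        id,
        name: "note-v1".to_string(),
        version: 1,
        fields,
        indexes: Vec::new(),      // index definitions omitted in this sketch
    }
}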

Storage Hierarchy

Three-level hierarchy provides clean abstraction boundaries:

  1. StorageEngine: Manages containers and provides isolation
  2. Container: Manages ObjectStores for one isolated tenant
  3. ObjectStore: Manages Objects within container (can contain multiple schemas)

trait ObjectStore {
    fn create(&self, object: Object) -> ObjectId;
    fn get_current(&self, id: ObjectId) -> Option<Object>;
    fn get_version(&self, id: ObjectId, version: Version) -> Option<Object>;
    fn get_version_history(&self, id: ObjectId) -> Vec<Version>;
    fn query(&self, criteria: QueryCriteria) -> ObjectIterator;
    fn query_by_schema(&self, schema_id: SchemaId, criteria: QueryCriteria) -> ObjectIterator;
}
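
A sketch of creating an Object against a schema and reading it back; ObjectId::new(), Version::initial(), and Timestamp::now() are assumed constructors:

fn create_note(store: &dyn ObjectStore, schema_id: SchemaId, title: &str) -> Option<Object> {
    let mut fields = std::collections::HashMap::new();
    fields.insert("title".to_string(), FieldValue::Text(title.to_string()));
    let object = Object {
        id: ObjectId::new(),          // assumed constructor
        schema_id,
        version: Version::initial(),  // assumed constructor
        created_at: Timestamp::now(), // assumed constructor
        sequence_number: 0,
        fields,
        references: Vec::new(),
    };
    let id = store.create(object);
    store.get_current(id)             // latest version for this ObjectId
}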

Container Isolation

  • Privacy Guarantees: Complete data separation between containers
  • Resource Management: Independent memory/disk quotas per container
  • Operational Boundaries: Backup/restore/migration per container
  • Access Control: Storage layer enforces container boundaries
  • Cross-Container Sharing: Results in data duplication (handled by higher-layer federation)

Schema Management

Each Container includes a Schema registry:

struct Container {
    object_stores: HashMap<ObjectStoreId, Box<dyn ObjectStore>>,
    schema_registry: Box<dyn ObjectStore>, // All schemas for this container
}

  • Schema Evolution: Versioned schemas support backward compatibility
  • Multi-Schema Stores: Single ObjectStore can contain Objects with different schemas
  • Query Flexibility: Can query across schemas or filter by specific schema
  • Validation: Objects validated against their referenced schema (sketched below)
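
A sketch of that validation step: require the schema's required fields and reject unknown ones (type checking of FieldValue against FieldType is omitted here):

fn validate(object: &Object, schema: &Schema) -> Result<(), String> {
    // Every required field defined by the schema must be present on the object
    for (name, def) in &schema.fields {
        if def.required && !object.fields.contains_key(name) {
            return Err(format!("missing required field `{}`", name));
        }
    }
    // Reject fields the schema does not define
    for name in object.fields.keys() {
        if !schema.fields.contains_key(name) {
            return Err(format!("unknown field `{}`", name));
        }
    }
    Ok(())
}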

Storage vs Application Schemas

Storage Schema (storage layer): Field definitions for structured storage and querying.

CardType (Commonplace layer): Combines a storage schema with behavior, scripts, and UI templates.

// Higher-level Commonplace concept
struct CardType {
    id: CardTypeId,
    name: String,
    storage_schema_id: SchemaId,    // References storage schema
    commonscript: Option<String>,   // Behavior/validation scripts
    ui_template: Option<String>,    // Rendering information
    metadata: CardTypeMetadata,
}

Update Semantics

All object types use append-only semantics - "updates" create new Object versions:

  • Objects never modified in place
  • New versions reference previous versions through sequence numbering (see the sketch after this list)
  • Complete audit trail preserved
  • Schema changes handled through schema versioning
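
A sketch of what an "update" looks like under these rules; it assumes Clone on the field types and the helper constructors Version::next() and Timestamp::now(), which are not defined in this document:

fn update_object(
    store: &dyn ObjectStore,
    id: ObjectId,
    mutate: impl FnOnce(&mut std::collections::HashMap<String, FieldValue>),
) -> Option<ObjectId> {
    let current = store.get_current(id)?;
    let mut fields = current.fields.clone();
    mutate(&mut fields);                               // apply the logical "update"
    let next = Object {
        id: current.id,                                // same logical identity
        schema_id: current.schema_id,
        version: current.version.next(),               // assumed helper
        created_at: Timestamp::now(),                  // assumed constructor
        sequence_number: current.sequence_number + 1,  // links back to the previous version
        fields,
        references: current.references.clone(),
    };
    Some(store.create(next))                           // append; the old version is never touched
}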

Three-Tier Threading Architecture

Domain-Separated Processing Pipeline

The storage system uses a three-tier processing pipeline:

  • (A) Network Domain: Async I/O for transport - parse/serialize protocol messages, fast handoff
  • (B) Commonplace Worker Domain: OS threads for business logic - decompose protocol operations, execute Commonscript, coordinate storage operations
  • (C) Storage Domain: OS threads for data operations - handle Container/ObjectStore/Object primitives
  • Background Domain: OS threads for maintenance tasks - compaction, eviction, cleanup

Processing Flow

Network (Async) → Commonplace Workers (Threads) → Storage Workers (Threads)

Protocol Message → Business Logic Decomposition → Storage Primitives

Per-Thread StorageEngine Container Access

struct StorageWorker {
    thread_id: ThreadId,
    storage_engines: HashMap<ContainerId, Box<dyn StorageEngine>>,  // Per Container, not Collection
    work_receiver: Receiver<WorkItem>,
}

struct StorageDispatcher {
    // Track which thread owns which container
    container_assignments: DashMap<ContainerId, ThreadId>,
    
    // Per-thread work channels
    thread_channels: HashMap<ThreadId, Sender<WorkItem>>,
}

struct CommonplaceWorker {
    worker_id: WorkerId,
    script_vm: ScriptVM,                    // Integrated Commonscript VM
    storage_dispatcher: StorageDispatcher,  // Access to storage domain
    work_receiver: Receiver<CommonplaceWorkItem>,
}
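
A sketch of the dispatch step, assuming WorkItem carries the target ContainerId and that ThreadId is Copy; backpressure and error handling are elided:

impl StorageDispatcher {
    fn dispatch(&self, item: WorkItem) {
        // Look up the storage thread that owns the target container
        let owner = self.container_assignments.get(&item.container_id).map(|entry| *entry);
        if let Some(thread_id) = owner {
            if let Some(sender) = self.thread_channels.get(&thread_id) {
                let _ = sender.send(item); // hand off to the owning thread's channel
            }
        }
    }
}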

Container-to-Thread Affinity

  • Container Ownership: Each Container exclusively owned by one storage thread
  • Collections as Objects: Collections are Objects with specific schema stored within Containers
  • Routing Logic: Dispatcher routes work items to thread that owns the target Container
  • State Isolation: Each thread's StorageEngine state completely independent
  • Simple Concurrency: No coordination needed within threads for Container operations

Cross-Domain Communication

// Network domain (async) → Commonplace Workers (threads)
NetworkMessage::MoveCard { card_id, from_collection, to_collection }
    ↓ (fast handoff)

// Commonplace Worker: Business logic decomposition  
CommonplaceWorker::handle_move_card() {
    // Validate permissions, run scripts
    // Decompose into storage primitives:
    storage.delete_object(container_id, reference_id)?;
    storage.create_object(container_id, new_reference_object)?;
}
    ↓ (storage operations)

// Storage domain: Container/Object primitives
StorageWorker::create_object(container_id, object) { /* ... */ }
StorageWorker::delete_object(container_id, object_id) { /* ... */ }

Storage Primitive Operations

Storage domain operations are Container/ObjectStore/Object primitives:

enum StorageOperation {
    // Container management
    OpenContainer(ContainerId),
    CloseContainer(ContainerId),
    
    // Object operations within containers
    CreateObject { container_id: ContainerId, object: Object },
    GetObject { container_id: ContainerId, object_id: ObjectId },
    QueryObjects { container_id: ContainerId, schema_id: SchemaId, criteria: QueryCriteria },
    
    // ObjectStore management
    CreateObjectStore { container_id: ContainerId, store_type: ObjectStoreType },
    ListObjectStores { container_id: ContainerId },
}
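
A sketch of the storage worker loop driven by these primitives; the WorkItem wrapper around a StorageOperation and the reply plumbing are assumptions, and operation bodies are elided:

impl StorageWorker {
    fn run(&mut self) {
        // Each storage thread drains its own channel and touches only the containers it owns
        while let Ok(item) = self.work_receiver.recv() {
            match item.operation {
                StorageOperation::CreateObject { container_id, object } => {
                    if let Some(engine) = self.storage_engines.get(&container_id) {
                        let _ = (engine, object); // locate the ObjectStore and append (elided)
                    }
                }
                _ => {
                    // GetObject, QueryObjects, OpenContainer, etc. handled similarly;
                    // results are sent back to the requesting Commonplace worker
                }
            }
        }
    }
}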

Collections and References are higher-level concepts handled by Commonplace Workers:

// Collections are Objects with "Collection" schema
CommonplaceOp::CreateCollection { container_id, name } 
  → StorageOp::CreateObject { container_id, collection_object }

// References are Objects with "Reference" schema  
CommonplaceOp::AddCardToCollection { container_id, collection_id, card_id }
  → StorageOp::CreateObject { container_id, reference_object }
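
As an illustration of this mapping, a Commonplace worker might build the reference_object like so; the "Reference" schema's field names and the ID constructors are assumptions:

fn reference_object(reference_schema: SchemaId, collection_id: ObjectId, card_id: ObjectId) -> Object {
    let mut fields = std::collections::HashMap::new();
    fields.insert("collection".to_string(), FieldValue::Reference(collection_id));
    fields.insert("card".to_string(), FieldValue::Reference(card_id));
    Object {
        id: ObjectId::new(),           // assumed constructor
        schema_id: reference_schema,   // the container's "Reference" schema
        version: Version::initial(),   // assumed constructor
        created_at: Timestamp::now(),  // assumed constructor
        sequence_number: 0,
        fields,
        references: vec![collection_id, card_id],
    }
}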

Cache-Persistent Architecture

Memory Layer (Cache)

  • Multi-Schema ObjectStores: DashMap-based concurrent access supporting multiple schemas
  • Layered Resolution: Base views + recent changes for performance
  • Cache Behavior: Memory layer acts as eviction-based cache over disk storage
  • Memory Management: Active eviction prevents unbounded memory growth
  • Schema-Aware Indexing: Indexes created based on schema field definitions
  • Thread-Local State: Each storage thread maintains independent cache state

struct MemoryObjectStore {
    // Base materialized view (stable, efficient access)
    base_objects: Arc<DashMap<ObjectId, Object>>,
    
    // Recent changes (append-only, checked first)
    recent_changes: DashMap<ObjectId, Vec<Object>>,
    
    // Schema-based indexes for efficient querying
    field_indexes: HashMap<SchemaId, HashMap<String, FieldIndex>>,
    
    // Memory management
    access_tracker: LRUTracker<ObjectId>,
    memory_budget: AtomicUsize,
    eviction_policy: EvictionPolicy,
    change_count: AtomicUsize,
}

enum EvictionPolicy {
    LRU(usize),           // Keep N most recently used objects
    TTL(Duration),        // Evict after time threshold
    MemoryPressure(usize), // Evict when over memory limit
}
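
A sketch of layered resolution on reads (recent changes shadow the base view); it assumes Object: Clone, and the disk fallback on a full miss is not shown:

impl MemoryObjectStore {
    fn resolve_current(&self, id: ObjectId) -> Option<Object> {
        // Check the append-only recent-changes layer first; the newest entry wins
        if let Some(entry) = self.recent_changes.get(&id) {
            if let Some(latest) = entry.value().last() {
                return Some(latest.clone());
            }
        }
        // Otherwise serve from the stable base materialized view
        self.base_objects.get(&id).map(|entry| entry.value().clone())
    }
}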

Disk Layer (Persistence)

  • Append-Only Log: All Object creates written sequentially (sketched after this list)
  • Index Files: Efficient lookup by ObjectId, schema-based queries
  • Compaction: Periodic cleanup of old versions
  • Crash Recovery: Log replay on startup
  • Cache Miss Handling: Serves objects evicted from memory layer
  • Thread-Local Files: Each storage thread manages disk files for its assigned containers
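
A minimal sketch of the append-only log with an in-memory offset index; the record format (length-prefixed bytes) and the serialization of Objects are assumptions:

use std::collections::HashMap;

struct AppendLog {
    file: std::fs::File,
    offsets: HashMap<ObjectId, u64>, // latest record offset per ObjectId
    next_offset: u64,
}

impl AppendLog {
    fn append(&mut self, id: ObjectId, record: &[u8]) -> std::io::Result<u64> {
        use std::io::Write;
        let offset = self.next_offset;
        self.file.write_all(&(record.len() as u64).to_le_bytes())?; // length prefix
        self.file.write_all(record)?;                               // serialized Object bytes
        self.next_offset += 8 + record.len() as u64;
        self.offsets.insert(id, offset);                            // index points at the latest version
        Ok(offset)
    }
}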

Cache Coherency

  • Write-Through: All writes go to both memory and disk layers (sketched after this list)
  • Read Path: Memory first (cache hit), fallback to disk (cache miss)
  • Eviction: Objects removed from memory when cache is full
  • Cache Loading: Disk objects loaded into memory on access
  • Compaction: Promote recent changes to base views, may trigger eviction
  • Work Routing: Storage operations routed to the thread owning the target container
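
A write-through sketch tying the two layers together; it builds on the AppendLog sketch above, and serialize() is an assumed serializer, not part of this design:

fn write_through(cache: &MemoryObjectStore, log: &mut AppendLog, object: Object) -> std::io::Result<()> {
    use std::sync::atomic::Ordering;
    let id = object.id;
    let bytes = serialize(&object);                             // assumed serializer
    log.append(id, &bytes)?;                                    // sequential append to the disk log
    cache.recent_changes.entry(id).or_default().push(object);   // now visible to in-memory readers
    cache.change_count.fetch_add(1, Ordering::Relaxed);         // feeds the compaction threshold
    // Exceeding memory_budget triggers the configured EvictionPolicy (not shown)
    Ok(())
}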

Performance Characteristics

Read Operations

  • Hot Objects: O(1) from memory base views
  • Recently Changed: O(k) where k = recent versions
  • Cold Objects: Disk read with index lookup
  • Schema Queries: Efficient field-based queries using schema-aware indexes
  • Cross-Schema Queries: Can query across multiple schemas in same ObjectStore
  • Cache Friendly: No massive allocations, localized performance impact

Write Operations

  • Memory: O(1) append to recent changes + write-through to disk
  • Disk: Sequential append to log
  • No Blocking: Writers don't block readers
  • Memory Pressure: May trigger eviction or aggressive compaction
  • Graceful Degradation: Performance degrades locally per object

Memory Management Strategy

  • Eviction Policies: LRU, TTL, or memory pressure-based eviction
  • Version Caching: Keep current versions in memory, limit historical versions
  • Memory Budgets: Per-container and system-wide memory limits
  • Adaptive Compaction: More aggressive compaction under memory pressure
  • Cache Replacement: Objects evicted from memory remain accessible on disk

Compaction Strategy

  • Threshold-Based: Compact when change_count exceeds a limit (see the sketch after this list)
  • Memory Pressure: Lower thresholds when approaching memory limits
  • Per-Thread Compaction: Each storage thread manages compaction for its containers
  • Background Coordination: Background domain handles cross-thread compaction coordination
  • Incremental: Process subsets of objects to avoid latency spikes
  • Disk Compaction: Background domain handles log file management
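
A sketch of threshold-based compaction within one storage thread; it assumes Object: Clone and a Copy ObjectId, treats the newest recent version as authoritative, and elides the incremental per-subset processing:

impl MemoryObjectStore {
    fn maybe_compact(&self, change_limit: usize) {
        use std::sync::atomic::Ordering;
        if self.change_count.load(Ordering::Relaxed) < change_limit {
            return;
        }
        // Promote the newest version of each changed object into the base view
        for entry in self.recent_changes.iter() {
            if let Some(latest) = entry.value().last() {
                self.base_objects.insert(*entry.key(), latest.clone());
            }
        }
        self.recent_changes.clear();
        self.change_count.store(0, Ordering::Relaxed);
    }
}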

Scaling Properties

  • Three-Tier Pipeline: Network (async) → Commonplace Workers (threads) → Storage (threads)
  • Per-Thread Storage Engines: Independent scaling per storage worker thread
  • Container Affinity: Containers distributed across storage threads for load balancing
  • No Inter-Thread Coordination: Storage threads operate independently on their containers
  • Domain Isolation: Complex protocol operations don't block async network operations
  • Business Logic Separation: Commonplace workers handle complex operations and coordination
  • Memory Efficiency: Linear growth with actual changes, per-thread cache management
  • Exclusive Access: Simplified concurrency through thread-level container ownership

Implementation Phases

  1. Container Management: Implement container creation/deletion and isolation
  2. Three-Tier Threading: Integrate Network → Commonplace Workers → Storage pipeline
  3. Commonplace Workers: Implement business logic decomposition and Commonscript integration
  4. Per-Thread StorageEngines: Implement container-to-thread affinity and exclusive access
  5. Storage Primitives: Implement Container/ObjectStore/Object operations
  6. Memory Layer: Implement layered resolution for all object types within containers
  7. Disk Layer: Add append-only log with basic indexing per container
  8. Cross-Domain Communication: Connect all three domains with work dispatchers
  9. Background Tasks: Move compaction and maintenance to background domain
  10. Optimization: Add advanced indexing, query support, and performance tuning
  11. Production: Monitoring, operational tooling, and deployment readiness