What is Quila?

Quila (QualiaTrace Language Model) is a novel dynamic graph reasoning system that differs fundamentally from traditional Transformer architectures.

Key Features

161B effective parameters: 13B explicit + 148B implicit KFE library
32,768 neurons: Dynamic graph topology with ~19% sparse activation
6-phase inference pipeline: Replay → LastInput → Context → Plan → Think → Output
Multimodal: Text, image, audio, video via VQ-GAN encoding
Multi-GPU: CUDA Unified Memory with NCCL for distributed training/inference

Architecture Overview

Quila uses a neuron-based architecture where each neuron maintains 8 channels of state (S1-S8) and processes information through 4 parallel computation streams:

Stream A: Micro-NSA (Native Sparse Attention)
Stream B: SSM (Mamba-style State Space Model)
Stream C: Linear Attention (RWKV time-mix)
Stream D: DRC (Dynamic Residual Correction)

Memory Hierarchy

5-tier storage system:

L1: GPU SRAM (neuron states)
L2: GPU HBM (WMQ, active KFE)
L3: System RAM (SCT, cold KFE)
L4: NVRAM (Persona vector)
L5: NVMe SSD (NMDB, Engram)

What is Quila? ​

Key Features ​

Architecture Overview ​

Memory Hierarchy ​

What is Quila?

Key Features

Architecture Overview

Memory Hierarchy