Skip to content

What is Quila?

Quila (QualiaTrace Language Model) is a novel dynamic graph reasoning system that differs fundamentally from traditional Transformer architectures.

Key Features

  • 161B effective parameters: 13B explicit + 148B implicit KFE library
  • 32,768 neurons: Dynamic graph topology with ~19% sparse activation
  • 6-phase inference pipeline: Replay → LastInput → Context → Plan → Think → Output
  • Multimodal: Text, image, audio, video via VQ-GAN encoding
  • Multi-GPU: CUDA Unified Memory with NCCL for distributed training/inference

Architecture Overview

Quila uses a neuron-based architecture where each neuron maintains 8 channels of state (S1-S8) and processes information through 4 parallel computation streams:

  • Stream A: Micro-NSA (Native Sparse Attention)
  • Stream B: SSM (Mamba-style State Space Model)
  • Stream C: Linear Attention (RWKV time-mix)
  • Stream D: DRC (Dynamic Residual Correction)

Memory Hierarchy

5-tier storage system:

  • L1: GPU SRAM (neuron states)
  • L2: GPU HBM (WMQ, active KFE)
  • L3: System RAM (SCT, cold KFE)
  • L4: NVRAM (Persona vector)
  • L5: NVMe SSD (NMDB, Engram)