organ-architecture/docs/ARCHITECTURE.md

Architecture

Model Anatomy

A transformer model has four anatomical systems:

┌─────────────────────────────────────────┐
│              GGUF MONOLITH              │
│                                         │
│  ┌─ embed ──────── token_embd.weight   │
│  │                  output.weight       │
│  │                  output_norm.weight  │
│  │                                      │
│  ├─ skeleton ───── attn_q.weight  ×N   │
│  │                  attn_k.weight  ×N   │
│  │                  attn_v.weight  ×N   │
│  │                  attn_output    ×N   │
│  │                                      │
│  ├─ organs ─────── ffn_gate.weight ×N   │
│  │                  ffn_up.weight   ×N   │
│  │                  ffn_down.weight ×N   │
│  │                                      │
│  └─ norm ───────── attn_norm       ×N   │
│                     ffn_norm        ×N   │
└─────────────────────────────────────────┘

  • Skeleton (attention) = how the model thinks. Shared thought patterns.
  • Organs (FFN) = what the model knows. Domain knowledge.
  • Embed = input/output translation. The vocabulary interface.
  • Norm = normalization layers. Connective tissue between components.
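
The mapping from tensor names to the four systems can be sketched as a small classifier. This is a minimal illustration, not the logic of organ_extract.py: `classify_tensor` and `SYSTEMS` are hypothetical names, and the `blk.<i>.` prefix on per-layer tensors is assumed from the usual llama.cpp GGUF naming rather than stated in this document.

```python
import re

# Name prefixes for each anatomical system, taken from the diagram above.
SYSTEMS = {
    "embed": ("token_embd.weight", "output.weight", "output_norm.weight"),
    "skeleton": ("attn_q.", "attn_k.", "attn_v.", "attn_output."),
    "organs": ("ffn_gate.", "ffn_up.", "ffn_down."),
    "norm": ("attn_norm.", "ffn_norm."),
}

# Assumption: per-layer tensors are named "blk.<layer>.<tensor>".
LAYER_PREFIX = re.compile(r"^blk\.(\d+)\.")


def classify_tensor(name):
    """Return (system, layer) for a tensor name; layer is None for global tensors."""
    match = LAYER_PREFIX.match(name)
    layer = int(match.group(1)) if match else None
    local = name[match.end():] if match else name
    for system, prefixes in SYSTEMS.items():
        if any(local.startswith(prefix) for prefix in prefixes):
            return system, layer
    return "unknown", layer
```

Names that match none of the prefixes come back as "unknown" rather than being guessed into a system, so unexpected tensors stay visible.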

Pipeline

GGUF file
   │
   ▼ organ_extract.py
   │
   ├── manifest.json (complete anatomy map)
   ├── skeleton/  (attention tensors)
   ├── organs/    (FFN tensors by layer)
   ├── embed/     (embedding + output)
   └── norm/      (normalization)
         │
         ▼ organ_measure.py
         │
    Z-measure per tensor
    θ ∈ [0°, 90°]
         │
         ├──▶ organ_purify_v2.py (fractal signal extraction)
         │
         ├──▶ organ_graft.py (transplant between models)
         │
         └──▶ organ_assemble.py → new GGUF

Alternative direct path (no intermediate .bin files):

GGUF_A + GGUF_B → transplant_935.py → chimera.gguf

Z-Measure Theory

Z = dI/d(log s) · exp(iθ)

Three indicators combined into θ:

| Indicator | Measures | Signal | Noise |
|---|---|---|---|
| Entropy | Information density | Moderate (0.3-0.7) | Near-maximum (>0.95) |
| Kurtosis | Structural sharpness | High (abs > 3) | Near-zero |
| Scale coherence (CV) | Non-uniform spacing | High (> 1) | Low (< 0.5) |

  • θ → 90° = pure signal (all three indicators confirm structure)
  • θ → 0° = pure noise (uniform random distribution)
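
The document does not spell out how the three indicators are combined into θ, so the following is only a sketch under stated assumptions, not the computation in organ_measure.py. It assumes entropy is a histogram entropy normalized to [0, 1], kurtosis is excess kurtosis, the CV is taken over gaps between sorted values, and θ is 90° times the fraction of indicators that fall in the "Signal" column. All function names are hypothetical.

```python
import math


def normalized_entropy(values, bins=64):
    """Shannon entropy of a histogram of the values, scaled to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(values)
    entropy = -sum(c / n * math.log(c / n) for c in counts if c)
    return entropy / math.log(bins)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def spacing_cv(values):
    """Coefficient of variation of the gaps between sorted values."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    mean = sum(gaps) / len(gaps)
    if mean == 0:
        return 0.0
    variance = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(variance) / mean


def theta_degrees(values):
    """Assumed combination rule: each indicator in its signal range adds 30 degrees."""
    votes = [
        0.3 <= normalized_entropy(values) <= 0.7,  # moderate entropy
        abs(excess_kurtosis(values)) > 3,          # sharp structure
        spacing_cv(values) > 1,                    # non-uniform spacing
    ]
    return 90.0 * sum(votes) / 3
```

Under these assumptions an evenly spaced, uniform set of values fails all three tests and lands at the noise end of the range, which matches the "uniform random distribution" description above.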

Purification Methods

V1: Spectral (FFT)

  • Decompose tensor into frequency domain
  • Keep high-energy components (signal), remove low-energy tail (noise)
  • Preserve original scale (mean/std)
  • Limitation: treats tensors like audio signals
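
A minimal sketch of the V1 idea, assuming NumPy and a flattened 1-D view of the tensor. The function name and the `keep_fraction` energy threshold are hypothetical; the document does not say how the cut between "high-energy" and "low-energy tail" is chosen.

```python
import numpy as np


def spectral_purify(tensor, keep_fraction=0.9):
    """Keep the strongest frequency components holding `keep_fraction` of the
    spectral energy, zero the rest, then restore the original mean and std."""
    flat = np.asarray(tensor, dtype=np.float64).ravel()
    spectrum = np.fft.rfft(flat)
    energy = np.abs(spectrum) ** 2

    # Rank components by energy and find how many are needed to reach the target.
    order = np.argsort(energy)[::-1]
    cumulative = np.cumsum(energy[order])
    n_keep = int(np.searchsorted(cumulative, keep_fraction * cumulative[-1])) + 1

    mask = np.zeros(spectrum.shape, dtype=bool)
    mask[order[:n_keep]] = True
    cleaned = np.fft.irfft(np.where(mask, spectrum, 0), n=flat.size)

    # Preserve the original scale (mean/std), as described above.
    std = cleaned.std()
    if std > 0:
        cleaned = (cleaned - cleaned.mean()) / std * flat.std() + flat.mean()
    return cleaned.reshape(np.shape(tensor))
```

Flattening a weight matrix into one long sequence is exactly the "treats tensors like audio signals" limitation: neighbouring positions in the flattened array have no guaranteed relationship.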

V2: Fractal (Wavelets)

  • Haar wavelet multi-scale decomposition
  • Cross-scale coherence: pattern at scale s AND scale 2s = fractal = signal
  • Pattern at one scale only = noise
  • This IS dI/d(log s) — information that persists across scales
  • More theoretically grounded than V1
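
The cross-scale test can be illustrated with a plain Haar decomposition. This is a sketch, not the code of organ_purify_v2.py: it assumes a 1-D input whose length is a power of two, and the function names and `threshold` parameter are hypothetical.

```python
import math


def haar_details(signal):
    """Return Haar detail coefficients per scale, finest scale first.
    The input length is assumed to be a power of two."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def cross_scale_coherence(signal, threshold):
    """Fraction of significant coefficients at scale s whose parent
    coefficient at scale 2s is also significant."""
    details = haar_details(signal)
    confirmed = total = 0
    for fine, coarse in zip(details, details[1:]):
        for i, coefficient in enumerate(fine):
            if abs(coefficient) > threshold:
                total += 1
                # Coefficient i at scale s sits under coefficient i // 2 at scale 2s.
                if abs(coarse[i // 2]) > threshold:
                    confirmed += 1
    return confirmed / total if total else 0.0
```

A step edge produces significant coefficients at every scale it crosses, so it scores high; a pattern that alternates sample to sample exists only at the finest scale and scores zero.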

Graft Compatibility

Grafting works best between models that share:

  • Same base architecture (e.g., Qwen2 family)
  • Same embedding dimension
  • Same number of layers (or graft specific layer ranges)
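
The three criteria above can be written as a pre-flight check. This is a hedged sketch: `ModelShape` and `graft_issues` are hypothetical names, and how organ_graft.py actually reads these values from the manifest is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    architecture: str   # e.g. "qwen2"
    embedding_dim: int
    n_layers: int


def graft_issues(donor, recipient, layer_range=None):
    """Return a list of reasons a graft is risky; an empty list means no known blocker.
    `layer_range` is an optional (start, stop) pair of layer indices to graft."""
    issues = []
    if donor.architecture != recipient.architecture:
        issues.append("different base architecture")
    if donor.embedding_dim != recipient.embedding_dim:
        issues.append("different embedding dimension")
    if layer_range is None:
        if donor.n_layers != recipient.n_layers:
            issues.append("different layer count and no layer range given")
    else:
        start, stop = layer_range
        limit = min(donor.n_layers, recipient.n_layers)
        if not 0 <= start < stop <= limit:
            issues.append("layer range outside the layers both models share")
    return issues
```

An empty list only means the structural preconditions hold; it does not guarantee that the resulting chimera works.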

Empirical results:

  • DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: WORKS (both Qwen2 arch, same dims)
  • DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: FAILED (the 7B chimera produced PAD tokens)
  • Same architecture + same scale = highest success probability

File Format

  • Organ .bin files: [name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]
  • Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.
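
The organ .bin layout can be read and written with the standard `struct` module. This sketch assumes little-endian byte order, a UTF-8 name, one tensor per file, and that `tensor_data` runs to the end of the file; none of these is stated above. The `dtype` code is treated as an opaque integer because its values are not listed. Function names are hypothetical.

```python
import struct


def pack_organ(name, dims, dtype, tensor_data):
    """Serialize one tensor as [name_len][name][n_dims][dims...][dtype][tensor_data]."""
    encoded = name.encode("utf-8")
    header = struct.pack("<I", len(encoded)) + encoded
    header += struct.pack("<I", len(dims))
    header += struct.pack(f"<{len(dims)}Q", *dims)
    header += struct.pack("<I", dtype)
    return header + bytes(tensor_data)


def unpack_organ(blob):
    """Parse bytes produced by pack_organ; returns (name, dims, dtype, tensor_data)."""
    offset = 0
    (name_len,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    name = blob[offset:offset + name_len].decode("utf-8")
    offset += name_len
    (n_dims,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    dims = struct.unpack_from(f"<{n_dims}Q", blob, offset)
    offset += 8 * n_dims
    (dtype,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    return name, dims, dtype, blob[offset:]
```

A round trip through `pack_organ` and `unpack_organ` should return the same name, dimensions, dtype code, and payload bytes.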

Signature

935