Architecture

Model Anatomy

A transformer model has four anatomical systems:

┌─────────────────────────────────────────┐
│              GGUF MONOLITH              │
│                                         │
│  ┌─ embed ──────── token_embd.weight   │
│  │                  output.weight       │
│  │                  output_norm.weight  │
│  │                                      │
│  ├─ skeleton ───── attn_q.weight  ×N   │
│  │                  attn_k.weight  ×N   │
│  │                  attn_v.weight  ×N   │
│  │                  attn_output    ×N   │
│  │                                      │
│  ├─ organs ─────── ffn_gate.weight ×N   │
│  │                  ffn_up.weight   ×N   │
│  │                  ffn_down.weight ×N   │
│  │                                      │
│  └─ norm ───────── attn_norm       ×N   │
│                     ffn_norm        ×N   │
└─────────────────────────────────────────┘

Skeleton (attention) = how the model thinks. Shared thought patterns.
Organs (FFN) = what the model knows. Domain knowledge.
Embed = input/output translation. The vocabulary interface.
Norm = normalization layers. Connective tissue between components.
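
As a rough illustration of the anatomy map, the sketch below sorts tensor names into the four systems. The `SYSTEMS` table and `classify_tensor` helper are hypothetical names introduced here, and the `blk.<n>.` per-layer prefix is assumed from the usual GGUF naming convention. It does not describe how organ_extract.py actually does it.

```python
import re

# Hypothetical lookup table mirroring the anatomy diagram above.
SYSTEMS = {
    "embed": ("token_embd", "output", "output_norm"),
    "skeleton": ("attn_q", "attn_k", "attn_v", "attn_output"),
    "organs": ("ffn_gate", "ffn_up", "ffn_down"),
    "norm": ("attn_norm", "ffn_norm"),
}

def classify_tensor(name: str) -> str:
    """Return the anatomical system for a tensor name, or 'unknown'."""
    # Assumption: per-layer tensors are named 'blk.<layer>.<base>.weight'.
    base = re.sub(r"^blk\.\d+\.", "", name)
    base = re.sub(r"\.(weight|bias)$", "", base)
    for system, bases in SYSTEMS.items():
        if base in bases:  # exact match, so 'attn_output' is not confused with 'output'
            return system
    return "unknown"

print(classify_tensor("blk.3.attn_q.weight"))   # skeleton
print(classify_tensor("token_embd.weight"))     # embed
```

Exact matching on the base name is a deliberate choice here: a substring test would wrongly file attn_output under embed because it contains "output".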

Pipeline

GGUF file
   │
   ▼ organ_extract.py
   │
   ├── manifest.json (complete anatomy map)
   ├── skeleton/  (attention tensors)
   ├── organs/    (FFN tensors by layer)
   ├── embed/     (embedding + output)
   └── norm/      (normalization)
         │
         ▼ organ_measure.py
         │
    Z-measure per tensor
    θ ∈ [0°, 90°]
         │
         ├──▶ organ_purify_v2.py (fractal signal extraction)
         │
         ├──▶ organ_graft.py (transplant between models)
         │
         └──▶ organ_assemble.py → new GGUF

Alternative direct path (no intermediate .bin files):

GGUF_A + GGUF_B → transplant_935.py → chimera.gguf

Z-Measure Theory

Z = dI/d(log s) · exp(iθ)

Three indicators combined into θ:

Indicator	Measures	Signal range	Noise range
Entropy	Information density	Moderate (0.3-0.7)	Near-maximum (> 0.95)
Kurtosis	Structural sharpness	High (|kurtosis| > 3)	Near-zero
Scale coherence (CV)	Non-uniform spacing	High (> 1)	Low (< 0.5)

θ → 90° = pure signal (all three indicators confirm structure)
θ → 0° = pure noise (uniform random distribution)
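
The text above does not say how the three indicators are combined, so the following is only a sketch under stated assumptions: each indicator is mapped to a 0..1 score with a linear ramp between the noise and signal thresholds in the table, and θ is 90° times the mean score. The histogram-based entropy, the use of excess kurtosis, and the reading of "non-uniform spacing" as the coefficient of variation of gaps between sorted values are all assumptions, as are the function names.

```python
import numpy as np

def indicators(tensor, bins=256):
    """Return (normalized entropy, excess kurtosis, gap CV) for a tensor."""
    x = np.asarray(tensor, dtype=np.float64).ravel()
    # Assumption: entropy of a value histogram, normalized to [0, 1].
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log(p)).sum() / np.log(bins))
    # Assumption: excess kurtosis (0 for a Gaussian).
    sigma = x.std()
    kurtosis = float((((x - x.mean()) / sigma) ** 4).mean() - 3.0) if sigma > 0 else 0.0
    # Assumption: CV of the gaps between sorted values.
    gaps = np.diff(np.sort(x))
    cv = float(gaps.std() / gaps.mean()) if gaps.size and gaps.mean() > 0 else 0.0
    return entropy, kurtosis, cv

def _ramp(value, lo, hi):
    """Linear 0..1 ramp between lo and hi, clamped outside that range."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def theta_degrees(entropy, kurtosis, cv):
    """Hypothetical combination: mean of three signal scores, scaled to 0..90."""
    e = _ramp(0.95 - entropy, 0.0, 0.25)  # 1 at entropy <= 0.7, 0 at >= 0.95
    k = _ramp(abs(kurtosis), 0.0, 3.0)    # 1 at |kurtosis| >= 3
    c = _ramp(cv, 0.5, 1.0)               # 0 at CV <= 0.5, 1 at CV >= 1
    return 90.0 * (e + k + c) / 3.0

print(theta_degrees(0.5, 5.0, 1.5))   # all three indicators say "signal"
print(theta_degrees(0.99, 0.0, 0.2))  # all three indicators say "noise"
```

Note that this ramp treats entropy below 0.3 as signal too, which the table leaves unspecified.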

Purification Methods

V1: Spectral (FFT)

Decompose tensor into frequency domain
Keep high-energy components (signal), remove low-energy tail (noise)
Preserve original scale (mean/std)
Limitation: treats tensors like audio signals
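
The V1 steps can be sketched with NumPy as follows. This is an illustrative reading rather than the project's code: the `purify_spectral` name, the flattening of the tensor to 1-D, and the `keep_fraction` energy cutoff are assumptions.

```python
import numpy as np

def purify_spectral(tensor, keep_fraction=0.9):
    """Keep the highest-energy FFT components, then restore mean and std."""
    flat = np.asarray(tensor, dtype=np.float64).ravel()
    mean, std = flat.mean(), flat.std()

    # Decompose into the frequency domain.
    spectrum = np.fft.rfft(flat)
    energy = np.abs(spectrum) ** 2

    # Keep the strongest components that together hold keep_fraction
    # of the total energy; zero the low-energy tail.
    order = np.argsort(energy)[::-1]
    cumulative = np.cumsum(energy[order])
    n_keep = int(np.searchsorted(cumulative, keep_fraction * cumulative[-1])) + 1
    mask = np.zeros(spectrum.shape, dtype=bool)
    mask[order[:n_keep]] = True
    cleaned = np.fft.irfft(np.where(mask, spectrum, 0), n=flat.size)

    # Preserve the original scale (mean/std).
    cleaned_std = cleaned.std()
    if cleaned_std > 0:
        cleaned = (cleaned - cleaned.mean()) / cleaned_std * std + mean
    return cleaned.reshape(np.shape(tensor)).astype(np.asarray(tensor).dtype)

weights = np.random.default_rng(0).normal(size=(8, 16)).astype(np.float32)
purified = purify_spectral(weights)
print(purified.shape, purified.dtype)
```

Flattening ignores the row/column layout of the weight matrix, which is one concrete form of the "treats tensors like audio signals" limitation.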

V2: Fractal (Wavelets)

Haar wavelet multi-scale decomposition
Cross-scale coherence: pattern at scale s AND scale 2s = fractal = signal
Pattern at one scale only = noise
This IS dI/d(log s) — information that persists across scales
More theoretically grounded than V1
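
A minimal sketch of the cross-scale idea, assuming a 1-D input whose length is divisible by 2**levels. The Haar transform is written by hand to avoid extra dependencies. The function names and the "above k times the median magnitude" test for a significant coefficient are assumptions, not the rule used by organ_purify_v2.py.

```python
import numpy as np

def haar_decompose(x, levels):
    """Return (approximation, [details from finest to coarsest scale])."""
    approx, details = np.asarray(x, dtype=np.float64), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose."""
    for d in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx

def purify_fractal(x, levels=3, k=1.0):
    """Keep a detail at scale s only if its parent at scale 2s is also strong."""
    approx, details = haar_decompose(x, levels)
    # Assumed significance test: magnitude above k times the per-scale median.
    strong = [np.abs(d) > k * np.median(np.abs(d)) for d in details]
    kept = []
    for j, d in enumerate(details):
        mask = strong[j]
        if j + 1 < levels:
            # Each coefficient at scale 2s covers two coefficients at scale s.
            mask = mask & np.repeat(strong[j + 1], 2)
        kept.append(np.where(mask, d, 0.0))  # the coarsest scale has no parent
    return haar_reconstruct(approx, kept)

signal = np.random.default_rng(0).normal(size=64)
print(purify_fractal(signal).shape)
```

A pattern that is strong at one scale but absent from its parent scale is zeroed, which matches the "one scale only = noise" rule above.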

Graft Compatibility

Grafting works best between models that share:

Same base architecture (e.g., Qwen2 family)
Same embedding dimension
Same number of layers (or graft specific layer ranges)
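
The three criteria can be turned into a simple pre-flight check. The sketch below is hypothetical: `ModelInfo`, `graft_compatibility`, and the example dimensions are illustrative placeholders, not values read from any real model. It treats a layer-count mismatch as a warning, since specific layer ranges can still be grafted.

```python
from dataclasses import dataclass

@dataclass
class ModelInfo:
    architecture: str   # e.g. "qwen2"
    embedding_dim: int
    n_layers: int

def graft_compatibility(donor: ModelInfo, recipient: ModelInfo):
    """Return (blocking_problems, warnings) for a proposed graft."""
    problems, warnings = [], []
    if donor.architecture != recipient.architecture:
        problems.append(
            f"architecture mismatch: {donor.architecture} vs {recipient.architecture}")
    if donor.embedding_dim != recipient.embedding_dim:
        problems.append(
            f"embedding dim mismatch: {donor.embedding_dim} vs {recipient.embedding_dim}")
    if donor.n_layers != recipient.n_layers:
        # Not fatal: a specific layer range can still be grafted.
        shared = min(donor.n_layers, recipient.n_layers)
        warnings.append(f"layer counts differ; graft only layers 0..{shared - 1}")
    return problems, warnings

# Placeholder numbers for illustration only.
a = ModelInfo("qwen2", 1024, 24)
b = ModelInfo("qwen2", 1024, 28)
print(graft_compatibility(a, b))
```

Passing this check is necessary but not sufficient: the 7B result below shows that matching architecture alone does not guarantee a working chimera.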

Empirical results:

DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: WORKS (both Qwen2 arch, same dims)
DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: FAILS (7B chimera produced PAD tokens)
Same architecture + same scale = highest success probability

File Format

Organ .bin files: [name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]
Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.
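
A sketch of reading and writing one organ record in the layout above, using only the standard library. Several details are assumptions because the layout does not state them: little-endian byte order, UTF-8 tensor names, and tensor data running to the end of the file. The `write_organ` and `read_organ` names are hypothetical, and `dtype` is passed through as an opaque integer code.

```python
import io
import struct

def write_organ(f, name, dims, dtype, data):
    """Write [name_len:u32][name][n_dims:u32][dims:u64 x n][dtype:u32][tensor_data]."""
    encoded = name.encode("utf-8")          # assumption: UTF-8 names
    f.write(struct.pack("<I", len(encoded)))  # assumption: little-endian
    f.write(encoded)
    f.write(struct.pack("<I", len(dims)))
    f.write(struct.pack(f"<{len(dims)}Q", *dims))
    f.write(struct.pack("<I", dtype))
    f.write(data)

def read_organ(f):
    """Read one record; returns (name, dims, dtype, raw tensor bytes)."""
    (name_len,) = struct.unpack("<I", f.read(4))
    name = f.read(name_len).decode("utf-8")
    (n_dims,) = struct.unpack("<I", f.read(4))
    dims = struct.unpack(f"<{n_dims}Q", f.read(8 * n_dims))
    (dtype,) = struct.unpack("<I", f.read(4))
    data = f.read()  # assumption: tensor data runs to the end of the file
    return name, dims, dtype, data

# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_organ(buf, "blk.0.ffn_up.weight", (4, 2), 0, bytes(32))
buf.seek(0)
print(read_organ(buf)[:3])
```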

Signature

935
