# Architecture

## Model Anatomy

A transformer model has four anatomical systems:

```
┌─────────────────────────────────────────┐
│              GGUF MONOLITH              │
│                                         │
│  ┌─ embed ──────── token_embd.weight    │
│  │                 output.weight        │
│  │                 output_norm.weight   │
│  │                                      │
│  ├─ skeleton ───── attn_q.weight ×N     │
│  │                 attn_k.weight ×N     │
│  │                 attn_v.weight ×N     │
│  │                 attn_output ×N       │
│  │                                      │
│  ├─ organs ─────── ffn_gate.weight ×N   │
│  │                 ffn_up.weight ×N     │
│  │                 ffn_down.weight ×N   │
│  │                                      │
│  └─ norm ───────── attn_norm ×N         │
│                    ffn_norm ×N          │
└─────────────────────────────────────────┘
```

**Skeleton** (attention) = how the model thinks. Shared thought patterns.

**Organs** (FFN) = what the model knows. Domain knowledge.

**Embed** = input/output translation. The vocabulary interface.

**Norm** = normalization layers. Connective tissue between components.

## Pipeline

```
GGUF file
    │
    ▼
organ_extract.py
    │
    ├── manifest.json    (complete anatomy map)
    ├── skeleton/        (attention tensors)
    ├── organs/          (FFN tensors by layer)
    ├── embed/           (embedding + output)
    └── norm/            (normalization)
    │
    ▼
organ_measure.py
    │   Z-measure per tensor θ ∈ [0°, 90°]
    │
    ├──▶ organ_purify_v2.py  (fractal signal extraction)
    │
    ├──▶ organ_graft.py      (transplant between models)
    │
    └──▶ organ_assemble.py → new GGUF
```

Alternative direct path (no intermediate .bin files):

```
GGUF_A + GGUF_B → transplant_935.py → chimera.gguf
```

## Z-Measure Theory

```
Z = dI/d(log s) · exp(iθ)
```

Three indicators combined into θ:

| Indicator | Measures | Signal | Noise |
|-----------|----------|--------|-------|
| Entropy | Information density | Moderate (0.3-0.7) | Near-maximum (>0.95) |
| Kurtosis | Structural sharpness | High (abs > 3) | Near-zero |
| Scale coherence (CV) | Non-uniform spacing | High (> 1) | Low (< 0.5) |

- θ → 90° = pure signal (all three indicators confirm structure)
- θ → 0° = pure noise (uniform random distribution)

## Purification Methods

### V1: Spectral (FFT)

- Decompose tensor into frequency domain
- Keep high-energy components (signal), remove low-energy tail (noise)
- Preserve original scale (mean/std)
- Limitation: treats tensors like audio signals

### V2: Fractal (Wavelets)

- Haar wavelet multi-scale decomposition
- Cross-scale coherence: pattern at scale s AND scale 2s = fractal = signal
- Pattern at one scale only = noise
- This IS dI/d(log s): information that persists across scales
- More theoretically grounded than V1

## Graft Compatibility

Grafting works best between models that share:

- Same base architecture (e.g., Qwen2 family)
- Same embedding dimension
- Same number of layers (or graft specific layer ranges)

Empirical results:

- DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: **WORKS** (both Qwen2 arch, same dims)
- DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: **PAD tokens** (7B chimera failed)
- Same architecture + same scale = highest success probability

## File Format

Organ .bin files: `[name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]`

Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.

## Signature

935
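
## Example: Organ .bin Record

The organ .bin layout given under File Format can be illustrated with a small round trip. This is only a sketch, not the project's actual reader or writer: the helper names `write_organ` and `read_organ` are hypothetical, and several details the layout string leaves open are assumed here, namely little-endian integers, UTF-8 tensor names, one tensor per file with `tensor_data` running to the end of the stream, and an opaque `dtype` code whose meaning is not interpreted.

```python
import io
import struct


def write_organ(stream, name, dims, dtype, data):
    """Write one record: name_len, name, n_dims, dims, dtype, tensor_data."""
    raw_name = name.encode("utf-8")                      # assumption: UTF-8 names
    stream.write(struct.pack("<I", len(raw_name)))       # name_len:u32
    stream.write(raw_name)                               # name:bytes
    stream.write(struct.pack("<I", len(dims)))           # n_dims:u32
    stream.write(struct.pack(f"<{len(dims)}Q", *dims))   # dims:u64 x n
    stream.write(struct.pack("<I", dtype))               # dtype:u32 (opaque code)
    stream.write(data)                                   # tensor_data (raw bytes)


def read_organ(stream):
    """Read one record; assumes tensor_data extends to the end of the stream."""
    (name_len,) = struct.unpack("<I", stream.read(4))
    name = stream.read(name_len).decode("utf-8")
    (n_dims,) = struct.unpack("<I", stream.read(4))
    dims = struct.unpack(f"<{n_dims}Q", stream.read(8 * n_dims))
    (dtype,) = struct.unpack("<I", stream.read(4))
    data = stream.read()                                 # everything that remains
    return {"name": name, "dims": list(dims), "dtype": dtype, "data": data}


# Round trip through an in-memory buffer. The tensor name, shape, and
# dtype code 0 are placeholder values chosen for illustration only.
buf = io.BytesIO()
write_organ(buf, "blk.0.ffn_up.weight", [4, 2], 0, bytes(32))
buf.seek(0)
record = read_organ(buf)
```

Because the layout string carries no explicit length for `tensor_data`, a real reader would need either the one-tensor-per-file assumption made here or a byte count derived from `dims` and `dtype`.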