Architecture
Model Anatomy
A transformer model has four anatomical systems:
┌─────────────────────────────────────────┐
│ GGUF MONOLITH │
│ │
│ ┌─ embed ──────── token_embd.weight │
│ │ output.weight │
│ │ output_norm.weight │
│ │ │
│ ├─ skeleton ───── attn_q.weight ×N │
│ │ attn_k.weight ×N │
│ │ attn_v.weight ×N │
│ │ attn_output ×N │
│ │ │
│ ├─ organs ─────── ffn_gate.weight ×N │
│ │ ffn_up.weight ×N │
│ │ ffn_down.weight ×N │
│ │ │
│ └─ norm ───────── attn_norm ×N │
│ ffn_norm ×N │
└─────────────────────────────────────────┘
- Skeleton (attention) = how the model thinks. Shared thought patterns.
- Organs (FFN) = what the model knows. Domain knowledge.
- Embed = input/output translation. The vocabulary interface.
- Norm = normalization layers. Connective tissue between components.
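The mapping from tensor names to anatomical systems can be sketched as a substring lookup. This is a minimal illustration, not the logic of organ_extract.py: `SYSTEM_RULES` and `classify_tensor` are hypothetical names, and the rules assume per-layer tensors follow the usual GGUF `blk.<layer>.<tensor>.weight` naming.

```python
# Hypothetical rules derived from the anatomy diagram; order matters because
# "attn_output.weight" also contains the substring "output.weight".
SYSTEM_RULES = [
    ("norm", ("attn_norm", "ffn_norm")),
    ("skeleton", ("attn_q", "attn_k", "attn_v", "attn_output")),
    ("organs", ("ffn_gate", "ffn_up", "ffn_down")),
    ("embed", ("token_embd", "output_norm", "output.weight")),
]


def classify_tensor(name: str) -> str:
    """Return the anatomical system for a tensor name, or "unknown"."""
    for system, keys in SYSTEM_RULES:
        if any(key in name for key in keys):
            return system
    return "unknown"


if __name__ == "__main__":
    for tensor in ("token_embd.weight", "blk.0.attn_q.weight",
                   "blk.0.ffn_up.weight", "blk.0.attn_norm.weight"):
        print(f"{tensor:28s} -> {classify_tensor(tensor)}")
```

Tensors that match no rule are reported as "unknown" rather than guessed, so unexpected names surface instead of landing in the wrong directory.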
Pipeline
GGUF file
│
▼ organ_extract.py
│
├── manifest.json (complete anatomy map)
├── skeleton/ (attention tensors)
├── organs/ (FFN tensors by layer)
├── embed/ (embedding + output)
└── norm/ (normalization)
│
▼ organ_measure.py
│
Z-measure per tensor
θ ∈ [0°, 90°]
│
├──▶ organ_purify_v2.py (fractal signal extraction)
│
├──▶ organ_graft.py (transplant between models)
│
└──▶ organ_assemble.py → new GGUF
Alternative direct path (no intermediate .bin files):
GGUF_A + GGUF_B → transplant_935.py → chimera.gguf
Z-Measure Theory
Z = dI/d(log s) · exp(iθ)
Three indicators combined into θ:
| Indicator | Measures | Signal | Noise |
|---|---|---|---|
| Entropy | Information density | Moderate (0.3-0.7) | Near-maximum (>0.95) |
| Kurtosis | Structural sharpness | High (abs > 3) | Near-zero |
| Scale coherence (CV) | Non-uniform spacing | High (> 1) | Low (< 0.5) |
- θ → 90° = pure signal (all three indicators confirm structure)
- θ → 0° = pure noise (uniform random distribution)
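The three indicators can be computed with NumPy as sketched below. The text does not say how they are combined into θ, so `theta_degrees` uses a simple vote (30° per indicator inside its "Signal" range from the table). That voting rule, the histogram bin count, and both function names are assumptions made for illustration, not the behavior of organ_measure.py.

```python
import numpy as np


def indicators(x, bins: int = 256) -> dict:
    """Compute entropy, kurtosis, and spacing CV for one tensor."""
    x = np.asarray(x, dtype=np.float64).ravel()
    # Shannon entropy of the value histogram, normalized to [0, 1].
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    entropy = float(-(p * np.log(p)).sum() / np.log(bins))
    # Excess kurtosis (0 for a Gaussian).
    z = (x - x.mean()) / (x.std() + 1e-12)
    kurtosis = float((z ** 4).mean() - 3.0)
    # Coefficient of variation of the gaps between sorted values.
    gaps = np.diff(np.sort(x))
    cv = float(gaps.std() / (gaps.mean() + 1e-12))
    return {"entropy": entropy, "kurtosis": kurtosis, "cv": cv}


def theta_degrees(ind: dict) -> float:
    """Assumed combination: each indicator in its signal range adds 30 degrees."""
    votes = [
        0.3 <= ind["entropy"] <= 0.7,   # moderate information density
        abs(ind["kurtosis"]) > 3.0,     # sharp structure
        ind["cv"] > 1.0,                # non-uniform spacing
    ]
    return 90.0 * sum(votes) / len(votes)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ind = indicators(rng.normal(size=4096))
    print(ind, theta_degrees(ind))
```

A vote keeps θ inside [0°, 90°] by construction; a weighted or continuous combination would fit the same interface.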
Purification Methods
V1: Spectral (FFT)
- Decompose tensor into frequency domain
- Keep high-energy components (signal), remove low-energy tail (noise)
- Preserve original scale (mean/std)
- Limitation: treats tensors like audio signals
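The V1 steps above can be sketched as follows. This is one possible reading, not the original script: the tensor is flattened to 1-D before the FFT, and "high-energy" is taken to mean the strongest coefficients that together hold a `keep_energy` fraction of the spectral energy. `purify_spectral` and `keep_energy` are hypothetical names.

```python
import numpy as np


def purify_spectral(tensor, keep_energy: float = 0.95) -> np.ndarray:
    """Keep the strongest FFT coefficients, then restore mean and std."""
    t = np.asarray(tensor)
    x = t.astype(np.float64).ravel()
    mean, std = x.mean(), x.std()

    spec = np.fft.rfft(x)
    energy = np.abs(spec) ** 2
    order = np.argsort(energy)[::-1]          # strongest first
    cumulative = np.cumsum(energy[order])
    # Smallest number of coefficients that reaches the energy budget.
    n_keep = int(np.searchsorted(cumulative, keep_energy * cumulative[-1])) + 1

    mask = np.zeros(spec.shape, dtype=bool)
    mask[order[:n_keep]] = True
    y = np.fft.irfft(np.where(mask, spec, 0), n=x.size)

    # Preserve the original scale (mean/std), as the method requires.
    y = (y - y.mean()) / (y.std() + 1e-12) * std + mean
    return y.reshape(t.shape).astype(t.dtype)


if __name__ == "__main__":
    w = np.random.default_rng(0).normal(size=(8, 16)).astype(np.float32)
    out = purify_spectral(w, keep_energy=0.9)
    print(out.shape, float(w.std()), float(out.std()))
```

Flattening discards the row/column layout of the weight matrix, which is one concrete form of the limitation noted above.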
V2: Fractal (Wavelets)
- Haar wavelet multi-scale decomposition
- Cross-scale coherence: pattern at scale s AND scale 2s = fractal = signal
- Pattern at one scale only = noise
- This IS dI/d(log s) — information that persists across scales
- More theoretically grounded than V1
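A minimal sketch of the cross-scale rule, assuming a plain orthonormal Haar transform written in NumPy: a detail coefficient survives only if it and its parent at the next coarser scale are both significant. The significance test (magnitude above `k` times the per-scale median), the zero padding, and all function names are illustrative assumptions, not the contents of organ_purify_v2.py.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_decompose(x, levels):
    """Return (coarsest approximation, details ordered fine -> coarse)."""
    approx, details = np.asarray(x, dtype=np.float64), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)
        approx = (even + odd) / SQRT2
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose."""
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / SQRT2
        out[1::2] = (approx - d) / SQRT2
        approx = out
    return approx


def purify_fractal(x, levels: int = 3, k: float = 1.0) -> np.ndarray:
    x = np.asarray(x, dtype=np.float64).ravel()
    pad = (-x.size) % (2 ** levels)           # pad to a multiple of 2**levels
    approx, details = haar_decompose(np.pad(x, (0, pad)), levels)
    strong = [np.abs(d) > k * np.median(np.abs(d)) for d in details]
    kept = []
    for j, d in enumerate(details):
        if j + 1 < levels:
            # Parent i at scale 2s covers children 2i and 2i+1 at scale s.
            coherent = strong[j] & np.repeat(strong[j + 1], 2)
        else:
            coherent = strong[j]              # coarsest scale has no parent
        kept.append(np.where(coherent, d, 0.0))
    return haar_reconstruct(approx, kept)[: x.size]


if __name__ == "__main__":
    signal = np.random.default_rng(0).normal(size=64)
    print(np.abs(purify_fractal(signal) - signal).max())
```

With `k=0.0` every nonzero coefficient counts as significant, so the input comes back unchanged, which is a convenient sanity check.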
Graft Compatibility
Grafting works best between models that share:
- Same base architecture (e.g., Qwen2 family)
- Same embedding dimension
- Same number of layers (or graft specific layer ranges)
Empirical results:
- DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: WORKS (both Qwen2 arch, same dims)
- DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: FAILS (7B chimera output PAD tokens)
- Same architecture + same scale = highest success probability
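The three compatibility criteria can be expressed as a pre-flight check. This is a sketch under assumptions: `ModelInfo`, its field names, and `graft_issues` are hypothetical, and the numbers in the demo are placeholders rather than real model dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelInfo:
    architecture: str     # e.g. "qwen2"
    embedding_dim: int
    n_layers: int


def graft_issues(donor: ModelInfo, recipient: ModelInfo,
                 layer_range: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return the reasons a graft looks risky; an empty list means none found."""
    issues = []
    if donor.architecture != recipient.architecture:
        issues.append("different base architecture")
    if donor.embedding_dim != recipient.embedding_dim:
        issues.append("different embedding dimension")
    if layer_range is None:
        if donor.n_layers != recipient.n_layers:
            issues.append("different layer count (graft a layer range instead)")
    else:
        start, stop = layer_range             # half-open range [start, stop)
        limit = min(donor.n_layers, recipient.n_layers)
        if not 0 <= start < stop <= limit:
            issues.append("layer range outside both models")
    return issues


if __name__ == "__main__":
    a = ModelInfo("qwen2", 1024, 24)          # placeholder values
    b = ModelInfo("qwen2", 1024, 28)          # placeholder values
    print(graft_issues(a, b))
    print(graft_issues(a, b, layer_range=(0, 24)))
```

An empty result is necessary, not sufficient: the 7B pair above matches on architecture family and still failed, so the output should still be tested after grafting.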
File Format
Organ .bin files: [name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]
Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.
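The organ .bin layout above can be read and written with the standard `struct` module. The layout string does not state byte order, name encoding, or the meaning of the dtype codes, so this sketch assumes little-endian integers, UTF-8 names, and tensor data that runs to the end of the file. The dtype code is passed through uninterpreted, and `write_organ` / `read_organ` are hypothetical names.

```python
import io
import struct

import numpy as np


def write_organ(f, name: str, array: np.ndarray, dtype_code: int) -> None:
    """Write [name_len:u32][name][n_dims:u32][dims:u64 x n][dtype:u32][data]."""
    raw_name = name.encode("utf-8")
    f.write(struct.pack("<I", len(raw_name)))
    f.write(raw_name)
    f.write(struct.pack("<I", array.ndim))
    f.write(struct.pack(f"<{array.ndim}Q", *array.shape))
    f.write(struct.pack("<I", dtype_code))
    f.write(array.tobytes())


def read_organ(f):
    """Return (name, dims, dtype_code, raw tensor bytes)."""
    (name_len,) = struct.unpack("<I", f.read(4))
    name = f.read(name_len).decode("utf-8")
    (n_dims,) = struct.unpack("<I", f.read(4))
    dims = struct.unpack(f"<{n_dims}Q", f.read(8 * n_dims))
    (dtype_code,) = struct.unpack("<I", f.read(4))
    return name, dims, dtype_code, f.read()   # assumed: data fills the rest


if __name__ == "__main__":
    buf = io.BytesIO()
    weights = np.arange(6, dtype=np.float32).reshape(2, 3)
    write_organ(buf, "blk.0.ffn_up.weight", weights, dtype_code=0)
    buf.seek(0)
    name, dims, code, data = read_organ(buf)
    print(name, dims, code, np.frombuffer(data, dtype=np.float32).reshape(dims))
```

Returning raw bytes keeps the reader independent of any dtype table; the manifest can supply the information needed to decode them.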
Signature
935