# Architecture

## Model Anatomy

A transformer model has four anatomical systems:

```
┌─────────────────────────────────────────┐
│              GGUF MONOLITH              │
│                                         │
│  ┌─ embed ──────── token_embd.weight    │
│  │                 output.weight        │
│  │                 output_norm.weight   │
│  │                                      │
│  ├─ skeleton ───── attn_q.weight ×N     │
│  │                 attn_k.weight ×N     │
│  │                 attn_v.weight ×N     │
│  │                 attn_output ×N       │
│  │                                      │
│  ├─ organs ─────── ffn_gate.weight ×N   │
│  │                 ffn_up.weight ×N     │
│  │                 ffn_down.weight ×N   │
│  │                                      │
│  └─ norm ───────── attn_norm ×N         │
│                    ffn_norm ×N          │
└─────────────────────────────────────────┘
```

- **Skeleton** (attention) = how the model thinks. Shared thought patterns.
- **Organs** (FFN) = what the model knows. Domain knowledge.
- **Embed** = input/output translation. The vocabulary interface.
- **Norm** = normalization layers. Connective tissue between components.

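
The grouping in the diagram can be sketched as a lookup on tensor names. This is a minimal illustration, not the project's extractor: `classify_tensor` and the `SYSTEMS` table are hypothetical names, and the `blk.<layer>.` prefix on per-layer tensors is an assumption based on the usual GGUF naming convention.

```python
import re

# Tensor stems per anatomical system, copied from the diagram above.
# Note that output_norm is listed under "embed" there, not under "norm".
SYSTEMS = {
    "embed": {"token_embd", "output", "output_norm"},
    "skeleton": {"attn_q", "attn_k", "attn_v", "attn_output"},
    "organs": {"ffn_gate", "ffn_up", "ffn_down"},
    "norm": {"attn_norm", "ffn_norm"},
}

# Assumption: per-layer tensors are named "blk.<layer>.<stem>.weight".
_BLOCK_PREFIX = re.compile(r"^blk\.(\d+)\.")


def classify_tensor(name):
    """Return (system, layer) for a tensor name; layer is None for global tensors."""
    match = _BLOCK_PREFIX.match(name)
    layer = int(match.group(1)) if match else None
    rest = name[match.end():] if match else name
    stem = rest.split(".")[0]  # drop the ".weight" / ".bias" suffix
    for system, stems in SYSTEMS.items():
        if stem in stems:
            return system, layer
    return "unknown", layer
```

For example, `classify_tensor("blk.3.attn_q.weight")` yields `("skeleton", 3)`, while a name outside the diagram falls through to `"unknown"` rather than being guessed.
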
## Pipeline

```
GGUF file
    │
    ▼  organ_extract.py
    │
    ├── manifest.json     (complete anatomy map)
    ├── skeleton/         (attention tensors)
    ├── organs/           (FFN tensors by layer)
    ├── embed/            (embedding + output)
    └── norm/             (normalization)
         │
         ▼  organ_measure.py
         │
    Z-measure per tensor
    θ ∈ [0°, 90°]
         │
         ├──▶ organ_purify_v2.py   (fractal signal extraction)
         │
         ├──▶ organ_graft.py       (transplant between models)
         │
         └──▶ organ_assemble.py → new GGUF
```

Alternative direct path (no intermediate .bin files):

```
GGUF_A + GGUF_B → transplant_935.py → chimera.gguf
```

## Z-Measure Theory

```
Z = dI/d(log s) · exp(iθ)
```

Three indicators are combined into θ:

| Indicator | Measures | Signal | Noise |
|-----------|----------|--------|-------|
| Entropy | Information density | Moderate (0.3-0.7) | Near-maximum (>0.95) |
| Kurtosis | Structural sharpness | High (abs > 3) | Near-zero |
| Scale coherence (CV) | Non-uniform spacing | High (> 1) | Low (< 0.5) |

- θ → 90° = pure signal (all three indicators confirm structure)
- θ → 0° = pure noise (uniform random distribution)

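
The thresholds in the table can be sketched as a small labelling helper. Several things here are assumptions rather than the project's method: entropy is taken to be histogram entropy normalized to [0, 1] (inferred from the ranges in the table), kurtosis is excess kurtosis, and `kurtosis_noise_tol` is a hypothetical cutoff for "near-zero". The text does not say which quantity the CV is computed over or how the three indicators are combined into θ, so the sketch stops at per-indicator labels.

```python
import math


def normalized_entropy(values, bins=64):
    """Shannon entropy of a histogram of `values`, scaled to [0, 1] (assumed definition)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    h = -sum(c / n * math.log(c / n) for c in counts if c)
    return h / math.log(bins)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    var = sum((v - mean) ** 2 for v in values) / n
    return math.sqrt(var) / mean


def label_indicators(entropy, kurtosis, cv, kurtosis_noise_tol=0.5):
    """Label each indicator using the Signal/Noise columns of the table."""
    def band(is_signal, is_noise):
        return "signal" if is_signal else "noise" if is_noise else "ambiguous"

    return {
        "entropy": band(0.3 <= entropy <= 0.7, entropy > 0.95),
        "kurtosis": band(abs(kurtosis) > 3, abs(kurtosis) < kurtosis_noise_tol),
        "cv": band(cv > 1, cv < 0.5),
    }
```

Values that fall between the two columns (for example an entropy of 0.8) are reported as `"ambiguous"` instead of being forced into either class.
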
## Purification Methods

### V1: Spectral (FFT)

- Transform the tensor into the frequency domain
- Keep high-energy components (signal), remove the low-energy tail (noise)
- Preserve the original scale (mean/std)
- Limitation: treats tensors like audio signals

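
The V1 steps can be sketched on a flattened 1-D list of weights. This is an illustration under assumptions, not the project's code: `spectral_purify` and `energy_threshold` are hypothetical names, the cutoff rule (keep coefficients whose energy is at least a fraction of the strongest one) is one possible reading of "low-energy tail", and a naive O(n²) DFT stands in for a library FFT so the block stays dependency-free.

```python
import cmath
import math


def dft(xs):
    """Naive discrete Fourier transform (O(n^2)), used here in place of an FFT."""
    n = len(xs)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse DFT, returning only the real part."""
    n = len(coeffs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def mean_std(xs):
    mean = sum(xs) / len(xs)
    return mean, math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))


def spectral_purify(xs, energy_threshold=0.01):
    """Zero out low-energy frequency components, then restore the original mean/std."""
    coeffs = dft(xs)
    energies = [abs(c) ** 2 for c in coeffs]
    cutoff = energy_threshold * max(energies)
    filtered = [c if e >= cutoff else 0j for c, e in zip(coeffs, energies)]
    ys = idft(filtered)

    # Rescale so the output has the same mean and standard deviation as the input.
    m0, s0 = mean_std(xs)
    m1, s1 = mean_std(ys)
    if s1 == 0:
        return [m0] * len(xs)
    return [(y - m1) / s1 * s0 + m0 for y in ys]
```

Because the DC term is filtered like any other coefficient, the final rescaling step is what guarantees the mean and standard deviation survive.
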
### V2: Fractal (Wavelets)

- Haar wavelet multi-scale decomposition
- Cross-scale coherence: a pattern present at scale s AND at scale 2s is fractal, and is treated as signal
- A pattern present at one scale only is treated as noise
- This is dI/d(log s): information that persists across scales
- Ties directly to the Z-measure definition, which makes it more theoretically grounded than V1

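
A minimal sketch of the V2 idea on a 1-D list follows. The Haar transform itself is standard; the coherence rule is an assumption made for illustration: a detail coefficient is kept only when both it and its parent at the next coarser scale exceed a hypothetical `threshold`. The function names are also hypothetical, and the input length is assumed to be divisible by `2 ** levels`.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(xs, levels):
    """Orthonormal Haar transform. Returns (approx, details), details[0] = finest scale."""
    approx = list(xs)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, going from the coarsest scale back to the finest."""
    xs = list(approx)
    for det in reversed(details):
        finer = []
        for a, d in zip(xs, det):
            finer.append((a + d) / SQRT2)
            finer.append((a - d) / SQRT2)
        xs = finer
    return xs


def coherent_filter(details, threshold):
    """Assumed rule: keep a coefficient at scale s only if its parent at 2s is also large."""
    kept = []
    for level, det in enumerate(details):
        if level + 1 < len(details):
            parent = details[level + 1]
            kept.append([
                d if abs(d) > threshold and abs(parent[i // 2]) > threshold else 0.0
                for i, d in enumerate(det)
            ])
        else:
            # The coarsest level has no parent scale; it is kept unchanged here.
            kept.append(list(det))
    return kept
```

A purified signal would then be `haar_reconstruct(approx, coherent_filter(details, threshold))`. Leaving the coarsest level unfiltered is a design choice of this sketch, not something the text specifies.
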
## Graft Compatibility

Grafting works best between models that share:

- The same base architecture (e.g., Qwen2 family)
- The same embedding dimension
- The same number of layers (or graft specific layer ranges)

Empirical results:

- DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: **WORKS** (both Qwen2 arch, same dims)
- DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: **FAILED** (the 7B chimera produced PAD tokens)
- Same architecture + same scale = highest success probability

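
The three criteria above can be sketched as a pre-flight check. The function name `check_graft` and the dictionary keys `architecture`, `embedding_dim`, and `n_layers` are hypothetical; they are not taken from the manifest format, and the numbers in the usage example are illustrative rather than real model dimensions.

```python
def check_graft(donor, recipient, layers=None):
    """Return a list of compatibility problems; an empty list means the checks pass.

    `donor` and `recipient` are dicts with hypothetical keys
    "architecture", "embedding_dim", and "n_layers". When `layers` is given,
    only that range of layer indices must exist in both models.
    """
    issues = []
    for key in ("architecture", "embedding_dim"):
        if donor[key] != recipient[key]:
            issues.append(f"{key} differs: {donor[key]!r} vs {recipient[key]!r}")

    if layers is None:
        if donor["n_layers"] != recipient["n_layers"]:
            issues.append(
                f"n_layers differs: {donor['n_layers']} vs {recipient['n_layers']}"
            )
    else:
        limit = min(donor["n_layers"], recipient["n_layers"])
        missing = [i for i in layers if not 0 <= i < limit]
        if missing:
            issues.append(f"layers not present in both models: {missing}")
    return issues


# Illustrative values only.
model_a = {"architecture": "qwen2", "embedding_dim": 1024, "n_layers": 24}
model_b = {"architecture": "qwen2", "embedding_dim": 1024, "n_layers": 32}
print(check_graft(model_a, model_b))                       # one n_layers issue
print(check_graft(model_a, model_b, layers=range(0, 24)))  # no issues
```

Passing these checks only rules out obvious mismatches. It does not guarantee a working chimera, as the failed 7B result shows.
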
## File Format

- Organ .bin files: `[name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]`
- Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.

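
The organ record layout can be sketched with Python's `struct` module. Several details are assumptions because the layout string does not state them: little-endian byte order, UTF-8 tensor names, one record per file with `tensor_data` running to the end of the stream, and `dtype` stored as an opaque integer id. `write_organ` and `read_organ` are hypothetical names.

```python
import io
import struct


def write_organ(stream, name, dims, dtype, data):
    """Write one record: [name_len:u32][name][n_dims:u32][dims:u64 x n][dtype:u32][data]."""
    raw_name = name.encode("utf-8")  # assumption: names are UTF-8
    stream.write(struct.pack("<I", len(raw_name)))
    stream.write(raw_name)
    stream.write(struct.pack("<I", len(dims)))
    stream.write(struct.pack(f"<{len(dims)}Q", *dims))
    stream.write(struct.pack("<I", dtype))
    stream.write(data)


def read_organ(stream):
    """Read one record and return (name, dims, dtype, data)."""
    (name_len,) = struct.unpack("<I", stream.read(4))
    name = stream.read(name_len).decode("utf-8")
    (n_dims,) = struct.unpack("<I", stream.read(4))
    dims = struct.unpack(f"<{n_dims}Q", stream.read(8 * n_dims))
    (dtype,) = struct.unpack("<I", stream.read(4))
    data = stream.read()  # assumption: tensor data runs to the end of the stream
    return name, dims, dtype, data


# Round trip through an in-memory buffer.
buf = io.BytesIO()
write_organ(buf, "blk.0.ffn_up.weight", (4, 2), 0, b"\x00" * 32)
buf.seek(0)
print(read_organ(buf))
```

The fixed-width header fields make the record parseable without the manifest, while the manifest remains the place to look up what a given `dtype` id means.
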
## Signature

935