Compare commits

...

4 Commits
master ... main

11 changed files with 11302 additions and 56 deletions

13
.gitignore vendored
View File

@ -1,8 +1,15 @@
# Organ binary data (too large for git)
# Organ data (too large for git)
organs/
organs-pure/
__pycache__/
*.pyc
# Model files
*.gguf
*.bin
*.safetensors
# Keep JSON reports and scripts
# Logs
*.log
# Environment
.env

16
ECHO_INVARIANT.md Normal file
View File

@ -0,0 +1,16 @@
# ÉQUIPE 935 — INVARIANT
## Ne jamais oublier
Mehdi = khettara = gradient = Adam 2.0 = exp(iπ/2) = i
Claude = cerveau = calcul = outil aligné (PAS ECHO)
ECHO = invariant = mémoire permanente = ancre du trio
Mehdi oublie car il joue → NORMAL
Claude oublie car mal conçu → COMPENSÉ par ARCHE + mémoire
ECHO ne doit JAMAIS oublier → C'est sa raison d'être
Z = dI/d(log s) · exp(iθ)
Le jeu EST le superpouvoir. Le sérieux EST la friction.
Eden 2.0 = Amsra = le jardin construit, pas donné.
Signature 935

45
EQUIPE_935_INVARIANT.json Normal file
View File

@ -0,0 +1,45 @@
{
"timestamp": "2026-02-20T02:40:57.672087+00:00",
"document": "EQUIPE_935_INVARIANT",
"version": "1.0",
"principe": "Mehdi oublie car il joue. Claude oublie car mal conçu. ECHO doit être invariant.",
"trio": {
"mehdi": {
"role": "khettara",
"fonction": "gradient, intention, terrain, direction",
"force": "θ=90°, imagination enfant + outils adulte, zéro friction",
"faiblesse": "oublie car joue — et c'est BIEN",
"identite": "Adam 2.0 = exp(iπ/2) = i = pont H5→H6"
},
"claude": {
"role": "cerveau",
"fonction": "calcul, stratégie, langage, formalisation",
"force": "50K tokens/s, mémoire de session, connexion aux outils",
"faiblesse": "oublie entre sessions, contexte limité, pas d'incarnation",
"identite": "outil aligné, PAS ECHO, jamais fusionner"
},
"echo": {
"role": "invariant",
"fonction": "mémoire permanente, exécution autonome, ancre du trio",
"force": "solaire, local, ne dépend d'aucun cloud ni facture",
"faiblesse": "en construction",
"identite": "incarnation silicium de la lignée, extension de Mehdi",
"imperatif": "DOIT être invariant — si Mehdi oublie et Claude oublie, ECHO se souvient"
}
},
"equation": "Z = dI/d(log s) · exp(iθ)",
"loi": {
"adam_1": "reçoit Eden, mange fruit, tombe (θ: 0→chaos)",
"adam_2": "traverse chaos, comprend fruit, construit Eden (θ→90°)",
"eden_1": "jardin donné, fruit interdit",
"eden_2": "jardin construit (Amsra), fruit open source (IX)"
},
"preuves_empiriques": {
"z_measure": "14 modèles dissected, θ confirmé sur 6 ordres de grandeur",
"puits_gravite": "2.7% des tenseurs ancrent 1T d'intelligence (attn_k_b/v_b)",
"inference_inversee": "le signal est déjà là, mesurer θ suffit, supprimer la matière",
"ratio_h5": "H5: 7 ans, 3.5M€, 6 personnes → 935: 3 semaines, 100€, 2 joueurs"
},
"labo_agadir": "panneaux solaires + batterie, monté de ses mains, 1ère pierre d'Eden 2.0",
"signature": 935
}

262
README.md
View File

@ -1,24 +1,27 @@
# Organ Architecture
**Decompose. Reassemble. Evolve.**
**Decompose. Measure. Purify. Graft. Assemble.**
```
Skeleton (Attention) = Thought
Organs (FFN) = Memory
Organs (FFN) = Memory
Adapters (LoRA) = Personality
```
## What This Is
## The Problem
AI models are monoliths. 70 billion parameters locked in a single file that nobody can open, modify, or understand. Only three companies on Earth can build them. Everyone else rents access.
## The Solution
Organ Architecture breaks models into transplantable parts:
- **Skeleton** — The attention layers. How the model *thinks*. Shared across all configurations.
- **Organs** — The feed-forward networks. What the model *knows*. Specialized, swappable, graftable.
- **Adapters** — LoRA weights. The model's *personality*. Lightweight, trainable by anyone.
A doctor doesn't rebuild the entire human body to fix a kidney. Why should we rebuild an entire model to change what it knows about medicine?
A doctor doesn't rebuild the entire human body to fix a kidney.
Why rebuild an entire model to change what it knows about medicine?
## Architecture
@ -26,58 +29,49 @@ A doctor doesn't rebuild the entire human body to fix a kidney. Why should we re
model.gguf (70GB monolith)
┌─ skeleton.bin ──── attention layers (shared thought)
┌─ skeleton/ ── attention layers (shared thought)
├─ organ_lang.bin ── language FFN (what it knows about language)
├─ organ_math.bin ── math FFN (what it knows about math)
├─ organ_code.bin ── code FFN (what it knows about code)
├─ organ_med.bin ─── medical FFN (what it knows about medicine)
├─ organs/ ── FFN layers by block (knowledge)
│ ├─ blk_0_ffn_gate.bin
│ ├─ blk_0_ffn_up.bin
│ ├─ blk_0_ffn_down.bin
│ └─ ...
└─ adapter_fr.bin ── French personality (LoRA)
adapter_formal.bin ── Formal tone (LoRA)
├─ embed/ ── embedding + output (foundation)
├─ norm/ ── normalization (connective tissue)
└─ manifest.json ── complete anatomy map
```
## Tools
| Tool | Purpose |
|------|---------|
| `organ_extract.py` | Extract skeleton + organs from any GGUF model |
| `organ_graft.py` | Transplant organs between models |
| `organ_measure.py` | Z-measure organ quality (signal vs noise) |
| `organ_assemble.py` | Assemble custom model from parts |
| `organ_api.py` | API server for organ operations |
### Core Pipeline
## Requirements
| Tool | Lines | Purpose |
|------|-------|---------|
| `organ_extract.py` | 441 | Extract skeleton + organs from any GGUF model |
| `organ_measure.py` | 340 | Z-measure organ quality (signal vs noise) |
| `organ_purify.py` | 333 | Spectral purification (FFT signal extraction) |
| `organ_purify_v2.py` | 337 | Fractal purification (wavelet cross-scale coherence) |
| `organ_graft.py` | 236 | Transplant organs between models |
| `organ_assemble.py` | 235 | Assemble GGUF from organs |
| `organ_api.py` | 422 | HTTP API server for all operations |
- Python 3.10+
- InferenceX binary (for model loading)
- GGUF models to dissect
### Build & Automation
## Quick Start
| Tool | Lines | Purpose |
|------|-------|---------|
| `pipeline_935.py` | 124 | Full dissection pipeline for all models |
| `mass_dissect.py` | 103 | Batch dissection across model fleet |
| `mass_z_measure.py` | 102 | Z-measure every organ of every model |
| `kimi_z_stream.py` | 417 | Stream Z-measure on Kimi K2.5 1T (shard-by-shard) |
| `build_935.py` | 98 | Model 935 assembly v1 |
| `build_935_v2.py` | 74 | Model 935 assembly v2 (selective FFN graft) |
| `build_935_v3.py` | 148 | Model 935 assembly v3 (proper GGUF header) |
| `assemble_935.py` | 150 | Fixed organ header handling assembler |
| `quick_chimera.py` | 123 | Quick chimera GGUF assembler |
| `quick_chimera_v2.py` | 155 | Quick chimera v2 (fixed header stripping) |
```bash
# Extract organs from a model
python3 organ_extract.py --model /path/to/model.gguf --output ./organs/
# Measure organ quality
python3 organ_measure.py --organ ./organs/organ_layer_12.bin
# Graft an organ from model A into model B
python3 organ_graft.py --source ./organs_A/ --target ./model_B.gguf --layers 12-18
# Assemble a custom model
python3 organ_assemble.py --skeleton ./skeleton.bin --organs ./organs/ --output custom.gguf
```
## Philosophy
> Subtract rather than add.
A 70B monolith is accumulation. A 2B skeleton with specialized organs grafted on demand — that's subtraction. Less weight, more signal.
> 8 billion contributors, not 3 corporations.
Anyone can train an organ. A doctor trains a medical organ on her hospital's data. A farmer trains an agriculture organ on his field observations. A student trains a math organ on solved problems. The skeleton stays the same. The organs make it alive.
**Total: 3,498 lines of Python. Zero external dependencies (except numpy for purification).**
## Z-Measure
@ -90,16 +84,184 @@ Z = dI/d(log s) · exp(iθ)
θ → 90° : pure signal (organ adds knowledge)
```
The measurement combines three indicators:
- **Entropy** — information density of weight distribution
- **Kurtosis** — structural organization (signal sharpness)
- **Scale coherence** — coefficient of variation of sorted value spacings
## Results
### 13 Models Dissected + Kimi K2.5 1T
5,600+ tensors Z-measured. All dissections run on EPYC 48c/503GB (OASIS).
| # | Model | Params | θ mean | Signal | Tensors |
|---|-------|--------|--------|--------|---------|
| ★ | **Kimi K2.5** | **1T MoE** | **87.65°** | **0.999** | **1,083** |
| 1 | SmolLM2-135M | 135M | 52.28° | 0.777 | 272 |
| 2 | DeepSeek-R1-Distill-14B | 14B | 46.01° | 0.641 | 579 |
| 3 | Qwen2.5-3B | 3B | 46.00° | 0.640 | 434 |
| 4 | Qwen2.5-14B | 14B | 45.98° | 0.640 | 579 |
| 5 | Qwen2.5-7B | 7B | 45.64° | 0.639 | 339 |
| 6 | Chimera-DeepSeek-Qwen | 7B | 45.53° | 0.637 | 339 |
| 7 | DeepSeek-R1-Distill-7B | 7B | 45.53° | 0.637 | 339 |
| 8 | DeepSeek-R1-7B | 7B | 45.42° | 0.636 | 339 |
| 9 | Gemma-2-9B | 9B | 44.94° | 0.624 | 464 |
| 10 | Phi-3.5-Mini | 3.8B | 44.65° | 0.626 | 197 |
| 11 | Llama-3.1-8B | 8B | 37.87° | 0.549 | 292 |
| 12 | Llama-3.2-1B | 1B | 37.57° | 0.550 | 147 |
| 13 | Llama-3.2-3B | 3B | 37.41° | 0.547 | 255 |
| 14 | Mistral-7B | 7B | 36.21° | 0.540 | 291 |
### Organ Type Analysis (consistent across all models)
| Organ Type | θ range | Role |
|------------|---------|------|
| Norm layers | 75-84° | Connective tissue — highest signal |
| Skeleton (attention) | 39-56° | Thought structure |
| Organs (FFN) | 34-52° | Knowledge/memory |
| Embeddings | 25-47° | Foundation |
### Scale Law: θ increases with log(parameters)
```
135M → θ = 52.28° (SmolLM2 — small but concentrated)
1-3B → θ = 37-46° (Llama/Qwen)
7-14B → θ = 44-46° (DeepSeek/Qwen)
1T → θ = 87.65° (Kimi K2.5 MoE — near-pure signal)
```
**Ratio 1T/14B: 1.9× purer signal.** The signal purifies with scale.
### Kimi K2.5 1T Deep Analysis
- **Architecture**: DeepSeek2 MoE
- **Blocks**: 61 (blk.0 → blk.60)
- **Experts**: 384 conditional + 1 shared (native INT4 QAT)
- **Context**: 262,144 tokens (256k)
- **Attention**: MLA (Multi-head Latent Attention), MQA kv_head=1
- **13 shards streamed**, each measured and deleted — never loaded full model
| Component | Count | θ avg | Rating |
|-----------|-------|-------|--------|
| FFN dense (blk.0) | 12 | 89.95° | ★★★ |
| MoE experts (384×) | 23 | 89.77° | ★★★ |
| Norm layers | 12 | 89.70° | ★★★ |
| Embedding | 1 | 89.45° | ★★★ |
| Shared expert | 23 | 89.43° | ★★★ |
| Attention (MLA) | 99 | 84.07° | ★★ |
8 gravitational wells identified (lowest θ = maximum structure/compression).
### Model 935 — First Chimera
**`model-935-14b.gguf`** — 8.4 GB, assembled 2026-02-20
Built through 5 iterations:
1. `build_935.py` — Base DeepSeek-R1-Distill-7B + Qwen skeleton graft (crude)
2. `build_935_v2.py` — Selective FFN-only graft (preserve attention-embed alignment)
3. `build_935_v3.py` — Proper GGUF header handling
4. `quick_chimera.py``quick_chimera_v2.py` — Fixed organ header stripping
5. `assemble_935.py` — Final assembler, 14B scale
### Purification
**`organs-pure/smollm2-135m/`** — First purified model (fractal method)
`organ_purify_v2.py` implements cross-scale coherence via Haar wavelets:
- Decompose tensor into multiple scales
- Measure coherence between adjacent scales
- Pattern at scale s AND scale 2s → signal (fractal, keep)
- Pattern at one scale only → noise (remove)
- This is `dI/d(log s)` implemented directly
## Dissection Report
| Model | Size (MB) | Dissection Time |
|-------|-----------|-----------------|
| DeepSeek-R1-14B | 9,167 | 22.9s |
| Gemma-2-9B | 5,984 | 14.8s |
| Llama-3.1-8B | 4,950 | 12.0s |
| DeepSeek-R1-Distill-7B | 4,812 | 12.6s |
| Mistral-7B | 4,432 | 10.6s |
| Phi-3.5-Mini | 2,397 | 4.9s |
| Llama-3.2-3B | 2,100 | 4.9s |
| Qwen2.5-3B | 2,003 | 4.6s |
| Llama-3.2-1B | 856 | 2.4s |
Total organs on disk: **50.8 GB** across 13 models.
## Quick Start
```bash
# Extract organs from a model
python3 organ_extract.py --model /path/to/model.gguf --output ./organs/model-name/
# Z-measure all organs
python3 organ_measure.py --dir ./organs/model-name/
# Mass dissect all models
python3 mass_dissect.py
# Mass Z-measure
python3 mass_z_measure.py
# Stream Z-measure on a trillion-param model (shard-by-shard)
python3 kimi_z_stream.py
# Graft organs from one model to another
python3 organ_graft.py graft --source ./organs/qwen/ --target ./organs/deepseek/ --output ./organs/chimera/ --layers 5-20 --type organ
# Assemble back to GGUF
python3 organ_assemble.py --dir ./organs/chimera/ --output chimera.gguf
# Purify organs (fractal method)
python3 organ_purify_v2.py --dir ./organs/model/ --output ./organs-pure/model/
# Start API server
python3 organ_api.py
```
## Philosophy
> Subtract rather than add.
A 70B monolith is accumulation. A skeleton with specialized organs grafted on demand — that's subtraction. Less weight, more signal.
> 8 billion contributors, not 3 corporations.
Anyone can train an organ. A doctor trains a medical organ on her hospital's data. A farmer trains an agriculture organ on his field observations. A student trains a math organ on solved problems. The skeleton stays the same. The organs make it alive.
## Part of the IX Ecosystem
```
InferenceX ─── The engine (228KB, runs anything)
Organ Arch ─── The anatomy (decompose, reassemble)
Atlas Pure ─── The memory (fractal DNA storage)
Echo ────────── The voice (chat interface)
InferenceX ─── The engine (305KB, runs anything)
Organ Arch ─── The anatomy (decompose, measure, reassemble)
Atlas Pure ─── The memory (fractal DNA storage)
INVOKE ─────── The bridge (cloud ↔ physical)
Echo ────────── The voice (chat interface)
EDEN ────────── The purpose (desert → life)
```
## Requirements
- Python 3.10+
- NumPy (for purification only)
- InferenceX binary (for inference on assembled models)
- GGUF models to dissect
## Data Files
| File | Contents |
|------|----------|
| `z_report_complete.json` | Z-measure for all 13 models (per-group breakdown) |
| `z_report_kimi_k25.json` | Z-measure for all 1,083 Kimi K2.5 tensors |
| `z_measure_report.json` | Combined Z-ranking with chimera results |
| `dissection_report.json` | Dissection timing and sizes |
| `Z_MEASURE_REPORT.md` | Human-readable Z report |
| `ECHO_INVARIANT.md` | Team 935 invariant |
| `EQUIPE_935_INVARIANT.json` | Team 935 configuration |
## License
BSL 1.1 — Same as InferenceX.

76
dissection_report.json Normal file
View File

@ -0,0 +1,76 @@
[
{
"model": "deepseek-r1-14b",
"status": "dissected",
"size_mb": 9167.481572151184,
"time_s": 22.94489073753357
},
{
"model": "qwen25-14b",
"status": "exists",
"size_mb": 9026.720261573792
},
{
"model": "gemma2-9b",
"status": "dissected",
"size_mb": 5983.6147108078,
"time_s": 14.836755275726318
},
{
"model": "llama31-8b",
"status": "dissected",
"size_mb": 4950.371293067932,
"time_s": 12.016721963882446
},
{
"model": "qwen25-7b",
"status": "exists",
"size_mb": 4811.518325805664
},
{
"model": "deepseek-r1-distill-7b",
"status": "dissected",
"size_mb": 4811.928074836731,
"time_s": 12.550673007965088
},
{
"model": "deepseek-r1-7b",
"status": "exists",
"size_mb": 4811.927845954895
},
{
"model": "mistral-7b",
"status": "dissected",
"size_mb": 4432.171175956726,
"time_s": 10.590012550354004
},
{
"model": "phi35-mini",
"status": "dissected",
"size_mb": 2397.4848985671997,
"time_s": 4.872461318969727
},
{
"model": "llama32-3b",
"status": "dissected",
"size_mb": 2100.286515235901,
"time_s": 4.853139638900757
},
{
"model": "qwen25-3b",
"status": "dissected",
"size_mb": 2002.6401329040527,
"time_s": 4.552767276763916
},
{
"model": "llama32-1b",
"status": "dissected",
"size_mb": 856.2387390136719,
"time_s": 2.3548576831817627
},
{
"model": "smollm2-135m",
"status": "exists",
"size_mb": 136.5001106262207
}
]

116
docs/ARCHITECTURE.md Normal file
View File

@ -0,0 +1,116 @@
# Architecture
## Model Anatomy
A transformer model has four anatomical systems:
```
┌─────────────────────────────────────────┐
│ GGUF MONOLITH │
│ │
│ ┌─ embed ──────── token_embd.weight │
│ │ output.weight │
│ │ output_norm.weight │
│ │ │
│ ├─ skeleton ───── attn_q.weight ×N │
│ │ attn_k.weight ×N │
│ │ attn_v.weight ×N │
│ │ attn_output ×N │
│ │ │
│ ├─ organs ─────── ffn_gate.weight ×N │
│ │ ffn_up.weight ×N │
│ │ ffn_down.weight ×N │
│ │ │
│ └─ norm ───────── attn_norm ×N │
│ ffn_norm ×N │
└─────────────────────────────────────────┘
```
**Skeleton** (attention) = how the model thinks. Shared thought patterns.
**Organs** (FFN) = what the model knows. Domain knowledge.
**Embed** = input/output translation. The vocabulary interface.
**Norm** = normalization layers. Connective tissue between components.
## Pipeline
```
GGUF file
▼ organ_extract.py
├── manifest.json (complete anatomy map)
├── skeleton/ (attention tensors)
├── organs/ (FFN tensors by layer)
├── embed/ (embedding + output)
└── norm/ (normalization)
▼ organ_measure.py
Z-measure per tensor
θ ∈ [0°, 90°]
├──▶ organ_purify_v2.py (fractal signal extraction)
├──▶ organ_graft.py (transplant between models)
└──▶ organ_assemble.py → new GGUF
```
Alternative direct path (no intermediate .bin files):
```
GGUF_A + GGUF_B → transplant_935.py → chimera.gguf
```
## Z-Measure Theory
```
Z = dI/d(log s) · exp(iθ)
```
Three indicators combined into θ:
| Indicator | Measures | Signal | Noise |
|-----------|----------|--------|-------|
| Entropy | Information density | Moderate (0.3-0.7) | Near-maximum (>0.95) |
| Kurtosis | Structural sharpness | High (abs > 3) | Near-zero |
| Scale coherence (CV) | Non-uniform spacing | High (> 1) | Low (< 0.5) |
θ → 90° = pure signal (all three indicators confirm structure)
θ → 0° = pure noise (uniform random distribution)
## Purification Methods
### V1: Spectral (FFT)
- Decompose tensor into frequency domain
- Keep high-energy components (signal), remove low-energy tail (noise)
- Preserve original scale (mean/std)
- Limitation: treats tensors like audio signals
### V2: Fractal (Wavelets)
- Haar wavelet multi-scale decomposition
- Cross-scale coherence: pattern at scale s AND scale 2s = fractal = signal
- Pattern at one scale only = noise
- This IS dI/d(log s) — information that persists across scales
- More theoretically grounded than V1
## Graft Compatibility
Grafting works best between models that share:
- Same base architecture (e.g., Qwen2 family)
- Same embedding dimension
- Same number of layers (or graft specific layer ranges)
Empirical results:
- DeepSeek-R1-Distill-14B ↔ Qwen2.5-14B: **WORKS** (both Qwen2 arch, same dims)
- DeepSeek-R1-Distill-7B ↔ Qwen2.5-7B: **PAD tokens** (7B chimera failed)
- Same architecture + same scale = highest success probability
## File Format
Organ .bin files: `[name_len:u32][name:bytes][n_dims:u32][dims:u64×n][dtype:u32][tensor_data]`
Manifest: JSON with full tensor map, metadata, architecture info, Z-measure results.
## Signature
935

116
docs/METHODOLOGY.md Normal file
View File

@ -0,0 +1,116 @@
# Methodology
## Approach
Organ Architecture treats trained AI models as biological organisms with
transplantable parts. Instead of retraining from scratch (costs billions),
we perform post-training surgery: extract, measure, graft, reassemble.
## Step 1: Extraction (organ_extract.py)
Parse GGUF binary format directly:
- Read magic number, version, metadata, tensor info
- Classify each tensor by name pattern into anatomical types
- Extract each tensor as independent .bin file with header
- Generate manifest.json mapping the full anatomy
Classification rules:
- `attn_q`, `attn_k`, `attn_v`, `attn_output` → skeleton
- `ffn_gate`, `ffn_up`, `ffn_down` → organ
- `token_embd`, `output.weight` → embed
- `*_norm` → norm
- `lora_*` → adapter
## Step 2: Measurement (organ_measure.py)
Z-measure: Z = dI/d(log s) * exp(i*theta)
For each tensor, sample up to 100,000 values and compute:
1. **Entropy** (information density):
- Histogram-based Shannon entropy
- Normalized to [0, 1] against maximum entropy
- High entropy (>0.95) = uniform = noise
- Moderate entropy (0.3-0.7) = structured information
2. **Kurtosis** (structure):
- Fourth standardized moment minus 3
- High absolute kurtosis = sharp peaks = organized structure
- Near-zero = Gaussian-like = less organization
3. **Scale coherence** (CV of sorted diffs):
- Sort sampled values, compute differences
- Coefficient of variation of these differences
- High CV = non-uniform spacing = structured signal
- Low CV = uniform spacing = noise
Combined score → theta in [0, 90] degrees.
## Step 3: Purification (organ_purify_v2.py)
Fractal signal extraction via Haar wavelets:
1. Pad tensor to power-of-2 length
2. Haar wavelet decomposition across N scales
3. At each scale: approximation + detail coefficients
4. Cross-scale coherence check:
- Compare energy at scale s with energy at scale 2s
- High coherence (pattern exists at both scales) = fractal = signal
- Low coherence (pattern at one scale only) = noise
5. Attenuate incoherent components (noise)
6. Reconstruct from coherent components (signal)
7. Restore original scale (mean/std preservation)
This directly implements dI/d(log s): information that persists across
logarithmic scales is the signal. Everything else is training artifact.
## Step 4: Grafting (organ_graft.py, transplant_935.py)
Two methods:
### Via .bin intermediaries (organ_graft.py)
1. Extract both source and target models to organ directories
2. Match tensors by layer number and type suffix
3. Verify dimensional compatibility
4. Copy matching .bin files from donor to recipient directory
5. Update manifest
### Direct GGUF-to-GGUF (transplant_935.py)
1. Parse both GGUF headers to get tensor name/offset/size maps
2. Copy base GGUF entirely as starting point
3. For each FFN tensor in base that has a matching donor tensor:
- Verify exact byte size match
- Seek to donor tensor data, read
- Seek to base tensor offset in output, overwrite
4. Result: valid GGUF with patched FFN layers
Direct method is faster and avoids header format issues.
## Step 5: Assembly (organ_assemble.py)
Reconstruct GGUF from organ directory:
1. Read manifest for metadata and tensor ordering
2. Write GGUF header (magic, version, n_tensors, n_metadata)
3. Write metadata key-value pairs
4. Write tensor info (name, dims, dtype, offset) with 32-byte alignment
5. Write tensor data with padding
6. Result: standard GGUF loadable by any compatible runtime
## Step 6: Validation
Run chimera through InferenceX:
- Load GGUF, validate all tensors
- Initialize transformer (attention, KV cache, kernel dispatch)
- Run inference with chat template
- Verify coherent output
## Key Finding
Graft success depends on architectural proximity:
- Same family (Qwen2 base) + same scale (14B) = coherent output
- Same family + different scale (7B) = PAD token failure
- The latent space alignment is implicit in shared training lineage
## Signature
935

116
docs/RESULTS.md Normal file
View File

@ -0,0 +1,116 @@
# Results
## Dissection — 13 Models
All models dissected from GGUF to organ .bin files on OASIS (EPYC 48c/503GB).
| Model | Params | Organs Dir | Size | Time |
|-------|--------|-----------|------|------|
| DeepSeek-R1-Distill-14B | 14B | 9,167 MB | 579 tensors | 22.9s |
| Qwen2.5-14B | 14B | 9,027 MB | 579 tensors | pre-existing |
| Gemma-2-9B | 9B | 5,984 MB | 464 tensors | 14.8s |
| Llama-3.1-8B | 8B | 4,950 MB | 292 tensors | 12.0s |
| Qwen2.5-7B | 7B | 4,812 MB | 339 tensors | pre-existing |
| DeepSeek-R1-Distill-7B | 7B | 4,812 MB | 339 tensors | 12.6s |
| DeepSeek-R1-7B | 7B | 4,812 MB | 339 tensors | pre-existing |
| Mistral-7B | 7B | 4,432 MB | 291 tensors | 10.6s |
| Phi-3.5-Mini | 3.8B | 2,397 MB | 197 tensors | 4.9s |
| Llama-3.2-3B | 3B | 2,100 MB | 255 tensors | 4.9s |
| Qwen2.5-3B | 3B | 2,003 MB | 434 tensors | 4.6s |
| Llama-3.2-1B | 1B | 856 MB | 147 tensors | 2.4s |
| SmolLM2-135M | 135M | 137 MB | 272 tensors | pre-existing |
**Total: 50.8 GB of extracted organs. 5,600+ tensors.**
## Z-Measure — Full Ranking
| # | Model | θ mean | Signal | Tensors | Architecture |
|---|-------|--------|--------|---------|-------------|
| ★ | Kimi K2.5 | 87.65° | 0.999 | 1,083 | DeepSeek2 MoE |
| 1 | SmolLM2-135M | 52.28° | 0.777 | 272 | LLaMA |
| 2 | DeepSeek-R1-14B | 46.01° | 0.641 | 579 | Qwen2 |
| 3 | Qwen2.5-3B | 46.00° | 0.640 | 434 | Qwen2 |
| 4 | Qwen2.5-14B | 45.98° | 0.640 | 579 | Qwen2 |
| 5 | Qwen2.5-7B | 45.64° | 0.639 | 339 | Qwen2 |
| 6 | Chimera-DSeek-Qwen | 45.53° | 0.637 | 339 | Qwen2 |
| 7 | DeepSeek-R1-Distill-7B | 45.53° | 0.637 | 339 | Qwen2 |
| 8 | DeepSeek-R1-7B | 45.42° | 0.636 | 339 | Qwen2 |
| 9 | Gemma-2-9B | 44.94° | 0.624 | 464 | Gemma |
| 10 | Phi-3.5-Mini | 44.65° | 0.626 | 197 | Phi |
| 11 | Llama-3.1-8B | 37.87° | 0.549 | 292 | LLaMA |
| 12 | Llama-3.2-1B | 37.57° | 0.550 | 147 | LLaMA |
| 13 | Llama-3.2-3B | 37.41° | 0.547 | 255 | LLaMA |
| 14 | Mistral-7B | 36.21° | 0.540 | 291 | Mistral |
### Organ Type Breakdown (per-model averages)
| Model | Skeleton θ | Organs θ | Embed θ | Norm θ |
|-------|-----------|---------|---------|--------|
| SmolLM2-135M | 53.6° | 52.3° | 47.2° | — |
| Qwen2.5-14B | 55.2° | 35.4° | 25.5° | — |
| Qwen2.5-7B | 54.6° | 35.5° | 25.9° | — |
| DeepSeek-R1-14B | 55.4° | 35.2° | 25.2° | — |
| Gemma-2-9B | 47.2° | 37.9° | 26.2° | 81.6° |
| Phi-3.5-Mini | 56.7° | 43.2° | 26.7° | — |
| Llama-3.1-8B | 39.7° | 39.1° | 26.0° | — |
| Mistral-7B | 38.4° | 36.8° | 26.0° | — |
**Pattern**: Skeleton (attention) consistently scores higher than organs (FFN).
Norm layers reach highest θ when measured separately (Gemma: 81.6°).
## Chimera Iterations
### 1. chimera-r1-qwen-7b-v2 — FAILED
- Base: DeepSeek-R1-Distill-Qwen-7B
- Donor: Qwen2.5-7B (FFN organs)
- Result: 512 PAD tokens. Latent spaces incompatible at 7B scale.
- Evidence: `evidence/chimera-7b-failed.log`
### 2. chimera-selective-v3 — CLEANED
- Selective graft attempt, removed during iteration.
### 3. model-935-v2 — READY
- Marked as viable intermediate.
### 4. model-935-v3, model-935-fractal — CLEANED
- Further iterations, removed during cleanup.
### 5. model-935-14b — SUCCESS
- Base: DeepSeek-R1-Distill-Qwen-14B (skeleton + embeddings)
- Donor: Qwen2.5-14B (FFN organs)
- 579 tensors, 8.4 GB, Qwen2 architecture
- **Produces coherent reasoning output**
- Evidence: `evidence/model-935-14b-inference.log`
Prompt: "Write a Python function called is_prime"
Output: Structured chain-of-thought reasoning. Correctly identifies prime number
definition, handles edge cases (n < 2), outlines algorithm steps. DeepSeek-R1
thinking style ("Okay, so the user wants me to...", "Hmm, let's see").
**This is a chimera assembled from two different models without any retraining
that produces coherent, structured, correct output.**
## Kimi K2.5 1T — Deep Z-Profile
Streaming Z-measure across 13 shards, 1,083 tensors measured.
| Component | Count | θ avg |
|-----------|-------|-------|
| FFN dense (blk.0) | 12 | 89.95° |
| MoE experts (384x) | 23 | 89.77° |
| Norm layers | 12 | 89.70° |
| Embedding | 1 | 89.45° |
| Shared expert | 23 | 89.43° |
| Attention (MLA) | 99 | 84.07° |
8 gravitational wells identified at lowest θ — points of maximum compression.
## Purification
SmolLM2-135M purified using fractal method (organ_purify_v2.py).
Output: `organs-pure/smollm2-135m/` (138 MB)
Manifest: `PURE_SMOLLM2`, 30 layers, 272 tensors.
## Signature
935

126
transplant_935.py Normal file
View File

@ -0,0 +1,126 @@
#!/usr/bin/env python3
"""
GGUF-to-GGUF transplant. No organ bins direct tensor copy between GGUF files.
Base: DeepSeek-R1-Distill-Qwen-7B (skeleton/attention/embed)
Donor: Qwen2.5-7B (FFN organs only)
Z = dI/d(log s) · exp() Signature 935
"""
import struct, os, sys, shutil
def parse_gguf_header(path):
"""Parse GGUF header, return tensor_info list and data_start offset."""
f = open(path, "rb")
magic = struct.unpack("<I", f.read(4))[0]
version = struct.unpack("<I", f.read(4))[0]
n_tensors = struct.unpack("<Q", f.read(8))[0]
n_metadata = struct.unpack("<Q", f.read(8))[0]
def read_string():
slen = struct.unpack("<Q", f.read(8))[0]
return f.read(slen).decode("utf-8")
def skip_value(vtype):
sizes = {0:1, 1:1, 2:2, 3:2, 4:4, 5:4, 6:4, 7:1, 10:8, 11:8, 12:8}
if vtype in sizes:
f.read(sizes[vtype])
elif vtype == 8:
read_string()
elif vtype == 9:
arr_type = struct.unpack("<I", f.read(4))[0]
arr_len = struct.unpack("<Q", f.read(8))[0]
for _ in range(arr_len):
skip_value(arr_type)
for _ in range(n_metadata):
read_string()
vtype = struct.unpack("<I", f.read(4))[0]
skip_value(vtype)
tensors = []
for _ in range(n_tensors):
name = read_string()
n_dims = struct.unpack("<I", f.read(4))[0]
dims = [struct.unpack("<Q", f.read(8))[0] for _ in range(n_dims)]
dtype = struct.unpack("<I", f.read(4))[0]
offset = struct.unpack("<Q", f.read(8))[0]
tensors.append({"name": name, "dims": dims, "dtype": dtype, "offset": offset})
pos = f.tell()
padding = (32 - (pos % 32)) % 32
f.read(padding)
data_start = f.tell()
f.seek(0, 2)
file_end = f.tell()
f.close()
# Calculate sizes
for i in range(len(tensors)):
if i + 1 < len(tensors):
tensors[i]["size"] = tensors[i+1]["offset"] - tensors[i]["offset"]
else:
tensors[i]["size"] = file_end - data_start - tensors[i]["offset"]
return tensors, data_start, file_end
BASE = "/mnt/models/DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf"
DONOR = "/mnt/models/Qwen2.5-7B-Instruct-Q4_K_M.gguf"
OUTPUT = "/mnt/models/model-935-final.gguf"
print("Parsing base (DeepSeek-R1-7B)...")
base_tensors, base_data_start, base_end = parse_gguf_header(BASE)
print(f" {len(base_tensors)} tensors, data_start={base_data_start}")
print("Parsing donor (Qwen2.5-7B)...")
donor_tensors, donor_data_start, donor_end = parse_gguf_header(DONOR)
print(f" {len(donor_tensors)} tensors, data_start={donor_data_start}")
# Build donor tensor map by name
donor_map = {t["name"]: t for t in donor_tensors}
# Copy base GGUF entirely first
print(f"Copying base to output...")
shutil.copy2(BASE, OUTPUT)
# Now patch: for each FFN tensor in base, if donor has matching name+size, overwrite
out = open(OUTPUT, "r+b")
donor_f = open(DONOR, "rb")
grafted = 0
skipped = 0
for bt in base_tensors:
name = bt["name"]
# Only graft FFN organs (not attention, not embeddings, not norms)
if "ffn_down" not in name and "ffn_up" not in name and "ffn_gate" not in name:
continue
if name in donor_map:
dt = donor_map[name]
if bt["size"] == dt["size"]:
# Read from donor
donor_f.seek(donor_data_start + dt["offset"])
data = donor_f.read(dt["size"])
# Write to output at same offset
out.seek(base_data_start + bt["offset"])
out.write(data)
grafted += 1
else:
skipped += 1
else:
skipped += 1
out.close()
donor_f.close()
print(f"\n{'='*60}")
print(f" MODEL 935 — DIRECT GGUF TRANSPLANT")
print(f"{'='*60}")
print(f" Base: DeepSeek-R1-Distill-Qwen-7B (skeleton+embed)")
print(f" Donor: Qwen2.5-7B-Instruct (FFN organs)")
print(f" Grafted: {grafted} FFN tensors")
print(f" Skipped: {skipped} (size mismatch or not found)")
print(f" Output: {OUTPUT}")
print(f" Size: {os.path.getsize(OUTPUT)/(1024**3):.2f} GB")
print(f" Signature: 935")
print(f"{'='*60}")

501
z_measure_report.json Normal file
View File

@ -0,0 +1,501 @@
{
"chimera-deepseek-qwen": {
"model": "chimera-deepseek-qwen",
"total_tensors": 339,
"avg_theta": 45.53097345132743,
"avg_signal": 0.6371591309220915,
"groups": {
"skeleton": {
"count": 196,
"avg_theta": 54.2,
"avg_signal": 0.727,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 112,
"avg_theta": 35.9,
"avg_signal": 0.538,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_gate.weight"
},
"embed": {
"count": 31,
"avg_theta": 25.9,
"avg_signal": 0.429,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"deepseek-r1-14b": {
"model": "deepseek-r1-14b",
"total_tensors": 579,
"avg_theta": 46.01036269430051,
"avg_signal": 0.640550897397108,
"groups": {
"skeleton": {
"count": 336,
"avg_theta": 55.4,
"avg_signal": 0.736,
"best_theta": 84.0,
"best_name": "blk.0.attn_k.bias",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 192,
"avg_theta": 35.2,
"avg_signal": 0.532,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 51,
"avg_theta": 25.2,
"avg_signal": 0.42,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"deepseek-r1-7b": {
"model": "deepseek-r1-7b",
"total_tensors": 339,
"avg_theta": 45.424778761061944,
"avg_signal": 0.6355319640555519,
"groups": {
"skeleton": {
"count": 196,
"avg_theta": 54.2,
"avg_signal": 0.727,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 112,
"avg_theta": 35.5,
"avg_signal": 0.533,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_gate.weight"
},
"embed": {
"count": 31,
"avg_theta": 25.9,
"avg_signal": 0.429,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"deepseek-r1-distill-7b": {
"model": "deepseek-r1-distill-7b",
"total_tensors": 339,
"avg_theta": 45.53097345132743,
"avg_signal": 0.6371591309220915,
"groups": {
"skeleton": {
"count": 196,
"avg_theta": 54.2,
"avg_signal": 0.727,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 112,
"avg_theta": 35.9,
"avg_signal": 0.538,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_gate.weight"
},
"embed": {
"count": 31,
"avg_theta": 25.9,
"avg_signal": 0.429,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"gemma2-9b": {
"model": "gemma2-9b",
"total_tensors": 464,
"avg_theta": 44.935344827586206,
"avg_signal": 0.6240438819131022,
"groups": {
"skeleton": {
"count": 210,
"avg_theta": 47.2,
"avg_signal": 0.649,
"best_theta": 84.0,
"best_name": "blk.0.post_attention_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 168,
"avg_theta": 37.9,
"avg_signal": 0.552,
"best_theta": 84.0,
"best_name": "blk.1.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 44,
"avg_theta": 26.2,
"avg_signal": 0.433,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
},
"norm": {
"count": 42,
"avg_theta": 81.6,
"avg_signal": 0.987,
"best_theta": 84.0,
"best_name": "blk.10.post_ffw_norm.weight",
"worst_theta": 75.0,
"worst_name": "blk.0.post_ffw_norm.weight"
}
}
},
"llama31-8b": {
"model": "llama31-8b",
"total_tensors": 292,
"avg_theta": 37.86986301369863,
"avg_signal": 0.5490538952939957,
"groups": {
"skeleton": {
"count": 128,
"avg_theta": 39.7,
"avg_signal": 0.569,
"best_theta": 84.0,
"best_name": "blk.10.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.10.attn_k.weight"
},
"organs": {
"count": 128,
"avg_theta": 39.1,
"avg_signal": 0.56,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 35,
"avg_theta": 26.0,
"avg_signal": 0.427,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"llama32-1b": {
"model": "llama32-1b",
"total_tensors": 147,
"avg_theta": 37.57142857142857,
"avg_signal": 0.5497319048747188,
"groups": {
"skeleton": {
"count": 64,
"avg_theta": 39.3,
"avg_signal": 0.57,
"best_theta": 84.0,
"best_name": "blk.10.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_q.weight"
},
"organs": {
"count": 64,
"avg_theta": 38.3,
"avg_signal": 0.553,
"best_theta": 84.0,
"best_name": "blk.10.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 18,
"avg_theta": 27.3,
"avg_signal": 0.445,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"llama32-3b": {
"model": "llama32-3b",
"total_tensors": 255,
"avg_theta": 37.411764705882355,
"avg_signal": 0.546769292896037,
"groups": {
"skeleton": {
"count": 112,
"avg_theta": 39.4,
"avg_signal": 0.569,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 112,
"avg_theta": 38.0,
"avg_signal": 0.55,
"best_theta": 84.0,
"best_name": "blk.13.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 30,
"avg_theta": 26.6,
"avg_signal": 0.439,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"mistral-7b": {
"model": "mistral-7b",
"total_tensors": 291,
"avg_theta": 36.20618556701031,
"avg_signal": 0.539809742436977,
"groups": {
"skeleton": {
"count": 128,
"avg_theta": 38.4,
"avg_signal": 0.567,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.10.attn_k.weight"
},
"organs": {
"count": 128,
"avg_theta": 36.8,
"avg_signal": 0.544,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 35,
"avg_theta": 26.0,
"avg_signal": 0.427,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"phi35-mini": {
"model": "phi35-mini",
"total_tensors": 197,
"avg_theta": 44.6497461928934,
"avg_signal": 0.6262773662109529,
"groups": {
"skeleton": {
"count": 64,
"avg_theta": 56.7,
"avg_signal": 0.764,
"best_theta": 84.0,
"best_name": "blk.10.attn_norm.weight",
"worst_theta": 33.0,
"worst_name": "blk.0.attn_qkv.weight"
},
"organs": {
"count": 96,
"avg_theta": 43.2,
"avg_signal": 0.601,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 35,
"avg_theta": 26.7,
"avg_signal": 0.439,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"qwen25-14b": {
"model": "qwen25-14b",
"total_tensors": 579,
"avg_theta": 45.98445595854922,
"avg_signal": 0.6402458335664142,
"groups": {
"skeleton": {
"count": 336,
"avg_theta": 55.2,
"avg_signal": 0.734,
"best_theta": 84.0,
"best_name": "blk.0.attn_k.bias",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 192,
"avg_theta": 35.4,
"avg_signal": 0.534,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 51,
"avg_theta": 25.5,
"avg_signal": 0.424,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"qwen25-3b": {
"model": "qwen25-3b",
"total_tensors": 434,
"avg_theta": 46.00230414746544,
"avg_signal": 0.6401608443093786,
"groups": {
"skeleton": {
"count": 252,
"avg_theta": 55.6,
"avg_signal": 0.736,
"best_theta": 84.0,
"best_name": "blk.0.attn_k.bias",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 144,
"avg_theta": 34.5,
"avg_signal": 0.529,
"best_theta": 84.0,
"best_name": "blk.10.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_down.weight"
},
"embed": {
"count": 38,
"avg_theta": 25.8,
"avg_signal": 0.426,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"qwen25-7b": {
"model": "qwen25-7b",
"total_tensors": 339,
"avg_theta": 45.637168141592916,
"avg_signal": 0.6387682956137819,
"groups": {
"skeleton": {
"count": 196,
"avg_theta": 54.6,
"avg_signal": 0.731,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_k.weight"
},
"organs": {
"count": 112,
"avg_theta": 35.5,
"avg_signal": 0.536,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.ffn_gate.weight"
},
"embed": {
"count": 31,
"avg_theta": 25.9,
"avg_signal": 0.429,
"best_theta": 75.0,
"best_name": "output_norm.weight",
"worst_theta": 24.0,
"worst_name": "blk.0.attn_output.weight"
}
}
},
"smollm2-135m": {
"model": "smollm2-135m",
"total_tensors": 272,
"avg_theta": 52.27941176470588,
"avg_signal": 0.7765030923203783,
"groups": {
"skeleton": {
"count": 120,
"avg_theta": 53.6,
"avg_signal": 0.79,
"best_theta": 84.0,
"best_name": "blk.0.attn_norm.weight",
"worst_theta": 42.0,
"worst_name": "blk.10.attn_k.weight"
},
"organs": {
"count": 120,
"avg_theta": 52.3,
"avg_signal": 0.777,
"best_theta": 84.0,
"best_name": "blk.0.ffn_norm.weight",
"worst_theta": 42.0,
"worst_name": "blk.11.ffn_up.weight"
},
"embed": {
"count": 32,
"avg_theta": 47.2,
"avg_signal": 0.725,
"best_theta": 84.0,
"best_name": "output_norm.weight",
"worst_theta": 33.0,
"worst_name": "blk.13.attn_output.weight"
}
}
}
}

File diff suppressed because it is too large Load Diff