Compare commits
No commits in common. "master" and "v1.0.0" have entirely different histories.
.github/FUNDING.yml — 3 changes (vendored)
@@ -1,5 +1,6 @@
# Inference-X — Universal Inference Protocol
# Free for individuals, researchers, and small teams.
# Your support funds development and server infrastructure.
# Your support funds development, servers, and solar inference research.

github: ElmadaniS
custom: ["https://paypal.me/ELMADANISALKA"]
@@ -51,6 +51,7 @@ Every design decision serves two goals: route intelligence to any hardware, and
|---|---|---|
| `infer.cpp` | ~570 | Entry point, CLI, mode dispatch |
| `runtime/server.h` | ~530 | OpenAI-compatible HTTP API, SSE streaming |
| `runtime/fractal.h` | ~320 | Dynamic precision per layer (fractal inference) |
| `runtime/identity.h` | ~160 | Cryptographic authorship, 4-layer protection |

### Compute Layer
@@ -102,6 +103,7 @@ Expert mmap loads only active experts via memory-mapped files with predictive pr

Result: 48× I/O reduction for trillion-parameter models. The signal path contains only parameters that contribute to the current answer. Nothing else exists in memory.

### Fractal Inference (Adaptive Precision)

Query complexity determines layer precision. Shannon entropy of input tokens + vocabulary diversity → composite complexity score → per-layer quantization map.
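The entropy-to-precision pipeline described above can be sketched in Python. This is an illustrative model only — the function names, the 50/50 blend of entropy and diversity, and the two-level `q4`/`fp16` plan are assumptions, not the engine's actual scoring:

```python
import math
from collections import Counter

def complexity_score(tokens):
    """Composite complexity: normalized Shannon entropy of the token
    distribution blended with vocabulary diversity (both in [0, 1])."""
    counts = Counter(tokens)
    n = len(tokens)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Entropy is bounded by log2 of the number of distinct tokens seen.
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    diversity = len(counts) / n  # unique tokens / total tokens
    return 0.5 * (entropy / max_entropy) + 0.5 * diversity

def precision_map(tokens, n_layers):
    """Map the score to a per-layer quantization plan: compressed early
    layers for simple queries, full precision for the decision layers."""
    score = complexity_score(tokens)
    plan = []
    for layer in range(n_layers):
        depth = layer / max(n_layers - 1, 1)  # 0.0 at entry, 1.0 at exit
        plan.append("fp16" if score + depth >= 1.0 else "q4")
    return plan
```

A high-entropy query pushes more layers to `fp16`; a repetitive one keeps early layers compressed.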
@@ -2,7 +2,7 @@

## Creator & Lead Developer
- **Salka Elmadani** — Architecture, implementation, and all original code
  - Git: [@elmadani](https://git.inference-x.com/elmadani)
  - GitHub: [@ElmadaniS](https://github.com/ElmadaniS)
  - Email: Elmadani.SALKA@proton.me

## Infrastructure Partners
Makefile — 1 change
@@ -48,6 +48,7 @@ endif
# 2. Sets IX_USE_* define
# 3. Adds the backend .c/.cpp to BACKEND_OBJS
# 4. Adds SDK-specific link flags
#
# Without SDK → nothing happens. Zero noise.
# ──────────────────────────────────────────────────────────────────────────────
NOTICE — 2 changes
@@ -15,7 +15,7 @@ AUTHOR
Location: Morocco
Contact: Elmadani.SALKA@proton.me
Website: https://inference-x.com
Repository: https://git.inference-x.com/salka/inference-x
Repository: https://github.com/ElmadaniS/inference-x
Origin: Morocco 🇲🇦

────────────────────────────────────────────────────────────────
README.md — 302 changes
@@ -1,196 +1,262 @@
# Inference-X

[Build](https://github.com/ElmadaniS/inference-x/actions/workflows/build.yml) · [Releases](https://github.com/ElmadaniS/inference-x/releases) · [License](LICENSE) · [Technology](TECHNOLOGY.md) · [Architecture](ARCHITECTURE.md)

**Run AI on your own computer. Private. Free. No internet.**
**Better output from the same model.**

Inference-X is a tiny file (305 KB) that lets any computer run AI models locally. It works on old laptops, phones, Raspberry Pi, and datacenters — same file, no setup. Your questions stay on your machine. Nobody sees them.
One binary routes any AI model to any hardware — from a microcontroller to a datacenter. Fused computation, adaptive precision, surgical expert loading. No dependencies. No framework. No vendor lock-in.

**[Website](https://inference-x.com)** · **[How it works](TECHNOLOGY.md)** · **[Benchmarks](BENCHMARKS.md)** · **[Vision](VISION.md)** · **[Sponsor](SPONSOR.md)**
305 KB. 19 hardware backends. Any model. Any scale.

---
Built in Morocco by [Salka Elmadani](https://x.com/ElmadaniSa13111).

## Start in 30 seconds
> *In the Anti-Atlas, our ancestors built khettaras — underground water channels that deliver pure water to villages without pumps, without electricity, without filtration. The water arrives cleaner than any treated supply because the path itself is the filter. Inference-X works the same way: the shortest path produces the cleanest signal.*

```bash
git clone https://git.inference-x.com/salka/inference-x
cd inference-x && make
./inference-x model.gguf
```

That's it. Download a `.gguf` model from [HuggingFace](https://huggingface.co/models?sort=trending&search=gguf), run the command, talk to AI. No account. No API key. No internet.

Add `--serve 8080` to get a web interface at `localhost:8080`.

---
## What can your computer run?

| Your RAM | Models you can run | What it can do |
|---|---|---|
| **2 GB** | SmolLM2 135M | Simple assistant, quick answers |
| **4 GB** | Phi-3 Mini 3.8B, Llama 3.2 3B | Smart conversations, code help, translations |
| **8 GB** | Mistral 7B, Llama 3.1 8B | Creative writing, analysis, reasoning |
| **16 GB** | DeepSeek R1 14B | Advanced reasoning, expert-level answers |
| **32 GB** | Qwen 2.5 32B | Professional-grade AI |
| **64 GB** | Llama 3.1 70B, DeepSeek V3 MoE | Frontier performance, locally |

Every model runs privately, offline, with no subscription.

---

## Why local AI matters

When you use AI online, your words travel to a server in another country. Someone can read them. You pay per word. The service can shut down.

With Inference-X, your questions stay on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. It works without internet. It's free forever.
**[Website](https://inference-x.com)** · **[How it works](TECHNOLOGY.md)** · **[Benchmarks](BENCHMARKS.md)** · **[Vision](VISION.md)** · **[Sponsor](https://github.com/sponsors/ElmadaniS)**

---
## What makes it different

Most inference engines add layers between the model and the hardware: frameworks, runtime allocators, intermediate buffers. Each layer degrades the model's signal.
Most inference engines add layers between the model and the hardware: frameworks, runtime allocators, intermediate buffers, uniform precision pipelines. Each layer adds computational overhead that degrades the model's original signal.

Inference-X removes those layers.

**Fused computation** — Dequantization and matrix multiply happen in a single instruction loop. No intermediate FP32 buffer. Output closer to the model's theoretical maximum.
**Fused computation** — Dequantization and matrix multiply happen in a single instruction loop. No intermediate FP32 buffer. Fewer rounding operations means output closer to the model's theoretical FP32 maximum.
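The fused-versus-buffered distinction can be modeled in a few lines of pure Python. This is a conceptual sketch only — the real kernels are fused C++ SIMD loops, and the block layout and function names here are illustrative:

```python
def dot_two_pass(scales, qblocks, x):
    """Conventional path: dequantize everything into an intermediate
    buffer first, then run the dot product over that buffer."""
    dequant = []
    for s, qs in zip(scales, qblocks):
        dequant.extend(s * q for q in qs)  # full intermediate buffer
    return sum(w * xi for w, xi in zip(dequant, x))

def dot_fused(scales, qblocks, x, block=32):
    """Fused path: dequantize each quantized block inside the
    dot-product loop, accumulating directly. No intermediate buffer,
    and the scale is applied once per block instead of once per weight."""
    acc = 0.0
    for i, (s, qs) in enumerate(zip(scales, qblocks)):
        xs = x[i * block:(i + 1) * block]
        acc += s * sum(q * xi for q, xi in zip(qs, xs))
    return acc
```

Both compute the same dot product; the fused form touches each weight once and allocates nothing.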
**Adaptive precision** — Each query is analyzed before inference. Simple questions get compressed early layers and full-precision decision layers. Complex reasoning gets full precision throughout.
**Adaptive precision** — Each query is analyzed before inference. Simple questions get compressed early layers and full-precision decision layers. Complex reasoning gets full precision throughout. The model adapts its depth to the question — same file, same binary, different computational path.

**Surgical expert loading** — For Mixture-of-Experts models, only active experts exist in memory. A 1-trillion-parameter model runs on 64 GB of RAM.
**Surgical expert loading** — For Mixture-of-Experts models, only active experts exist in memory. Inactive experts are evicted at the OS level. Result: a 1-trillion-parameter model runs on 17 GB of RAM. The signal path contains only what contributes to the current token.
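The surgical-loading idea can be sketched with Python's stdlib `mmap`: map the whole weight file, then materialize only the routed experts, so untouched pages are never read into working memory. The file layout, sizes, and names below are toy stand-ins, not the GGUF format:

```python
import mmap
import struct

EXPERT_SIZE = 8 * 8  # 8 float64 weights per expert (toy size), 64 bytes

def write_experts(path, experts):
    """Pack all experts contiguously into one file, standing in for
    tensor data inside a model blob."""
    with open(path, "wb") as f:
        for w in experts:
            f.write(struct.pack("8d", *w))

def load_active(path, active_ids):
    """Map the file and decode only the routed experts; pages for
    inactive experts are never faulted in."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        out = {}
        for eid in active_ids:
            off = eid * EXPERT_SIZE
            out[eid] = struct.unpack("8d", mm[off:off + EXPERT_SIZE])
        mm.close()
    return out
```

With top-k routing, `active_ids` would come from the router's scores for the current token.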
The result: **the same model produces better output through a cleaner computation path.** A smaller model through Inference-X can match a larger model through a conventional engine.
The result: **the same model produces higher-fidelity output through a cleaner computation path.** Or equivalently: a smaller model through Inference-X can match a larger model through a conventional engine.

→ [Full technical explanation](TECHNOLOGY.md)

---
## How it works
## What it is

TCP/IP routes data packets to any network. Inference-X routes intelligence to any silicon.
TCP/IP routes data packets to any network, any hardware, any destination. The protocol doesn't care about the wire.
Inference-X routes intelligence to any silicon. The protocol doesn't care about the chip.

One function call enters `kernel_dispatch.h`. On the other side: CPU, GPU, TPU, LPU, IPU, FPGA, DSP, or WSE. The model runs. The answer comes back.
One function call enters `kernel_dispatch.h`. On the other side: CPU, GPU, TPU, LPU, IPU, FPGA, DSP, or WSE. The caller doesn't know. Doesn't need to. The model runs. The answer comes back.

```
Model (any GGUF) → Inference-X (305 KB) → Silicon (any of 19 backends) → Response
```

```
Architecture:
  infer.cpp (570 lines)  — Orchestrator. Chat templates. Server mode.
  transformer_v6.h       — Forward pass. Dense + MoE + MLA unified.
  kernel_dispatch.h      — Routes GEMM to the right silicon.
  moe_mla.h              — Expert selection. Prefetch. Eviction.
  gemm.h                 — Fused dequant × matmul kernels.
  backends.h             — 19 hardware targets. One interface.
```

The model describes itself. The engine reads the description. The engine never assumes.
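The one-call-in, any-silicon-out pattern amounts to a dispatch registry. A minimal Python sketch, assuming hypothetical names — the real routing lives in `kernel_dispatch.h` and is C++:

```python
BACKENDS = {}

def register(name):
    """Decorator that registers a kernel implementation under a backend name."""
    def wrap(fn):
        BACKENDS[name] = fn
        return fn
    return wrap

@register("cpu")
def gemm_cpu(a, b):
    # Naive GEMM: a is m×k, b is k×n, as row-major lists of lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dispatch(a, b, available=("cpu",)):
    """One call site; the routing layer picks the first available silicon
    in preference order. The caller never names a backend."""
    for name in ("cuda", "rocm", "metal", "cpu"):
        if name in available and name in BACKENDS:
            return BACKENDS[name](a, b)
    raise RuntimeError("no backend available")
```

Adding a backend means registering one more kernel; no call site changes.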
## Quick Start

```bash
git clone https://github.com/ElmadaniS/inference-x
cd inference-x
make

# Download a model (any GGUF from Hugging Face)
./inference-x model.gguf -p "Hello, world"
```

12,571 lines of C++17. 6 architectures (Llama, Qwen2, Gemma2, Phi, DeepSeek MoE, MLA). 23 quantization formats. One binary.
That's it. One binary. One command. Any model.
## Why it matters

Running a model today requires choosing a stack: CUDA for NVIDIA, ROCm for AMD, Metal for Apple, TensorRT for serving, vLLM for throughput, Ollama for local. Each stack locks you to a vendor, a way of thinking, and adds its own computational overhead between the model and the result.

Inference-X eliminates the stack. There is no stack. There's a model file, a binary, and your hardware — whatever it is.

```
GPU cluster:  1T parameters on 8× H100     ~5.6 kW   $200,000+/year
Inference-X:  1T parameters on 256 GB RAM  ~300 W    €4,800/year

Same model. Cleaner output. 97% less cost.
```

This isn't about replacing GPUs. It's about making the choice of silicon irrelevant to the act of thinking — and getting *better* results from the silicon you already have.
## Who is this for

**Every organization that runs AI models — or wants to.**

| Sector | Problem | What IX does |
|--------|---------|-------------|
| **Healthcare** | Patient data can't leave the hospital. Cloud inference = compliance risk. | Air-gapped inference on hospital hardware. Zero network calls. HIPAA/GDPR by architecture. |
| **Defense & Government** | Sovereign AI requires sovereign infrastructure. | Runs on government-owned hardware. No vendor dependency. No telemetry. Auditable source. |
| **Finance** | Trading models need low latency and full auditability. | On-premise inference, deterministic output, no external calls. |
| **Telecom** | Edge inference at cell towers for real-time processing. | 305 KB binary deploys on edge hardware. Adaptive precision matches available power. |
| **Automotive** | In-vehicle AI needs minimal footprint and guaranteed response. | Runs on ARM/Snapdragon. No framework overhead. Fits in L2 cache. |
| **Startups** | GPU costs eat runway. $200K/year for inference infrastructure. | Same model quality at 97% lower cost. CPU-only. Scale when you're ready. |
| **Enterprise** | Vendor lock-in across NVIDIA, AMD, Intel, cloud providers. | 19 backends. One binary. Switch hardware without changing code. |
| **Research & Education** | Limited compute budgets. Students can't afford H100s. | Free under BSL-1.1. Run 14B models on a €20/month server. |
| **Embedded / IoT** | AI on microcontrollers with KB-level memory budgets. | Compiles for ESP32. Surgical loading keeps memory minimal. |
| **Cloud Providers** | Offering inference services at competitive margins. | Higher output quality per compute dollar. 19 backends = any customer hardware. |

Inference-X has zero friction with existing infrastructure. It doesn't replace your hardware — it makes your hardware work better.
## Get started

```bash
# Build (30 seconds)
git clone https://github.com/ElmadaniS/inference-x.git
cd inference-x && make -j$(nproc)

# Chat with any GGUF model
./inference-x model.gguf -i

# Or start a web interface
python3 web/ix_server.py

# Or run as an OpenAI-compatible API
./inference-x model.gguf --serve --port 8080
```

Three commands. No dependencies. No Docker. No Python packages. No GPU drivers. Just `make` and run.

---
## Benchmarks

AMD EPYC Rome · 17 GB RAM · 6 cores · CPU-only · €20/month server
Real numbers on a €20/month AMD EPYC server. CPU-only. No GPU. Cold start.

| Model | Params | Quant | tok/s | Prefill |
|---|---|---|---|---|
| SmolLM2 | 135M | Q8_0 | **130.23** | 87 ms |
| Qwen 2.5 | 3B | Q4_K_M | **3.85** | 16.5 s |
| Llama 3.2 | 3B | Q4_K_M | **3.82** | 3.8 s |
| Mistral 7B | 7B | Q4_K_M | **2.06** | 39.2 s |
| Llama 3.1 | 8B | Q4_K_M | **1.75** | 43.0 s |
| DeepSeek R1 | 14B | Q4_K_M | **0.97** | 74.1 s |

| Model | Params | Quant | tok/s |
|-------|--------|-------|-------|
| SmolLM2 | 135M | Q8_0 | **130.23** |
| Llama 3.2 | 3B | Q4_K_M | **3.82** |
| Qwen 2.5 | 3B | Q4_K_M | **3.85** |
| Mistral 7B | 7B | Q4_K_M | **2.06** |
| Qwen 2.5 | 7B | Q4_K_M | **1.82** |
| Llama 3.1 | 8B | Q4_K_M | **1.75** |
| Gemma 2 | 9B | Q4_K_M | **1.28** |
| DS-R1 Qwen | 14B | Q4_K_M | **0.97** |

9 models · 4 architectures · Same binary · Zero configuration
9/10 architectures passing. Chat templates auto-detected. Zero manual configuration.

→ [Full benchmarks](BENCHMARKS.md)
→ [Full benchmark details](BENCHMARKS.md)

---
## Supported Hardware

| Backend | Target | Status |
|---|---|---|
| CPU AVX2/512 | Intel, AMD | ✅ Production |
| Backend | Silicon | Status |
|---------|---------|--------|
| CPU (AVX2/AVX-512) | Intel, AMD | ✅ Production |
| CUDA | NVIDIA GPU | ✅ Production |
| ROCm | AMD GPU | ✅ Production |
| Metal | Apple Silicon | ✅ Production |
| Vulkan | Cross-platform | ✅ Production |
| ARM NEON | ARM (Pi, phones) | ✅ Production |
| Snapdragon | Qualcomm | 🔶 Ready |
| Hexagon HVX | Qualcomm DSP | 🔶 Ready |
| TPU | Google | 🔶 Ready |
| Inferentia | AWS | 🔶 Ready |
| Gaudi | Intel HPU | 🔶 Ready |
| Maia | Microsoft | 🔶 Ready |
| SambaNova RDU | SambaNova | 🔶 Ready |
| Graphcore IPU | Graphcore | 🔶 Ready |
| Groq LPU | Groq | 🔶 Ready |
| Cerebras WSE | 850K cores | 🔶 Ready |
| FPGA | Xilinx | 🔶 Ready |
| WebGPU | Browser | 🔶 Ready |
| OpenCL | Universal | 🔶 Ready |
| Vulkan | Cross-platform GPU | ✅ Production |
| ARM NEON | ARM processors | ✅ Production |
| Snapdragon | Qualcomm (GPU+DSP+NEON) | 🔧 Ready |
| Hexagon HVX | Qualcomm DSP | 🔧 Ready |
| OpenCL | Cross-platform | 🔧 Ready |
| WebGPU | Browser | 🔧 Ready |
| TPU | Google | 🔧 Ready |
| Inferentia | AWS | 🔧 Ready |
| Gaudi | Intel HPU | 🔧 Ready |
| Maia | Microsoft | 🔧 Ready |
| SambaNova RDU | SambaNova | 🔧 Ready |
| Graphcore IPU | Graphcore | 🔧 Ready |
| Groq LPU | Groq | 🔧 Ready |
| FPGA (Xilinx) | Xilinx | 🔧 Ready |
| Cerebras WSE | Cerebras | 🔧 Ready |

The Makefile detects your hardware. You don't configure it.

---
## Architecture

```
infer.cpp                   ← Entry point (571 lines)
├── runtime/
│   ├── gguf.h              ← GGUF parser + config extraction
│   ├── tokenizer.h         ← Tokenizer with byte-level BPE
│   ├── transformer_v6.h    ← Universal forward pass
│   ├── attention.h         ← GQA attention
│   ├── moe_mla.h           ← MoE + MLA (DeepSeek V3)
│   ├── gemm.h              ← Fused GEMV kernels
│   ├── kernels.h           ← RMS norm, softmax, RoPE, SiLU
│   ├── kernel_dispatch.h   ← Hardware routing layer
│   ├── server.h            ← OpenAI-compatible API server
│   └── ...
├── core/
│   ├── iq_tables.h         ← IQ quantization lookup tables
│   └── z_core.h            ← Mathematical foundation
└── backends/
    └── q4_kernels/         ← Per-hardware kernel implementations
```

One forward pass handles: dense transformers, Mixture-of-Experts, Multi-head Latent Attention, grouped-query attention, fused QKV tensors, and every combination.

→ [Detailed architecture](ARCHITECTURE.md) · [How the technology works](TECHNOLOGY.md)
## Features

- **Higher fidelity output** — Fused dequant+dot kernels eliminate intermediate buffers. Fewer rounding operations = output closer to the model's FP32 theoretical maximum.
- **Adaptive precision** — Shannon entropy analysis determines per-layer quantization. Simple queries run faster. Complex reasoning gets full depth. The model breathes.
- **Surgical expert loading** — MoE models load only active experts. 48× I/O reduction. Clean signal path with zero interference from unused parameters.
- **Universal model support** — LLAMA, QWEN2, PHI3, GEMMA2, DEEPSEEK, KIMI. Dense and MoE. The model changes, the protocol doesn't.
- **23 native quantization formats** — Q2_K through FP32. No format conversion. The engine speaks the model's native dialect.
- **19 hardware backends** — CPU, GPU, TPU, LPU, IPU, FPGA, DSP, WSE. One binary, every silicon.
- **305 KB binary** — Fits in L2 cache. The engine is invisible. You hear the model, not the framework.
- **Auto chat template** — ChatML, Llama 3, Mistral, Gemma, Phi-3, Kimi. Detected from GGUF metadata. Zero configuration.
- **OpenAI-compatible API** — `./inference-x model.gguf --serve` gives you `/v1/chat/completions`. Drop-in replacement.
- **Web interface** — Built-in chat UI. `python3 web/ix_server.py` and open your browser.
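The auto-template feature above can be sketched as metadata-keyed formatting. This is a toy model: the metadata key, template names, and format strings are illustrative assumptions, not the engine's detection logic (the ChatML and Llama 3 marker tokens themselves are standard):

```python
TEMPLATES = {
    "chatml": "<|im_start|>{role}\n{content}<|im_end|>\n",
    "llama3": "<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>",
}

def detect_template(metadata):
    """Pick a chat template from model metadata; default to ChatML.
    The key name here is hypothetical."""
    name = metadata.get("tokenizer.chat_template.name", "chatml")
    return TEMPLATES.get(name, TEMPLATES["chatml"])

def render(metadata, messages):
    """Render a message list into the model's native prompt format."""
    tmpl = detect_template(metadata)
    return "".join(tmpl.format(role=m["role"], content=m["content"])
                   for m in messages)
```

The point of auto-detection: the same `render` call works for every model, because the format travels with the model file.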
## API Server

Start with `--serve 8080`. OpenAI-compatible API. Any client library works.

```bash
./inference-x model.gguf --serve --port 8080
```

Drop-in replacement for OpenAI:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
```

Endpoints: `POST /v1/chat/completions` · `POST /v1/completions` · `GET /v1/models` · `GET /health`
---
## Features

- **Universal GGUF** — Any model, any architecture, auto-detected from metadata
- **Chat templates** — 7 formats auto-detected (Llama, ChatML, Alpaca, Gemma, Phi, Mistral, DeepSeek)
- **Multi-EOS** — Correct stop tokens for every architecture
- **Server mode** — OpenAI-compatible API, streaming, health check
- **Air-gapped** — No network calls during inference. No telemetry. Ever.
- **Zero configuration** — Download a model, run it. Templates, tokens, architecture: auto.

---
## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Run `make` to build. Run `make test` to test. Submit a PR.
We welcome contributions:

We welcome contributions from everyone, regardless of experience level. If you're new to open source, look for issues tagged `good first issue`.
- **Backends** — Port kernel implementations to new hardware
- **Models** — Add new architectures and quantization formats
- **Benchmarks** — Run benchmarks on diverse hardware
- **Documentation** — Tutorials, guides, translations

See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

---
## License

[BSL-1.1](LICENSE) — Business Source License
[Business Source License 1.1](LICENSE) — Free for individuals, researchers, and small teams. Commercial use requires a license. Converts to open source in 2030.

**Free for**: individuals, researchers, students, open-source projects, organizations under $1M revenue.
See [NOTICE](NOTICE) for full terms.

**Change date**: February 12, 2030 → [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

After 2030, everything becomes fully open source. Patents remain protected.

---
## Acknowledgments

Built in Morocco for the world by [Salka Elmadani](https://x.com/ElmadaniSa13111).
- **[Infomaniak](https://infomaniak.com)** — Swiss hosting partner
- **[Hetzner](https://hetzner.com)** — High-performance compute

> *The shortest path between model weights and output produces the cleanest signal. Every buffer removed, every conversion eliminated, every unnecessary step subtracted — each one brings the output closer to what the model actually learned. The path itself is the filter.*

---

**[Website](https://inference-x.com)** · **[Sponsor](SPONSOR.md)** · **[Contact](mailto:Elmadani.SALKA@proton.me)**
<p align="center">
  <a href="https://inference-x.com">inference-x.com</a> ·
  <a href="https://x.com/ElmadaniSa13111">@ElmadaniSa13111</a> ·
  <a href="https://github.com/sponsors/ElmadaniS">Sponsor</a>
  <br><br>
  <em>Built in Morocco for the world.</em>
</p>
SPONSOR.md — 123 changes (file removed)
@@ -1,123 +0,0 @@
# Salka Elmadani — Building Inference-X

> *The best engine is the one you don't notice.*
> *You should hear the model, not the framework.*

---

I build AI infrastructure. Not products, not demos, not wrappers around someone else's API. Infrastructure — the kind that runs without permission, works without cloud, and belongs to anyone who needs it.

**Inference-X** is a 305 KB binary that runs any AI model on any hardware. No framework. No internet. No account. Download a model, run it, talk to it. That's it.

I built it alone. I'm still building it alone. This page is why.

---

## What I'm building

The problem isn't the models. The models are extraordinary. The problem is the layer between the weights and the human — the inference stack. It's bloated, cloud-dependent, and controlled by a handful of companies.

I'm replacing that layer with something minimal, open, and community-owned.

```
Standard engine path:
  weights → framework → dequant buffer → matmul → buffer → output
  ~100 MB binary. 5 steps. Rounding errors at each boundary.

Inference-X:
  weights → fused dequant+dot → output
  305 KB binary. 2 steps. Zero buffer. Zero noise.
```

Same model. Cleaner signal. Every unnecessary step removed.

---

## The ecosystem

| Project | What it does | Status |
|---------|-------------|--------|
| **[inference-x](https://git.inference-x.com/elmadani/inference-x)** | Core engine — 305 KB, 19 hardware backends, 23 quant formats, fused kernels, adaptive precision | ✅ Live |
| **forge** | Model construction pipeline — compile, quantize, sign, distribute. Build your own model variant from certified organs. | 🔨 Building |
| **[echo-ix](https://git.inference-x.com/elmadani/echo-ix)** | Distributed relay — intelligent routing across local inference nodes | ✅ Live |
| **store** | Anyone deploys a node. Anyone earns from their compute. The cooperative layer. 11 geological cratons. One network. | 📐 Designed |

The store is the endgame: a peer-to-peer inference network where anyone with a laptop can become infrastructure. No data center required.

---

The intelligence already exists in the model weights. What I'm building is the canal — the shortest, cleanest path from those weights to the human who needs them.

---

## Who this is free for

**Everyone who isn't extracting commercial value from it:**

- Individuals and researchers — forever free
- Students — forever free
- Open-source projects — forever free
- Organizations under $1M revenue — forever free

**Commercial users above $1M revenue** pay a license. 20% of that flows back to the community that built the infrastructure.

In 2030, it all becomes Apache 2.0. Everything open. The canal belongs to everyone.

This isn't charity. It's a sustainable model — those who profit from it fund it. Those who don't, use it freely.

---

## Why I need support

Servers cost money. The current infrastructure — [inference-x.com](https://inference-x.com), [build.inference-x.com](https://build.inference-x.com), [git.inference-x.com](https://git.inference-x.com) — runs on €53/month.

More importantly: time. The engine, the organ pipeline, the forge tools, the store architecture — this is one engineer, building in the margins of everything else.

There is no team. No VC. No roadmap driven by investor pressure.

There is one person who decided this infrastructure should exist.

---

## How to help

### Build with me

The most valuable contribution is code. The project is open, the roadmap is public, and good engineers are always welcome.

**→ Pick a task**: [git.inference-x.com/elmadani/inference-x](https://git.inference-x.com/elmadani/inference-x)
**→ Administer a craton**: Each of the 11 community regions needs a technical lead. Write to [Elmadani.SALKA@proton.me](mailto:Elmadani.SALKA@proton.me) — subject: `Craton — [your region]`

### Sustain the infrastructure

**PayPal** → [paypal.me/elmadanisalka](https://paypal.me/elmadanisalka)

€5 = one day of server time. €53 = one month of everything running.

### Amplify

Every post that reaches a developer who cares about AI sovereignty is one more person who might build the next piece.

**→ [Follow on X: @ElmadaniSa13111](https://x.com/ElmadaniSa13111)**

---

## Contact

I respond to everyone who writes with something real to say.

| | |
|--|--|
| **X** | [@ElmadaniSa13111](https://x.com/ElmadaniSa13111) — fastest response |
| **Email** | [Elmadani.SALKA@proton.me](mailto:Elmadani.SALKA@proton.me) — for technical discussions, partnerships, craton applications |
| **Code** | [@elmadani on Gitea](https://git.inference-x.com/elmadani) |
| **Web** | [inference-x.com](https://inference-x.com) |

---

*Morocco → the world.*
*Salka Elmadani, 2024–2026*
@@ -181,7 +181,7 @@ Kimi K2.5 on Inference-X:
## Try it

```bash
git clone https://git.inference-x.com/elmadani/inference-x
git clone https://github.com/ElmadaniS/inference-x
cd inference-x
make
./inference-x model.gguf -p "Hello"
```
VISION.md — 22 changes
@@ -90,15 +90,19 @@ Intelligence doesn't need to be expensive. It needs to be *clean*.

---

## Low-power inference
## Solar inference

Adaptive precision was built for signal quality. But it has a second consequence: an engine that shifts dynamically between Q2 and FP16 can adjust its power envelope in real time.
Every hour, the Sun delivers more energy to Earth than humanity uses in a year. 173,000 terawatts, falling on deserts, rooftops, forgotten places.

Full precision when power is abundant. Compressed when it's constrained. Minimal when running on battery.
If inference requires 5–15 kW per rack, you need solar farms and battery banks.

A standard inference rack draws 5–15 kW. Inference-X on adaptive precision runs meaningful workloads at 25 watts. That's the difference between needing a power plant and needing a panel.
If inference requires 25 watts, you need a camping panel.

This makes AI deployable in places where datacenters will never exist: remote areas, mobile platforms, edge devices, off-grid installations. The engine adapts to whatever energy is available.
Adaptive precision was built for a different reason. But it turns out: an engine that can dynamically shift between Q2 and FP16 is exactly what solar inference needs. When the Sun is high, full precision. At twilight, compressed. At night, minimal.

The engine breathes with the Sun like it breathes with the question.

The first solar deployment target is 2026. Anti-Atlas, Morocco. 320 days of sun per year. The nearest datacenter is 1,000 kilometers away.

---
@@ -109,8 +113,8 @@ We don't announce timelines. We announce results.
- The engine is done. 305 KB. Running in production.
- The technology page explains how it works: [TECHNOLOGY.md](TECHNOLOGY.md)
- The benchmarks are real: [BENCHMARKS.md](BENCHMARKS.md)
- The documentation is live: [docs.inference-x.com](https://docs.inference-x.com)
- The low-power adaptation is in development.
- The web interface is live: [inference-x.com](https://inference-x.com)
- The solar adaptation is in development.

---
@@ -120,7 +124,7 @@ Every great infrastructure made something abundant that was once scarce. Aqueduc

The next abundance is intelligence. Not artificial. Not corporate. Not as-a-service.

Just intelligence. Clean. Accessible. Powered by whatever energy is available — from a datacenter to a rooftop.
Just intelligence. Clean. Accessible. Powered by whatever energy is available — from a datacenter to a star.

The model already knows. The engine just needs to get out of the way.
@@ -129,3 +133,5 @@ The model already knows. The engine just needs to get out of the way.
*Salka Elmadani*
*February 2026*
*Built in Morocco for the world.*

◆
@@ -17,6 +17,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-CEREBRAS_WSE"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: CEREBRAS_WSE | Author: Salka Elmadani\n");
@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-FPGA_XILINX"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: FPGA_XILINX | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-GAUDI"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: GAUDI | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-GRAPHCORE_IPU"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: GRAPHCORE_IPU | Author: Salka Elmadani\n");

@ -19,6 +19,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-GROQ_LPU"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: GROQ_LPU | Author: Salka Elmadani\n");

@ -19,6 +19,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-HEXAGON"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: HEXAGON | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-AWS_INFERENTIA"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: AWS_INFERENTIA | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-MICROSOFT_MAIA"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: MICROSOFT_MAIA | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-SAMBANOVA_RDU"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: SAMBANOVA_RDU | Author: Salka Elmadani\n");

@ -12,6 +12,7 @@

// Inference-X Backend Identity — Salka Elmadani — Morocco
#define IX_BACKEND_ID "Inference-X-SNAPDRAGON"
#define IX_BACKEND_FINGERPRINT 0x935E1DAD

static void ix_backend_announce() {
    fprintf(stderr, "[Inference-X] Backend: SNAPDRAGON | Author: Salka Elmadani\n");

@ -3,6 +3,7 @@
# Copyright (C) 2025-2026 Salka Elmadani. All rights reserved.
# Licensed under the Business Source License 1.1 (BSL-1.1)
# See LICENSE file for full terms.
#
# NOTICE: This file is part of Inference-X by Salka Elmadani.
# Commercial use by entities with revenue >= $1M USD requires a license.
# Contact: Elmadani.SALKA@proton.me
@ -11,6 +12,7 @@

# Inference-X Backend Identity — Salka Elmadani — Morocco
IX_BACKEND_ID = "Inference-X-GOOGLE_TPU"
IX_BACKEND_FINGERPRINT = 0x935E1DAD

def ix_backend_announce():
    """Announces this backend. Required by BSL-1.1."""

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -20,6 +20,7 @@
// ═══════════════════════════════════════════════════════════════════════════════

#pragma once
#define IX_TABLES_FINGERPRINT 0x935E1DAD

#include <cstdint>

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -20,6 +20,7 @@
// ═══════════════════════════════════════════════════════════════════════════════

#pragma once
#define IX_TABLES_EXT_FINGERPRINT 0x935E1DAD

// INFERENCE-X v6 — Extended IQ Lookup Tables
// COPYRIGHT (C) 2025-2026 SALKA ELMADANI — ALL RIGHTS RESERVED

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -20,6 +20,8 @@
// ═══════════════════════════════════════════════════════════════════════════════

#pragma once
#define IX_ZCORE_FINGERPRINT 0x935E1DAD
#define IX_ZCORE_MARK "Inference-X-ZCore-935-Elmadani"

#include <cstdint>
@ -41,10 +43,10 @@ namespace ix {
// WATERMARK — SALKA ELMADANI SIGNATURE (Do not modify)
// ═══════════════════════════════════════════════════════════════════════════════
namespace signature {
    static constexpr double S0 = 5.999160064733103e+18; // Integrity coefficient α
    static constexpr double S1 = 5.566805661683622e+18; // Integrity coefficient β
    static constexpr double S2 = 5.426309097159753e+18; // Integrity coefficient γ
    static constexpr double S3 = 4.991471925827590e+18; // Integrity coefficient δ
    static constexpr double S0 = 5.999160064733103e+18; // "SALKA EL"
    static constexpr double S1 = 5.566805661683622e+18; // "MADANI E"
    static constexpr double S2 = 5.426309097159753e+18; // "LMADANI"
    static constexpr double S3 = 4.991471925827590e+18; // "CREATOR"

    inline bool verify() {
        volatile double sum = S0 + S1 + S2 + S3;
@ -224,7 +226,7 @@ struct block_q8_1 {
};

// STATIC ASSERT: Block sizes must match GGUF binary format exactly
// Z-VERIFY: Block sizes must match GGUF binary format exactly
static_assert(sizeof(block_q4_K) == 144, "block_q4_K size mismatch!");
static_assert(sizeof(block_q8_0) == 34, "block_q8_0 size mismatch!");
static_assert(sizeof(block_q6_K) == 210, "block_q6_K size mismatch!");

BIN ifrane.pdf (new file; binary file not shown)
@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -31,6 +31,7 @@ static const char* IX_AUTHOR = "Salka Elmadani";
static const char* IX_LICENSE __attribute__((unused)) = "BSL-1.1";
static const char* IX_CONTACT __attribute__((unused)) = "Elmadani.SALKA@proton.me";
static const char* IX_SIGNATURE = "IX";
static const uint32_t IX_FINGERPRINT = 0x935E1DAD; // Elmadani in hex

static void ix_print_banner() {
    fprintf(stderr, "\n");
@ -38,7 +39,7 @@ static void ix_print_banner() {
    fprintf(stderr, " ║ Inference-X — Universal Inference Protocol ║\n");
    fprintf(stderr, " ║ Copyright (C) 2025-2026 Salka Elmadani ║\n");
    fprintf(stderr, " ║ Licensed under BSL-1.1 | Morocco ║\n");
    fprintf(stderr, " ║ https://inference-x.com | git.inference-x.com/salka/inference-x║\n");
    fprintf(stderr, " ║ https://inference-x.com | github.com/ElmadaniS/inference-x║\n");
    fprintf(stderr, " ╚═══════════════════════════════════════════════════════════╝\n");
    fprintf(stderr, "\n");
}
@ -46,6 +47,7 @@ static void ix_print_banner() {
static bool ix_verify_integrity() {
    // Integrity check — fingerprint must match
    // Tampering with this function violates the license
    return (IX_FINGERPRINT == 0x935E1DAD) &&
           (IX_SIGNATURE[0] == 'I') &&
           (IX_AUTHOR[0] == 'S');
}
@ -269,6 +271,7 @@ struct InferConfig {
    bool bench_mode = false; // Benchmark: just measure tok/s
    bool serve_mode = false;
    int serve_port = 8080;
    bool fractal_mode = false; // Fractal inference (dynamic precision)
    std::string profile_path; // --profile: expert activation CSV
};

@ -286,6 +289,7 @@ void print_usage(const char* prog) {
    printf(" --raw No chat template\n");
    printf(" --bench Benchmark mode (no output)\n");
    printf(" --serve [port] Start OpenAI-compatible API server (default: 8080)\n");
    printf(" --fractal Enable fractal inference (dynamic precision per layer)\n");
    printf(" --profile <path> Dump expert activation profile\n");
}

@ -425,6 +429,7 @@ int main(int argc, char** argv) {

    // ─── INFERENCE LOOP ────────────────────────────────────────────────────

    // ─── FRACTAL INFERENCE PROTOCOL ──────────────────────────────────────
    ix::FractalEngine fractal;
    if (icfg.fractal_mode) {
        fractal.enable();

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,6 +22,8 @@
#pragma once

// Inference-X Attention — Salka Elmadani — Morocco
#define IX_ATTENTION_SIGNATURE 0x935
#define IX_ATTENTION_MARK "Inference-X-Attention-935-Elmadani"

#include "../core/z_core.h"

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -23,6 +23,7 @@

// Inference-X Identity — removal violates BSL-1.1
#define IX_VERSION "6.0"
#define IX_AUTHOR_HASH 0x935E1DAD
#define IX_BUILD_SIGNATURE "Inference-X by Salka Elmadani — Morocco"

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,6 +22,7 @@
#pragma once

// Inference-X Expert MMAP — Salka Elmadani — Morocco
#define IX_MMAP_IDENTITY "Inference-X-ExpertMMAP-935"

#include <cstdint>

@ -1,5 +1,5 @@
// ═══════════════════════════════════════════════════════════════════════════════
// INFERENCEX — Expert Profiler
// INFERENCEX — Expert Profiler (Kimi-Signal-935 Genesis)
// Copyright (C) 2025-2026 Salka Elmadani. All rights reserved.
// Licensed under the Business Source License 1.1 (BSL-1.1)
// See LICENSE file for full terms. Morocco.
@ -81,7 +81,7 @@ public:
    FILE* f = fopen(path, "w");
    if (!f) return;

    fprintf(f, "# IX Expert Profile | %lu tokens\n\n",
    fprintf(f, "# KIMI-SIGNAL-935 Expert Profile | %lu tokens\n\n",
            (unsigned long)total_tokens_);

    for (int l = 0; l < n_layers_; ++l) {

@ -1,3 +1,4 @@
// runtime/fractal.h — Fractal Inference Protocol
// Copyright (C) 2024-2026 Salka Elmadani. All rights reserved.
// INPI eSoleau: 7phf-Ueye-2nWr-Vsgu — BSL-1.1
//
@ -218,6 +219,7 @@ struct PrecisionMap {

    void print_schedule() const {
        printf("\n╔═══════════════════════════════════════════════════╗\n");
        printf("║ Fractal Inference — Precision Schedule ║\n");
        printf("╠═══════════════════════════════════════════════════╣\n");
        printf("║ Embed: %-8s Head: %-8s ║\n",
               dtype_name(embed_dtype), dtype_name(head_dtype));

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,6 +22,7 @@
#pragma once

// Inference-X GGUF Parser — Salka Elmadani — Morocco
#define IX_GGUF_WATERMARK "Inference-X-GGUF-935-Elmadani"

#include "../core/z_core.h"

@ -33,7 +33,7 @@ namespace ix {
namespace identity {

// Author identity — cryptographic anchor
// Author identity — compile-time cryptographic anchor
// SHA-256("Salka Elmadani:935:inference-x:7phf-Ueye-2nWr-Vsgu")
// Split into 4x64-bit for integration into dispatch math
static constexpr uint64_t ANCHOR_A = 0x9F3A7B2E1D4C6F08ULL;
static constexpr uint64_t ANCHOR_B = 0x5E8D2A9C4B7F1036ULL;

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -23,6 +23,7 @@

// Inference-X Provenance — this engine was created by Salka Elmadani
// Unauthorized commercial use (revenue >= $1M) requires licensing
__attribute__((unused)) static const char* ix_provenance() { return "Inference-X | Salka Elmadani | BSL-1.1 | 935"; }

#include "backends.h" // ix::Platform, ix::HWProfile, ix::detect_hardware()

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,6 +22,8 @@
#pragma once

// Inference-X Math Kernels — Salka Elmadani — Morocco
#define IX_KERNELS_SIGNATURE 0x935
#define IX_KERNELS_MARK "Inference-X-Kernels-935-Elmadani"

#include "../core/z_core.h"

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,6 +22,7 @@
#pragma once

// Inference-X MoE+MLA — Salka Elmadani — Morocco
#define IX_MOE_FINGERPRINT "935-ELMADANI-MOE"

#include "../core/z_core.h"
@ -668,7 +669,7 @@ public:
    }
}

// EXPERT PROFILING
// KIMI-SIGNAL-935 PROFILING
void dump_csv(const char* path) const {
    FILE* fp = fopen(path, "w");
    if (!fp) return;

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that

@ -6,7 +6,7 @@
//
// INTELLECTUAL PROPERTY PROTECTION:
// - INPI eSoleau deposit: 7phf-Ueye-2nWr-Vsgu (16/02/2026)
// - GitHub: git.inference-x.com/salka/inference-x
// - GitHub: github.com/ElmadaniS/inference-x
// - Author: Salka Elmadani | Morocco
//
// MANUFACTURER NOTICE: Any manufacturer, company, or entity that
@ -22,9 +22,13 @@
#pragma once

// Inference-X Transformer — Salka Elmadani — Morocco
#define IX_TRANSFORMER_SIGNATURE 0x935
#define IX_TRANSFORMER_MARK "Inference-X-Transformer-935-Elmadani"

// Inference-X Signature — integral to compilation
namespace ix {
    constexpr uint32_t SIGNATURE = 935;
    constexpr uint32_t FINGERPRINT = 0x935E1DAD;
    constexpr const char* AUTHOR = "Salka Elmadani";
}

@ -3,6 +3,7 @@
# Copyright (C) 2025-2026 Salka Elmadani. All rights reserved.
# Licensed under the Business Source License 1.1 (BSL-1.1)
# See LICENSE file for full terms.
#
# NOTICE: This file is part of InferenceX by Salka Elmadani.
# Commercial use by entities with revenue >= $1M USD requires a license.
# Contact: Elmadani.SALKA@proton.me

@ -3,6 +3,7 @@
# Copyright (C) 2025-2026 Salka Elmadani. All rights reserved.
# Licensed under the Business Source License 1.1 (BSL-1.1)
# See LICENSE file for full terms.
#
# NOTICE: This file is part of InferenceX by Salka Elmadani.
# Commercial use by entities with revenue >= $1M USD requires a license.
# Contact: Elmadani.SALKA@proton.me

@ -1,7 +1,7 @@
#!/usr/bin/env python3
"""
IX Web — Web interface for Inference-X
https://git.inference-x.com/salka/inference-x
https://github.com/ElmadaniS/inference-x

Zero dependencies. Pure Python stdlib.
Serves the IX Web chat UI and wraps the IX binary with an OpenAI-compatible API.
@ -413,7 +413,7 @@ class IXHandler(http.server.BaseHTTPRequestHandler):
def main():
    parser = argparse.ArgumentParser(
        description="IX Web — Web interface for Inference-X",
        epilog="https://git.inference-x.com/salka/inference-x",
        epilog="https://github.com/ElmadaniS/inference-x",
    )
    parser.add_argument("--port", type=int, default=DEFAULT_PORT, help=f"Port (default: {DEFAULT_PORT})")
    parser.add_argument("--host", default="0.0.0.0", help="Bind address (default: 0.0.0.0)")