🌍 Built in Morocco for the world

Intelligence,
for everyone.
No permission needed.

305KB. Runs on your phone, your laptop, your server. Free forever. No cloud, no account, no limit. The AI belongs to whoever runs it.

305 KB · entire engine
19 · hardware backends
23 · model formats
Unlimited API calls · forever free
$0 per year · your hardware
What is this

Three things to know. Nothing more.

No degree required. If you have a device, you have AI.

📦

It's a tiny file

305 kilobytes. Smaller than a photo on your phone. This file lets your computer run AI — any AI — without the internet. Download it, run it. That's it.

🔒

Your words stay yours

When you use AI online, your questions travel to a distant server. Someone can read them. With Inference-X, nothing leaves your machine. Ever.

It runs on anything

Old laptop, new phone, Raspberry Pi, datacenter. Same file. It detects your hardware and uses it. No configuration needed.

Your hardware

What can YOUR computer do?

Move the slider to your RAM. See what's possible.

1 GB · 4 GB · 8 GB · 16 GB · 32 GB · 64 GB · 128+ GB

RAM: 8 GB — showing models that fit

Your AI runs locally. No internet. No account. Free forever.
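The RAM figures from the model list further down this page can be turned into a quick feasibility check. A minimal sketch in Python, using the published RAM requirements; the function name is illustrative, not part of Inference-X:

```python
# RAM requirements (GB) for the benchmarked models listed on this page.
MODELS = {
    "LLaMA 3.2 1B": 1,
    "Phi-3 3.8B": 3,
    "Mistral 7B": 5,
    "Qwen 2.5 7B": 5,
    "LLaMA 3.1 8B": 6,
    "Mistral 22B": 16,
    "LLaMA 3.1 70B": 48,
    "DeepSeek 671B": 64,
}

def models_that_fit(ram_gb: int) -> list[str]:
    """Return the models whose quantized weights fit in ram_gb of RAM."""
    return [name for name, need in MODELS.items() if need <= ram_gb]

print(models_that_fit(8))  # 8 GB covers everything up to LLaMA 3.1 8B
```

The same check is what the slider above performs: pick your RAM, filter the list.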

Privacy

Where do your words go?

Cloud AI

Your question leaves your device, crosses the internet, reaches a server in another country, gets processed, stored, and analyzed. You pay per word.

⚠ Your data · their server · their rules
Inference-X

Your question stays on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. You pay nothing.

✓ Your data · your processor · your rules
Footprint

How small is 305 KB?

The entire AI engine — smaller than you think.

Inference-X 305 KB
iPhone photo ~3 MB
Average app ~50 MB
Chrome ~200 MB

All 19 hardware targets, all 23 formats — in less space than a single photo on your phone.

The engine

One binary to run them all.

Written in C++. No dependencies. No runtime. No cloud. Any silicon, any OS, any AI model.

CUDA · NVIDIA GPU
Metal · Apple Silicon
Vulkan · Any GPU
ROCm · AMD GPU
OpenCL · Any GPU
SYCL · Intel GPU
CPU x86 · Intel/AMD
CPU ARM · Mobile/Pi
RISC-V · Emerging
WebGPU · Browser
TPU · Google
FPGA · Custom HW
Inferentia · AWS
Gaudi · Intel
Groq · LPU
Cerebras · Wafer
SambaNova · RDU
Graphcore · IPU
Custom · + your HW
Zero-Copy Inference
Dequantization and matrix multiply in one instruction loop. No intermediate buffer.
Trillion-Parameter Native
Only active experts exist in memory. A 1T-parameter model runs on 64 GB RAM.
Smart Precision
Simple questions get compressed layers. Complex reasoning gets full precision.
Zero Telemetry
No network calls. No phone-home. Works on a plane, in a submarine, on the moon.
Auto-Detect
Architecture, chat templates, EOS tokens — auto-detected from model metadata.
Self-Configuring
The Makefile detects your hardware. You don't configure it — it configures itself.
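The trillion-parameter claim comes down to simple arithmetic: in a Mixture-of-Experts model, only the parameters activated for the current token need to live in RAM. A back-of-envelope sketch; the 50B active-parameter figure and the 4-bit quantization are illustrative assumptions, not measured Inference-X numbers:

```python
def active_memory_gb(active_params: float, bits_per_param: int = 4) -> float:
    """RAM needed to hold just the active parameter set at a given quantization."""
    return active_params * bits_per_param / 8 / 1e9

# Hypothetical 1T-parameter MoE with ~50B parameters active per token:
print(active_memory_gb(1e12))  # dense loading: 500.0 GB, far beyond a workstation
print(active_memory_gb(50e9))  # active set only: 25.0 GB, fits in 64 GB RAM
```

Keeping only the active experts resident is what shrinks a 500 GB dense footprint to something a 64 GB machine can hold.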
What runs on it

Any GGUF model. Zero setup.

Download a model from HuggingFace or Ollama. Drop it in. Run it. These are models we've benchmarked.

LLaMA 3.2 · 1B
Quick answers. Tiny device. Lightning fast.
1 GB RAM · mobile-ready · fast
Mistral · 7B
Smart conversations, code help, translations.
5 GB RAM · multilingual
LLaMA 3.1 · 8B
Meta's compact model. Great reasoning at low cost.
6 GB RAM · reasoning
Mistral · 22B
Creative writing, analysis, multilingual expert.
16 GB RAM · creative
LLaMA 3.1 · 70B
Full-featured assistant. Code. Math. Logic.
48 GB RAM · code · math
DeepSeek · 671B
Advanced reasoning. Expert-level answers. MoE architecture.
64 GB RAM · expert · MoE
Phi-3 · 3.8B
Microsoft's small model. Punches far above its weight.
3 GB RAM · efficient
Qwen 2.5 · 7B
Chinese-developed. Excellent for multilingual tasks.
5 GB RAM · multilingual · code
+ any GGUF
Download from HuggingFace. Drop in folder. Done.
any size
The real cost

How much does AI cost?

Using AI 1 hour per day, every day, for a year.

Cloud AI: $2,500+
per year · and rising · your data = their product
API key required · Rate limited · Terms can change
Inference-X: $0
forever · electricity only · your data stays yours
No API key. No subscription. No limit. Your hardware, your AI.
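The "$2,500+" figure can be reproduced with back-of-envelope numbers. The daily token volume and per-token price below are hypothetical round figures, not any provider's actual pricing:

```python
def annual_api_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Yearly cloud-API spend for a given daily token volume."""
    return tokens_per_day * 365 * usd_per_million_tokens / 1e6

# Heavy daily use: ~500k tokens/day at a blended $15 per million tokens
print(round(annual_api_cost(500_000, 15.0)))  # 2738, in the $2,500+ range
```

Run the same workload locally and the marginal cost is electricity.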
For developers

OpenAI-compatible API

Start with --serve 8080. Drop-in replacement. Any client library works.

# Start the inference server
./inference-x --model llama3.gguf --serve 8080

# Works with any OpenAI SDK
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
POST /v1/chat/completions · POST /v1/completions · GET /v1/models · GET /health · GET /v1/embeddings
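Because the server speaks the OpenAI wire format, any HTTP client works. A minimal sketch using only the Python standard library, assuming a server started with --serve 8080; the helper names are illustrative:

```python
import json
import urllib.request

def build_chat_request(model: str, content: str) -> urllib.request.Request:
    """Build a POST to the local /v1/chat/completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(model: str, content: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, content)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("llama3", "Hello")  # requires ./inference-x --model llama3.gguf --serve 8080
```

Any OpenAI SDK works the same way: point its base URL at http://localhost:8080/v1.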
Get started

Ready? Three steps.

Pick your system.

Linux

1
Download the binary
# x86_64 with CUDA/CPU
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-x64
chmod +x ix-linux-x64
2
Get a model
# Download any GGUF from HuggingFace
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run it
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# or serve as API:
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf --serve 8080
macOS

1
Download (Apple Silicon native)
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-macos-arm64
chmod +x ix-macos-arm64
2
Get a model
# Metal GPU acceleration automatic on Apple Silicon
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run it
./ix-macos-arm64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
Windows

1
Download
# PowerShell
Invoke-WebRequest -Uri "https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-windows-x64.exe" -OutFile "ix.exe"
2
Get a model — download any .gguf file from HuggingFace
3
Run it
.\ix.exe --model model.gguf
Raspberry Pi

1
ARM build for Raspberry Pi 4/5
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-arm64
chmod +x ix-linux-arm64
2
Get a small model (fits in 1-4GB)
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run on Pi
./ix-linux-arm64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# Pi 4 4GB: runs 1B models at ~8 tok/s
Community

The tools we built together.

Inference-X is the core. Around it, the community builds the ecosystem. Here's what exists today — more is being forged every day.

LIVE
IX Engine
The core. 305 KB C++ binary. 19 backends. Zero dependencies. The foundation everything runs on.
LIVE
🛠
Community SaaS
Cloud playground. Deploy models, test APIs, share with others. No installation. Donation-powered.
LIVE
📡
Hardware Scout
See every IX node running globally. Real-time compute map. Who runs what, how fast.
BUILDING
🫀
Organ Store
Extract, share and transplant AI model components. Attention heads, FFN layers, expert blocks. The future of open AI.
BUILDING
🔬
Organ Architect
Analyze model internals. Visualize layers, heads, topology. Like an MRI for AI models.
BUILDING
🔥
The Forge
Community fine-tuning platform. Contribute training data, improve models, share results. Collective intelligence.
COMING
🎙
GhostVoice
Neural voice synthesis. Clone, create, share voice models. Same philosophy: local, private, yours.
COMING
🌐
Echo Relay
Federated inference network. Your idle hardware earns you compute credits. The khettara for AI power.
The future

AI organ transplants.

Neural networks have anatomy. Layers. Attention heads. Expert blocks. We built tools to extract them, study them, and transplant them between models. The community will fill the store.

🧠
⚙️
extract
🫀
💉
transplant
🧬
Vision: A community marketplace where builders extract specialized capabilities from models — multilingual reasoning, code completion, visual understanding — and share them as components others can transplant. The Organ Store doesn't exist yet. The community will build it.
🔍
Analyze
Map model internals
⚗️
Extract
Isolate components
📦
Publish
Share to the store
💉
Transplant
Enhance any model
"In the Moroccan desert, ancient builders carved underground canals — khettaras — that deliver water to entire villages using only gravity. No pump. No electricity. No central authority. They've worked for centuries. Inference-X is a khettara for intelligence: built by many, maintained by many, flowing to anyone who needs it."

Inference-X has no enemies. Every researcher, every company, every government working on AI is playing a role. We're not competing — we're building the infrastructure that makes all of it accessible to everyone who was left out.

Community hardware

Every IX node on Earth. Live.

When you run Inference-X, you can optionally report your hardware telemetry. This is the network. Anonymous. Voluntary. Real.

Backend · Nodes · Avg tok/s · Avg load · Status
License

Free for those who need it. Fair for those who profit.

No tricks. No hidden limits. The engine is the same everywhere.

Commercial Fair
20% rev
Companies with $1M+ annual revenue using IX in production. 20% of revenue attributed to IX-powered features goes to the community fund. Transparent. Auditable.
80% flows to community builders
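The split above can be made concrete with a small calculation. This is one possible reading, treating the 80% as the builders' share of the community fund; the revenue figure is hypothetical:

```python
def fair_license_split(ix_attributed_revenue: float) -> tuple[float, float]:
    """Community fund (20% of IX-attributed revenue) and the builders' 80% share of it."""
    fund = ix_attributed_revenue * 0.20
    builders_share = fund * 0.80
    return fund, builders_share

# A company attributing $1M of annual revenue to IX-powered features:
fund, builders = fair_license_split(1_000_000)
print(fund, builders)  # 200000.0 160000.0
```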
Industrial Embed
Custom
Hardware manufacturers embedding IX in products. Custom licensing for bulk distribution, signed binaries, hardware co-optimization. Contact us.
Redistribute · Co-brand · Optimize
Join the builders

11 seats. One per craton.

The governance of Inference-X is anchored in geology. 11 ancient continental cratons — the most stable structures on Earth — give their names to 11 permanent Core Team seats. One per major civilization region. Designed to last as long as the rocks.

2.7 Ga · Africa
🪨 Anti-Atlas
Morocco · North Africa
⚒ Founder — Elmadani Salka
3.6 Ga · Africa
💎 Kaapvaal
South Africa, Botswana
2.9 Ga · Africa
🌍 West African
Ghana, Senegal, Mali
2.8 Ga · Africa
🌿 Congo
DRC, Republic of Congo
3.1 Ga · Americas
🍁 Superior
Canada, North America
2.5 Ga · Americas
🌳 Amazon
Brazil, South America
3.1 Ga · Europe
🌊 Baltica
Scandinavia, Eastern Europe
3.0 Ga · Asia
🌲 Siberian
Russia, Central Asia
3.8 Ga · Asia
🏮 North China
China, East Asia
3.0 Ga · Asia
🪷 Dharwar
India, South Asia
3.5 Ga · Oceania
🦘 Pilbara
Australia, Oceania
What craton leaders do
Represent their region in project decisions. Connect local builders. Translate and adapt for local communities. No salary — compensation is access, visibility, and history.
Apply for your craton →