🌍 Built in Morocco for the world

Intelligence,
for everyone.
No permission needed.

305KB. Runs on your phone, your laptop, your server. Free forever. No cloud, no account, no limit. The AI belongs to whoever runs it.

305 KB · entire engine
19 · hardware backends
23 · model formats
Unlimited API calls · forever free
$0 per year · your hardware
What is this

Three things to know. Nothing more.

No degree required. If you have a device, you have AI.

📦

It's a tiny file

305 kilobytes. Smaller than a photo on your phone. This file lets your computer run AI — any AI — without the internet. Download it, run it. That's it.

🔒

Your words stay yours

When you use AI online, your questions travel to a distant server. Someone can read them. With Inference-X, nothing leaves your machine. Ever.

It runs on anything

Old laptop, new phone, Raspberry Pi, datacenter. Same file. It detects your hardware and uses it. No configuration needed.

Your hardware

What can YOUR computer do?

Move the slider to your RAM. See what's possible.

1 GB · 4 GB · 8 GB · 16 GB · 32 GB · 64 GB · 128+ GB

RAM: 8 GB — showing models that fit

Your AI runs locally. No internet. No account. Free forever.
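The RAM figures from the model list further down this page can be turned into a quick feasibility check. A minimal sketch in Python, using the published RAM requirements; the function name is illustrative, not part of Inference-X:

```python
# RAM requirements (GB) for the benchmarked models listed on this page.
MODELS = {
    "LLaMA 3.2 1B": 1,
    "Phi-3 3.8B": 3,
    "Mistral 7B": 5,
    "Qwen 2.5 7B": 5,
    "LLaMA 3.1 8B": 6,
    "Mistral 22B": 16,
    "LLaMA 3.1 70B": 48,
    "DeepSeek 671B": 64,
}

def models_that_fit(ram_gb: int) -> list[str]:
    """Return the models whose quantized weights fit in ram_gb of RAM."""
    return [name for name, need in MODELS.items() if need <= ram_gb]

print(models_that_fit(8))  # 8 GB covers everything up to LLaMA 3.1 8B
```

The same check is what the slider above performs: pick your RAM, filter the list.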

Privacy

Where do your words go?

Cloud AI

Your question leaves your device, crosses the internet, reaches a server in another country, gets processed, stored, and analyzed. You pay per word.

⚠ Your data · their server · their rules
Inference-X

Your question stays on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. You pay nothing.

✓ Your data · your processor · your rules
Footprint

How small is 305 KB?

The entire AI engine — smaller than you think.

Inference-X 305 KB
iPhone photo ~3 MB
Average app ~50 MB
Chrome ~200 MB

All 19 hardware targets, all 23 formats — in less space than a single photo on your phone.

The engine

One binary to run them all.

Written in C++. No dependencies. No runtime. No cloud. Any silicon, any OS, any AI model.

CUDA · NVIDIA GPU
Metal · Apple Silicon
Vulkan · Any GPU
ROCm · AMD GPU
OpenCL · Any GPU
SYCL · Intel GPU
CPU x86 · Intel/AMD
CPU ARM · Mobile/Pi
RISC-V · Emerging
WebGPU · Browser
TPU · Google
FPGA · Custom HW
Inferentia · AWS
Gaudi · Intel
Groq · LPU
Cerebras · Wafer
SambaNova · RDU
Graphcore · IPU
Custom · + your HW
Zero-Copy Inference
Dequantization and matrix multiply in one instruction loop. No intermediate buffer.
Trillion-Parameter Native
Only active experts exist in memory. A 1T-parameter model runs on 64 GB RAM.
Smart Precision
Simple questions get compressed layers. Complex reasoning gets full precision.
Zero Telemetry
No network calls. No phone-home. Works on a plane, in a submarine, on the moon.
Auto-Detect
Architecture, chat templates, EOS tokens — auto-detected from model metadata.
Self-Configuring
The Makefile detects your hardware. You don't configure it — it configures itself.
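The trillion-parameter claim comes down to simple arithmetic: in a Mixture-of-Experts model, only the parameters activated for the current token need to live in RAM. A back-of-envelope sketch; the 50B active-parameter figure and the 4-bit quantization are illustrative assumptions, not measured Inference-X numbers:

```python
def active_memory_gb(active_params: float, bits_per_param: int = 4) -> float:
    """RAM needed to hold just the active parameter set at a given quantization."""
    return active_params * bits_per_param / 8 / 1e9

# Hypothetical 1T-parameter MoE with ~50B parameters active per token:
print(active_memory_gb(1e12))  # dense loading: 500.0 GB, far beyond a workstation
print(active_memory_gb(50e9))  # active set only: 25.0 GB, fits in 64 GB RAM
```

Keeping only the active experts resident is what shrinks a 500 GB dense footprint to something a 64 GB machine can hold.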
What runs on it

Any GGUF model. Zero setup.

Download a model from HuggingFace or Ollama. Drop it in. Run it. These are models we've benchmarked.

LLaMA 3.2 · 1B
Quick answers. Tiny device. Lightning fast.
1 GB RAM · mobile-ready · fast
Mistral · 7B
Smart conversations, code help, translations.
5 GB RAM · multilingual
LLaMA 3.1 · 8B
Meta's compact model. Great reasoning at low cost.
6 GB RAM · reasoning
Mistral · 22B
Creative writing, analysis, multilingual expert.
16 GB RAM · creative
LLaMA 3.1 · 70B
Full-featured assistant. Code. Math. Logic.
48 GB RAM · code · math
DeepSeek · 671B
Advanced reasoning. Expert-level answers. MoE architecture.
64 GB RAM · expert · MoE
Phi-3 · 3.8B
Microsoft's small model. Punches far above its weight.
3 GB RAM · efficient
Qwen 2.5 · 7B
Chinese-developed. Excellent for multilingual tasks.
5 GB RAM · multilingual · code
+ any GGUF
Download from HuggingFace. Drop in folder. Done.
any size
The real cost

How much does AI cost?

Using AI 1 hour per day, every day, for a year.

Cloud AI: $2,500+
per year · and rising · your data = their product
API key required · Rate limited · Terms can change
Inference-X: $0
forever · electricity only · your data stays yours
No API key. No subscription. No limit. Your hardware, your AI.
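The "$2,500+" figure can be reproduced with back-of-envelope numbers. The daily token volume and per-token price below are hypothetical round figures, not any provider's actual pricing:

```python
def annual_api_cost(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Yearly cloud-API spend for a given daily token volume."""
    return tokens_per_day * 365 * usd_per_million_tokens / 1e6

# Heavy daily use: ~500k tokens/day at a blended $15 per million tokens
print(round(annual_api_cost(500_000, 15.0)))  # 2738, in the $2,500+ range
```

Run the same workload locally and the marginal cost is electricity.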
For developers

OpenAI-compatible API

Start with --serve 8080. Drop-in replacement. Any client library works.

# Start the inference server
./inference-x --model llama3.gguf --serve 8080

# Works with any OpenAI SDK
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
POST /v1/chat/completions · POST /v1/completions · GET /v1/models · GET /health · GET /v1/embeddings
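Because the server speaks the OpenAI wire format, any HTTP client works. A minimal sketch using only the Python standard library, assuming a server started with --serve 8080; the helper names are illustrative:

```python
import json
import urllib.request

def build_chat_request(model: str, content: str) -> urllib.request.Request:
    """Build a POST to the local /v1/chat/completions endpoint."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return urllib.request.Request(
        "http://localhost:8080/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(model: str, content: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, content)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("llama3", "Hello")  # requires ./inference-x --model llama3.gguf --serve 8080
```

Any OpenAI SDK works the same way: point its base URL at http://localhost:8080/v1.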
Get started

Ready? Three steps.

Pick your system.

Linux

1
Download the binary
# x86_64 with CUDA/CPU
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-x64
chmod +x ix-linux-x64
2
Get a model
# Download any GGUF from HuggingFace
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run it
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# or serve as API:
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf --serve 8080
macOS

1
Download (Apple Silicon native)
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-macos-arm64
chmod +x ix-macos-arm64
2
Get a model
# Metal GPU acceleration automatic on Apple Silicon
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run it
./ix-macos-arm64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
Windows

1
Download
# PowerShell
Invoke-WebRequest -Uri "https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-windows-x64.exe" -OutFile "ix.exe"
2
Get a model — download any .gguf file from HuggingFace
3
Run it
.\ix.exe --model model.gguf
Raspberry Pi

1
ARM build for Raspberry Pi 4/5
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-arm64
chmod +x ix-linux-arm64
2
Get a small model (fits in 1-4GB)
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run on Pi
./ix-linux-arm64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# Pi 4 4GB: runs 1B models at ~8 tok/s
Community

The tools we built together.

Inference-X is the core. Around it, the community builds the ecosystem. Here's what exists today — more is being forged every day.

LIVE
IX Engine
The core. 305 KB C++ binary. 19 backends. Zero dependencies. The foundation everything runs on.
LIVE
🛠
Community SaaS
Cloud playground. Deploy models, test APIs, share with others. No installation. Donation-powered.
LIVE
📡
Hardware Scout
See every IX node running globally. Real-time compute map. Who runs what, how fast.
BUILDING
🫀
Organ Store
Extract, share and transplant AI model components. Attention heads, FFN layers, expert blocks. The future of open AI.
BUILDING
🔬
Organ Architect
Analyze model internals. Visualize layers, heads, topology. Like an MRI for AI models.
BUILDING
🔥
The Forge
Community fine-tuning platform. Contribute training data, improve models, share results. Collective intelligence.
COMING
🎙
GhostVoice
Neural voice synthesis. Clone, create, share voice models. Same philosophy: local, private, yours.
COMING
🌐
Echo Relay
Federated inference network. Your idle hardware earns you compute credits. The khettara for AI power.
The future

AI organ transplants.

Neural networks have anatomy. Layers. Attention heads. Expert blocks. We built tools to extract them, study them, and transplant them between models. The community will fill the store.

🧠
⚙️
extract
🫀
💉
transplant
🧬
Vision: A community marketplace where builders extract specialized capabilities from models — multilingual reasoning, code completion, visual understanding — and share them as components others can transplant. The Organ Store doesn't exist yet. The community will build it.
🔍
Analyze
Map model internals
⚗️
Extract
Isolate components
📦
Publish
Share to the store
💉
Transplant
Enhance any model
"In the Moroccan desert, ancient builders carved underground canals — khettaras — that deliver water to entire villages using only gravity. No pump. No electricity. No central authority. They've worked for centuries. Inference-X is a khettara for intelligence: built by many, maintained by many, flowing to anyone who needs it."

Inference-X has no enemies. Every researcher, every company, every government working on AI is playing a role. We're not competing — we're building the infrastructure that makes all of it accessible to everyone who was left out.

Community hardware

Every IX node on Earth. Live.

When you run Inference-X, you can optionally report your hardware telemetry. This is the network. Anonymous. Voluntary. Real.

Backend · Nodes · Avg tok/s · Avg load · Status
License

Free for those who need it. Fair for those who profit.

No tricks. No hidden limits. The engine is the same everywhere.

Commercial Fair
20% rev
Companies with $1M+ annual revenue using IX in production. 20% of revenue attributed to IX-powered features goes to the community fund. Transparent. Auditable.
80% flows to community builders
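The split above can be made concrete with a small calculation. This is one possible reading, treating the 80% as the builders' share of the community fund; the revenue figure is hypothetical:

```python
def fair_license_split(ix_attributed_revenue: float) -> tuple[float, float]:
    """Community fund (20% of IX-attributed revenue) and the builders' 80% share of it."""
    fund = ix_attributed_revenue * 0.20
    builders_share = fund * 0.80
    return fund, builders_share

# A company attributing $1M of annual revenue to IX-powered features:
fund, builders = fair_license_split(1_000_000)
print(fund, builders)  # 200000.0 160000.0
```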
Industrial Embed
Custom
Hardware manufacturers embedding IX in products. Custom licensing for bulk distribution, signed binaries, hardware co-optimization. Contact us.
Redistribute · Co-brand · Optimize
Join the builders

11 seats. One per craton.

The governance of Inference-X is anchored in geology. 11 ancient continental cratons — the most stable structures on Earth — give their names to 11 permanent Core Team seats. One per major civilization region. Designed to last as long as the rocks.

2.7 Ga · Africa
🪨 Anti-Atlas
Morocco · North Africa
⚒ Founder — Elmadani Salka
3.6 Ga · Africa
💎 Kaapvaal
South Africa, Botswana
2.9 Ga · Africa
🌍 West African
Ghana, Senegal, Mali
2.8 Ga · Africa
🌿 Congo
DRC, Republic of Congo
3.1 Ga · Americas
🍁 Superior
Canada, North America
2.5 Ga · Americas
🌳 Amazon
Brazil, South America
3.1 Ga · Europe
🌊 Baltica
Scandinavia, Eastern Europe
3.0 Ga · Asia
🌲 Siberian
Russia, Central Asia
3.8 Ga · Asia
🏮 North China
China, East Asia
3.0 Ga · Asia
🪷 Dharwar
India, South Asia
3.5 Ga · Oceania
🦘 Pilbara
Australia, Oceania
What craton leaders do
Represent their region in project decisions. Connect local builders. Translate and adapt for local communities. No salary — compensation is access, visibility, and history.
Apply for your craton →