No degree required. If you have a device, you have AI.
📦
It's a tiny file
305 kilobytes. Smaller than a photo on your phone. This file lets your computer run AI — any AI — without the internet. Download it, run it. That's it.
🔒
Your words stay yours
When you use AI online, your questions travel to a distant server. Someone can read them. With Inference-X, nothing leaves your machine. Ever.
⚡
It runs on anything
Old laptop, new phone, Raspberry Pi, datacenter. Same file. It detects your hardware and uses it. No configuration needed.
Your hardware
What can YOUR computer do?
Move the slider to your RAM. See what's possible.
1 GB · 4 GB · 8 GB · 16 GB · 32 GB · 64 GB · 128+ GB
RAM: 8 GB — showing models that fit
Your AI runs locally. No internet. No account. Free forever.
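A rough rule of thumb for what fits: a quantized model needs about its parameter count times the bits per weight, plus headroom for the KV cache and activations. A minimal sketch; the 4.5 bits per weight and 20% overhead are illustrative assumptions, not engine internals:

```python
def model_ram_gb(params_billion, bits_per_weight=4.5, overhead=1.2):
    # Weights at the quantized bit-width, plus ~20% headroom for
    # KV cache and activations (both figures are rough assumptions).
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for b in (1, 7, 8, 70):
    print(f"{b:>2}B params ~ {model_ram_gb(b):.1f} GB at Q4_K_M-class quantization")
```

By this estimate a 7B model at 4-bit quantization lands near 5 GB, which matches the figures on the model cards below.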
Privacy
Where do your words go?
Cloud AI
Your question leaves your device, crosses the internet, reaches a server in another country, gets processed, stored, and analyzed. You pay per word.
⚠ Your data · their server · their rules
Inference-X
Your question stays on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. You pay nothing.
✓ Your data · your processor · your rules
Footprint
How small is 305 KB?
The entire AI engine, smaller than you think.
Inference-X · 305 KB
iPhone photo · ~3 MB
Average app · ~50 MB
Chrome · ~200 MB
All 19 hardware targets, all 23 formats — in less space than a single photo on your phone.
The engine
One binary to run them all.
Written in C++. No dependencies. No runtime. No cloud. Any silicon, any OS, any AI model.
CUDA · NVIDIA GPU
Metal · Apple Silicon
Vulkan · Any GPU
ROCm · AMD GPU
OpenCL · Any GPU
SYCL · Intel GPU
CPU x86 · Intel/AMD
CPU ARM · Mobile/Pi
RISC-V · Emerging
WebGPU · Browser
TPU · Google
FPGA · Custom HW
Inferentia · AWS
Gaudi · Intel
Groq · LPU
Cerebras · Wafer
SambaNova · RDU
Graphcore · IPU
Custom · + your HW
Zero-Copy Inference
Dequantization and matrix multiply in one instruction loop. No intermediate buffer.
Trillion-Parameter Native
Only active experts exist in memory. A 1T-parameter model runs on 64 GB RAM.
Smart Precision
Simple questions get compressed layers. Complex reasoning gets full precision.
Zero Telemetry
No network calls. No phone-home. Works on a plane, in a submarine, on the moon.
Auto-Detect
Architecture, chat templates, EOS tokens — auto-detected from model metadata.
Self-Configuring
The Makefile detects your hardware. You don't configure it — it configures itself.
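The Trillion-Parameter Native card above is, at heart, arithmetic: in a mixture-of-experts model, only the shared layers and the currently active experts need to be resident. A back-of-the-envelope sketch; the shared fraction, expert counts, and bit-width here are hypothetical illustrations, not Inference-X internals:

```python
def active_ram_gb(total_params_b, n_experts, active_experts,
                  shared_frac=0.05, bits_per_weight=4.5):
    # Shared (non-expert) layers are always resident; of the expert
    # weights, only active_experts / n_experts are loaded at a time.
    shared = total_params_b * shared_frac
    experts = total_params_b * (1 - shared_frac)
    resident_b = shared + experts * active_experts / n_experts
    return resident_b * 1e9 * bits_per_weight / 8 / 1e9

print(f"{active_ram_gb(1000, 64, 2):.0f} GB resident")    # 1T-param MoE, 2 of 64 experts
print(f"{active_ram_gb(1000, 64, 64):.0f} GB if fully loaded")
```

Under these assumptions a 1T-parameter mixture fits comfortably under 64 GB, while loading every expert would need hundreds of gigabytes.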
What runs on it
Any GGUF model. Zero setup.
Download a model from HuggingFace or Ollama. Drop it in. Run it. These are models we've benchmarked.
LLaMA 3.2 · 1B
Quick answers. Tiny device. Lightning fast.
1 GB RAM · mobile-ready · fast
Mistral · 7B
Smart conversations, code help, translations.
5 GB RAM · multilingual
LLaMA 3.1 · 8B
Meta's compact model. Great reasoning at low cost.
Microsoft's small model. Punches far above its weight.
3 GB RAM · efficient
Qwen 2.5 · 7B
Chinese-developed. Excellent for multilingual tasks.
5 GB RAM · multilingual · code
+ any GGUF
Download from HuggingFace. Drop in folder. Done.
any size
The real cost
How much does AI cost?
Using AI 1 hour per day, every day, for a year.
Cloud API (GPT-4 class)
$2,500+
per year · and rising · your data = their product
API key required · Rate limited · Terms can change
Inference-X (your hardware)
$0
forever · electricity only · your data stays yours
No API key. No subscription. No limit. Your hardware, your AI.
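The cloud figure above depends entirely on how hard you use it; this is the kind of arithmetic behind it. The token volume and blended per-million rate below are loud assumptions (heavy daily use at legacy GPT-4-class pricing), not measured data:

```python
def yearly_cloud_cost_usd(tokens_per_day, usd_per_million_tokens):
    # Simple projection: daily token volume, every day, for a year.
    return tokens_per_day * 365 * usd_per_million_tokens / 1e6

# ~150k tokens/day at a $45-per-million blended rate (both assumed)
print(f"${yearly_cloud_cost_usd(150_000, 45):,.0f} per year")
```

Lighter use costs proportionally less, but the local-inference figure stays at zero either way.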
For developers
OpenAI-compatible API
Start with --serve 8080. Drop-in replacement. Any client library works.
# Start the inference server
./inference-x --model llama3.gguf --serve 8080
# Works with any OpenAI SDK
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
POST /v1/chat/completions · POST /v1/completions · GET /v1/models · GET /health · GET /v1/embeddings
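Because the server speaks the OpenAI wire format, the standard library is enough; no SDK required. A minimal sketch, assuming a server started with --serve 8080 is listening on localhost; the model name and prompt are placeholders:

```python
import json
import urllib.request

def build_payload(prompt, model="llama3"):
    # Standard OpenAI chat-completions body; any OpenAI client
    # library produces this same shape.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:8080/v1", model="llama3"):
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Pointing an official OpenAI SDK at the same server is just a matter of setting its base_url to http://localhost:8080/v1.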
Get started
Ready? Three steps.
Pick your system.
1
Download the binary
# x86_64 with CUDA/CPU
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-x64
chmod +x ix-linux-x64
2
Get a model
# Download any GGUF from HuggingFace
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf
3
Run it
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# or serve as API:
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf --serve 8080
# Metal GPU acceleration is automatic on Apple Silicon
Federated inference network. Your idle hardware earns you compute credits. The khettara for AI power.
The future
AI organ transplants.
Neural networks have anatomy. Layers. Attention heads. Expert blocks. We built tools to extract them, study them, and transplant them between models. The community will fill the store.
🧠 → ⚙️ extract → 🫀 → 💉 transplant → 🧬
Vision: A community marketplace where builders extract specialized capabilities from models — multilingual reasoning, code completion, visual understanding — and share them as components others can transplant. The Organ Store doesn't exist yet. The community will build it.
🔍
Analyze
Map model internals
⚗️
Extract
Isolate components
📦
Publish
Share to the store
💉
Transplant
Enhance any model
The vision
"In the Moroccan desert, ancient builders carved underground canals — khettaras — that deliver water to entire villages using only gravity. No pump. No electricity. No central authority. They've worked for centuries. Inference-X is a khettara for intelligence: built by many, maintained by many, flowing to anyone who needs it."
Inference-X has no enemies. Every researcher, every company, every government that processes AI is playing a role. We're not competing — we're building the infrastructure that makes all of it accessible to everyone who was left out.
Community hardware
Every IX node on Earth. Live.
When you run Inference-X, you can optionally report your hardware telemetry. This is the network. Anonymous. Voluntary. Real.
Backend · Nodes · Avg tok/s · Avg load · Status
License
Free for those who need it. Fair for those who profit.
No tricks. No hidden limits. The engine is the same everywhere.
Free Forever
$0
Individuals, researchers, students, open-source projects, startups under $1M revenue. No registration. No expiry. No limits. This is the default.
✓ Full engine · All backends · All models
Commercial Fair
20% rev
Companies with $1M+ annual revenue using IX in production. 20% of revenue attributed to IX-powered features goes to the community fund. Transparent. Auditable.
80% flows to community builders
Industrial Embed
Custom
Hardware manufacturers embedding IX in products. Custom licensing for bulk distribution, signed binaries, hardware co-optimization. Contact us.
Redistribute · Co-brand · Optimize
Join the builders
11 seats. One per craton.
The governance of Inference-X is anchored in geology. 11 ancient continental cratons — the most stable structures on Earth — give their names to 11 permanent Core Team seats. One per major civilization region. Designed to last as long as the rocks.
Seat holders represent their region in project decisions, connect local builders, and translate and adapt the project for local communities. No salary: compensation is access, visibility, and history.