+ — total demos run
+ 🟢 0 instances active now
+ ⚡ — demos today
+ 🌍 5 continents · Real hardware · 30 min · Auto-destroy
+
+
+
+
+
+
+
+ Free demo · No account needed
+
+
Try a real AI. Right now.
+
We spin up an actual server in a real datacenter, install the Inference-X engine, load an AI model, and give you a live chat interface. Everything is erased when you're done. No data stored. No account created.
+
+
+
+
+
+
+
+
+
+
+
+
+ ✓ No account · ✓ No credit card · ✓ 30 min · ✓ Auto-erased
+
These three repositories are the foundation. Fork them. Build on them. Propose your changes. The community makes the decisions — no single company controls the roadmap.
+
+
+
+
+
+
+
+
+ ⚙ inference-x
+
+
elmadani/inference-x
+
+ BSL 1.1
+
+
Universal AI inference engine — 305 KB, 19 hardware backends, zero dependencies. The core: loads any GGUF model, exposes an OpenAI-compatible API, runs on any hardware from Raspberry Pi to data center GPU clusters.
CLI tools, deployment scripts, model downloader, benchmark suite. Everything you need to set up, configure, and monitor your Inference-X deployment. Install scripts for all platforms.
The founder's public work — mathematical frameworks, philosophical essays, project architecture documents. Understand the vision behind Inference-X: why it was built, where it's going, and the H5→H6 consciousness framework.
Configure your IX deployment for your exact hardware and use case. Generate a ready-to-run config file you can deploy anywhere.
+
+
+
+
+
🖥 Hardware
+
🧠 Model
+
🎭 Persona
+
📦 Export
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
8 GB
+
+
Models that fit your hardware:
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Generated configuration:
+
+
+
+
+
+
+
+ Quick start:
+ 1. Download the config above
+ 2. Download the IX binary: inference-x.com#start
+ 3. Run: ./ix --config ix-config.json
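For illustration, a generated ix-config.json might look like the sketch below. The field names are hypothetical — the wizard defines the actual schema:

```json
{
  "hardware": { "backend": "cpu-x86", "threads": 8, "ram_gb": 8 },
  "model": { "path": "Llama-3.2-1B-Instruct-Q4_K_M.gguf", "context": 4096 },
  "persona": { "system_prompt": "You are a helpful assistant." },
  "server": { "serve": true, "port": 8080 }
}
```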
+
+
+
+
+
+
+
+
+
+
+
Community Compute
+
Power the demos. Earn your place in history.
+
+
+
+
+
The Provider Pool.
+
Every free demo needs real compute. Community providers contribute their idle server capacity. In return, they're credited publicly, gain early access to future IX frameworks, and become part of the infrastructure that democratizes AI.
+
Pioneer providers will have priority integration when the Echo Relay (federated inference network) launches.
+
+
✓ Your name/brand displayed as compute provider on every demo

✓ Daily cost limit you control — never exceed your budget

✓ Keys encrypted server-side — never exposed to the public

✓ Early access: Echo Relay framework for distributed providers

✓ OneCloud, Hetzner, OVH, or any API-compatible provider
+
+
+
+
🔋 Contribute compute
+
+
+
+
+
+
+
Keys encrypted · Used only for IX demos · Revocable anytime
+
✓ You're in! Your compute will power free AI demos.
+
+
+
+
+
+
Active pool contributors
+
No providers yet — be the first to contribute compute.
No degree required. If you have a device, you have AI.
+
+
+ 📦
+
It's a tiny file
+
305 kilobytes. Smaller than a photo on your phone. This file lets your computer run AI — any AI — without the internet. Download it, run it. That's it.
+
+
+ 🔒
+
Your words stay yours
+
When you use AI online, your questions travel to a distant server. Someone can read them. With Inference-X, nothing leaves your machine. Ever.
+
+
+ ⚡
+
It runs on anything
+
Old laptop, new phone, Raspberry Pi, datacenter. Same file. It detects your hardware and uses it. No configuration needed.
+
+
+
+
+
+
+
Your hardware
+
What can YOUR computer do?
+
Move the slider to your RAM. See what's possible.
+
+
+
1 GB · 4 GB · 8 GB · 16 GB · 32 GB · 64 GB · 128+ GB
+
+
+
+
RAM: 8 GB — showing models that fit
+
+
Your AI runs locally. No internet. No account. Free forever.
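The slider's logic can be sketched with back-of-envelope arithmetic. The per-parameter figure below is an assumption for Q4-quantized GGUF weights, not an official IX constant:

```python
# Rough estimate of the largest Q4-quantized GGUF model that fits in RAM.
# Assumptions (not official IX numbers): ~0.6 GB per billion parameters
# at Q4_K_M quantization, plus ~1.5 GB reserved for OS and KV cache.

GB_PER_BILLION_PARAMS_Q4 = 0.6   # assumed quantized weight footprint
OVERHEAD_GB = 1.5                # assumed OS + KV-cache headroom

def max_params_billion(ram_gb: float) -> float:
    """Approximate largest model size (billions of parameters)
    that fits in the given amount of RAM."""
    usable = max(ram_gb - OVERHEAD_GB, 0.0)
    return usable / GB_PER_BILLION_PARAMS_Q4

for ram in (1, 4, 8, 16, 32, 64):
    print(f"{ram:>3} GB RAM -> up to ~{max_params_billion(ram):.0f}B params")
```

Under these assumptions, an 8 GB machine lands around a 10B-parameter model, which matches the 7B–8B cards above.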
+
+
+
+
+
+
Privacy
+
Where do your words go?
+
+
+
Cloud AI
+
Your question leaves your device, crosses the internet, reaches a server in another country, gets processed, stored, and analyzed. You pay per word.
+
⚠ Your data · their server · their rules
+
+
+
Inference-X
+
Your question stays on your desk. The answer is computed by your own processor. Nothing leaves. Nothing is stored. You pay nothing.
+
✓ Your data · your processor · your rules
+
+
+
+
+
+
+
Footprint
+
How small is 305 KB?
+
The entire AI engine is smaller than you think.
+
+
+
Inference-X · 305 KB

iPhone photo · ~3 MB

Average app · ~50 MB

Chrome · ~200 MB
+
+
+
All 19 hardware targets, all 23 formats — in less space than a single photo on your phone.
+
+
+
+
+
The engine
+
One binary to run them all.
+
Written in C++. No dependencies. No runtime. No cloud. Any silicon, any OS, any AI model.
+
+
CUDA · NVIDIA GPU

Metal · Apple Silicon

Vulkan · Any GPU

ROCm · AMD GPU

OpenCL · Any GPU

SYCL · Intel GPU

CPU x86 · Intel/AMD

CPU ARM · Mobile/Pi

RISC-V · Emerging

WebGPU · Browser

TPU · Google

FPGA · Custom HW

Inferentia · AWS

Gaudi · Intel

Groq · LPU

Cerebras · Wafer

SambaNova · RDU

Graphcore · IPU

Custom · + your HW
+
+
+
Zero-Copy Inference
Dequantization and matrix multiply in one instruction loop. No intermediate buffer.
+
Trillion-Parameter Native
Only active experts exist in memory. A 1T-parameter model runs on 64 GB RAM.
+
Smart Precision
Simple questions get compressed layers. Complex reasoning gets full precision.
+
Zero Telemetry
No network calls. No phone-home. Works on a plane, in a submarine, on the moon.
+
Auto-Detect
Architecture, chat templates, EOS tokens — auto-detected from model metadata.
+
Self-Configuring
The Makefile detects your hardware. You don't configure it — it configures itself.
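The "Trillion-Parameter Native" claim can be sanity-checked with mixture-of-experts arithmetic. All figures here (expert counts, backbone size, Q4 byte cost) are illustrative assumptions, not IX internals:

```python
# Back-of-envelope: why a sparse 1T-parameter MoE model can fit in 64 GB
# when only active experts are resident in memory.
# Assumptions (illustrative): 256 experts, 8 active per token, a ~20B
# shared backbone, ~0.6 GB per billion parameters at Q4 quantization.

TOTAL_PARAMS_B = 1000          # 1T parameters total
NUM_EXPERTS = 256
ACTIVE_EXPERTS = 8
SHARED_PARAMS_B = 20           # assumed attention/embedding backbone
GB_PER_B_Q4 = 0.6

expert_params_b = (TOTAL_PARAMS_B - SHARED_PARAMS_B) / NUM_EXPERTS
active_params_b = SHARED_PARAMS_B + ACTIVE_EXPERTS * expert_params_b
resident_gb = active_params_b * GB_PER_B_Q4

print(f"Active parameters per token: ~{active_params_b:.0f}B")
print(f"Resident memory at Q4:       ~{resident_gb:.0f} GB")
```

With these numbers, roughly 51B parameters are active per token, needing about 30 GB at Q4 — comfortably inside 64 GB, with headroom for swapping experts as routing changes.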
+
+
+
+
+
+
What runs on it
+
Any GGUF model. Zero setup.
+
Download a model from HuggingFace or Ollama. Drop it in. Run it. These are models we've benchmarked.
+
+
+
LLaMA 3.2 · 1B
+
Quick answers. Tiny device. Lightning fast.
+
1 GB RAM · mobile-ready · fast
+
+
+
Mistral · 7B
+
Smart conversations, code help, translations.
+
5 GB RAM · multilingual
+
+
+
LLaMA 3.1 · 8B

Meta's compact model. Great reasoning at low cost.

Phi-3 · mini

Microsoft's small model. Punches far above its weight.

3 GB RAM · efficient
+
+
+
Qwen 2.5 · 7B
+
Chinese-developed. Excellent for multilingual tasks.
+
5 GB RAM · multilingual · code
+
+
+
+ any GGUF
+
Download from HuggingFace. Drop in folder. Done.
+
any size
+
+
+
+
+
+
+
The real cost
+
How much does AI cost?
+
Using AI 1 hour per day, every day, for a year.
+
+
+
Cloud API (GPT-4 class)
+
$2,500+
+
per year · and rising · your data = their product
+
API key required · Rate limited · Terms can change
+
+
+
Inference-X (your hardware)
+
$0
+
forever · electricity only · your data stays yours
+
No API key. No subscription. No limit. Your hardware, your AI.
+
+
+
+
+
+
+
For developers
+
OpenAI-compatible API
+
Start with --serve 8080. Drop-in replacement. Any client library works.
+
# Start the inference server
./inference-x --model llama3.gguf --serve 8080

# Works with any OpenAI SDK
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"Hello"}]}'
+
+
+ POST /v1/chat/completions
+ POST /v1/completions
+ GET /v1/models
+ GET /health
+ GET /v1/embeddings
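Because the server speaks the OpenAI wire format, any HTTP client or OpenAI SDK pointed at http://localhost:8080/v1 works. A minimal sketch using only Python's standard library, assuming a server started as in the curl example above:

```python
# Minimal chat-completions client for a local Inference-X server
# (assumed already running via: ./inference-x --model llama3.gguf --serve 8080).
# The request body follows the OpenAI chat-completions schema.
import json
import urllib.request

def build_request(prompt: str, model: str = "llama3") -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI schema: the first choice carries the assistant message
    return body["choices"][0]["message"]["content"]

# Show the payload shape without needing a running server
print(json.dumps(build_request("Hello"), indent=2))
```

With the server running, `chat("Hello")` returns the model's reply as a string.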
+
+
+
+
+
+
Get started
+
Ready? Three steps.
+
Pick your system.
+
+
+
+
+
+
+
+
1
Download the binary
# x86_64 with CUDA/CPU
curl -LO https://git.inference-x.com/elmadani/inference-x/releases/download/v1.0/ix-linux-x64
chmod +x ix-linux-x64

2
Get a model
# Download any GGUF from HuggingFace
wget https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf

3
Run it
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf
# or serve as an API:
./ix-linux-x64 --model Llama-3.2-1B-Instruct-Q4_K_M.gguf --serve 8080
# Metal GPU acceleration is automatic on Apple Silicon
Echo Relay: a federated inference network. Your idle hardware earns you compute credits. The khettara for AI power.
+
+
+
+
+
+
+
The future
+
AI organ transplants.
+
Neural networks have anatomy. Layers. Attention heads. Expert blocks. We built tools to extract them, study them, and transplant them between models. The community will fill the store.
+
+
🧠
+
→
+
⚙️ extract
+
→
+
🫀
+
→
+
💉 transplant
+
→
+
🧬
+
+
+ Vision: A community marketplace where builders extract specialized capabilities from models — multilingual reasoning, code completion, visual understanding — and share them as components others can transplant. The Organ Store doesn't exist yet. The community will build it.
+
+
+
+
🔍
+
Analyze
+
Map model internals
+
+
+
⚗️
+
Extract
+
Isolate components
+
+
+
📦
+
Publish
+
Share to the store
+
+
+
💉
+
Transplant
+
Enhance any model
+
+
+
+
+
+
+
+
The vision
+
+ "In the Moroccan desert, ancient builders carved underground canals — khettaras — that deliver water to entire villages using only gravity. No pump. No electricity. No central authority. They've worked for centuries. Inference-X is a khettara for intelligence: built by many, maintained by many, flowing to anyone who needs it."
+
+
Inference-X has no enemies. Every researcher, every company, every government working with AI is playing a role. We're not competing — we're building the infrastructure that makes all of it accessible to everyone who was left out.
+
+
+
+
+
+
Community hardware
+
Every IX node on Earth. Live.
+
When you run Inference-X, you can optionally report your hardware telemetry. This is the network. Anonymous. Voluntary. Real.
+
+
+
+
Backend
+
Nodes
+
Avg tok/s
+
Avg load
+
Status
+
+
+
+
Loading community hardware data...
+
+
+
+
+
+
+
License
+
Free for those who need it. Fair for those who profit.
+
No tricks. No hidden limits. The engine is the same everywhere.
+
+
+
Free Forever
+
$0
+
Individuals, researchers, students, open-source projects, startups under $1M revenue. No registration. No expiry. No limits. This is the default.
+
✓ Full engine · All backends · All models
+
+
+
Commercial Fair
+
20% rev
+
Companies with $1M+ annual revenue using IX in production. 20% of revenue attributed to IX-powered features goes to the community fund. Transparent. Auditable.
+
80% flows to community builders
+
+
+
Industrial Embed
+
Custom
+
Hardware manufacturers embedding IX in products. Custom licensing for bulk distribution, signed binaries, hardware co-optimization. Contact us.
+
Redistribute · Co-brand · Optimize
+
+
+
+
+
+
+
Join the builders
+
11 seats. One per craton.
+
The governance of Inference-X is anchored in geology. 11 ancient continental cratons — the most stable structures on Earth — give their names to 11 permanent Core Team seats. One per major civilization region. Designed to last as long as the rocks.
Seat holders represent their region in project decisions, connect local builders, and translate and adapt for local communities. No salary — compensation is access, visibility, and history.