Better output from the same model. Fused computation, adaptive precision, surgical expert loading. 305 KB, 19 backends, zero dependencies. https://inference-x.com
# IX Web — Web Interface for Inference-X
IX Web is a self-contained web chat interface for Inference-X. It lets you talk to any AI model running on your own hardware, with a model selector, hardware stats, and an OpenAI-compatible API.
Zero dependencies. Pure Python stdlib + one HTML file. No npm, no Node.js, no frameworks.
## Quickstart

```sh
# 1. Build Inference-X (from repo root)
make

# 2. Download a model
./ix download qwen-2.5-3b

# 3. Start IX Web
python3 web/ix_server.py
```

Open http://localhost:9090 — that's it. You have your own AI.
## What you get

- Chat interface at `/` — dark theme, model selector, typing indicator, markdown rendering
- OpenAI-compatible API at `/v1/chat/completions` — drop-in replacement for any OpenAI client
- Model list at `/v1/models` — all detected GGUF models with sizes
- Hardware stats at `/health` — CPU, RAM, core count
- Hot-swap models — switch between models from the dropdown, no restart needed
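As a quick sanity check against a running server, you can list the detected models from Python. This is a sketch that assumes `/v1/models` returns the standard OpenAI-style shape (`{"data": [{"id": ...}, ...]}`); the parsing is split out so it works on any response with that shape.

```python
import json
import urllib.request

def extract_model_ids(payload):
    """Pull model IDs out of an OpenAI-style /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url="http://localhost:9090"):
    """Fetch /v1/models and return the detected model IDs."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return extract_model_ids(json.load(resp))
```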
## Architecture

```
Browser → ix_server.py (port 9090) → inference-x binary → .gguf model
```
IX Web spawns the IX binary per request. The model loads, generates, and exits. This means:
- Any silicon — the protocol routes to your hardware
- No persistent memory — each request is independent
- Any model size — from 135M to 1T parameters, if you have the RAM
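The per-request spawn can be sketched as a plain subprocess call. Note the flag names below (`--model`, `--prompt`, `--max-tokens`) are illustrative assumptions, not the binary's documented CLI; the point is the lifecycle: build argv, run, capture stdout, exit, nothing persists between calls.

```python
import subprocess

def build_command(binary, model_path, prompt, max_tokens=256):
    """Assemble argv for one generation run.
    NOTE: flag names are assumptions for illustration; check the
    real `inference-x` CLI for the actual interface."""
    return [binary, "--model", model_path, "--prompt", prompt,
            "--max-tokens", str(max_tokens)]

def generate(binary, model_path, prompt):
    """Spawn the binary, capture its output, and let it exit.
    Each call is fully independent — no persistent memory."""
    result = subprocess.run(build_command(binary, model_path, prompt),
                            capture_output=True, text=True, check=True)
    return result.stdout
```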
## Options

```sh
python3 web/ix_server.py --help

--port 8080                 # Custom port (default: 9090)
--host 127.0.0.1            # Bind to localhost only
--ix /path/to/inference-x   # Custom binary path
--models /path/to/models    # Custom model directory (repeatable)
```
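A server exposing these options could parse them roughly like this — a minimal `argparse` sketch mirroring the documented flags, not the actual `ix_server.py` source (the defaults for `--host` and `--ix` are assumptions):

```python
import argparse

def parse_args(argv=None):
    """Parse the documented IX Web flags; --models repeats via action='append'."""
    p = argparse.ArgumentParser(description="IX Web server options (sketch)")
    p.add_argument("--port", type=int, default=9090, help="Listen port")
    p.add_argument("--host", default="0.0.0.0", help="Bind address")
    p.add_argument("--ix", default="./inference-x", help="Path to the IX binary")
    p.add_argument("--models", action="append", default=None,
                   help="Extra model directory (repeatable)")
    return p.parse_args(argv)
```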
## Model auto-detection

IX Web scans these directories for `.gguf` files:

- `./models/` (repo root)
- `~/.cache/inference-x/models/`
- `~/models/`
- Any path passed via `--models`
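The scan itself is a few lines with `pathlib`. This sketch walks the search paths above plus any `--models` directories, deduplicating files that appear under more than one path:

```python
from pathlib import Path

def find_gguf_models(extra_dirs=()):
    """Collect .gguf files from the default search paths plus extra dirs."""
    search = [Path("./models"),
              Path.home() / ".cache" / "inference-x" / "models",
              Path.home() / "models",
              *map(Path, extra_dirs)]
    seen, found = set(), []
    for d in search:
        if not d.is_dir():
            continue  # skip search paths that don't exist on this machine
        for f in sorted(d.glob("*.gguf")):
            real = f.resolve()
            if real not in seen:  # same file reachable via two paths
                seen.add(real)
                found.append(f)
    return found
```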
## API usage

IX Web is OpenAI-compatible. Use any client:

```python
import requests

r = requests.post("http://localhost:9090/v1/chat/completions", json={
    "model": "qwen-2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256,
})
print(r.json()["choices"][0]["message"]["content"])
```

```sh
curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}]}'
```
## Files

```
web/
├── ix_server.py   # HTTP server (Python, 0 dependencies)
├── chat.html      # Chat interface (single HTML file)
└── README.md      # This file
```
## License

BSL-1.1 — same as Inference-X. Free for all use under $1M revenue.