# IX Web — Web Interface for Inference-X
IX Web is a self-contained web chat interface for Inference-X. It lets you talk to any AI model running on your own hardware, with a model selector, hardware stats, and an OpenAI-compatible API.
**Zero dependencies.** Pure Python stdlib + one HTML file. No npm, no Node.js, no frameworks.
## Quickstart
```bash
# 1. Build Inference-X (from repo root)
make
# 2. Download a model
./ix download qwen-2.5-3b
# 3. Start IX Web
python3 web/ix_server.py
```
Open http://localhost:9090 — that's it. You have your own AI.
## What you get
- **Chat interface** at `/` — dark theme, model selector, typing indicator, markdown rendering
- **OpenAI-compatible API** at `/v1/chat/completions` — drop-in replacement for any OpenAI client
- **Model list** at `/v1/models` — all detected GGUF models with sizes
- **Hardware stats** at `/health` — CPU, RAM, core count
- **Hot-swap models** — switch between models from the dropdown, no restart needed
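
These endpoints are easy to script against. Here is a minimal sketch of reading the model list, assuming `/v1/models` returns the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the exact field names are an assumption, so check your server's actual response:

```python
import json
import urllib.request

def model_ids(payload):
    # Assumed OpenAI-style shape: {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]

def fetch_model_ids(base_url="http://localhost:9090"):
    """Fetch /v1/models and return just the model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))
```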
## Architecture
```
Browser → ix_server.py (port 9090) → inference-x binary → .gguf model
```
IX Web spawns the inference-x binary once per request: the model loads, generates a response, and the process exits. This means:
- **Any silicon** — each request is routed to whichever backend fits your hardware
- **No persistent memory** — each request is independent
- **Any model size** — from 135M to 1T parameters, if you have the RAM
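
The spawn-per-request loop is simple enough to sketch in a few lines of stdlib Python. This is an illustrative version, not the actual `ix_server.py` code, and the flag names are placeholders rather than the real inference-x CLI:

```python
import subprocess

def run_once(binary, model_path, prompt, timeout=120):
    """Spawn the inference binary for a single request and return its stdout.

    Flag names are illustrative placeholders, not the actual inference-x flags.
    """
    result = subprocess.run(
        [binary, "--model", model_path, "--prompt", prompt],
        capture_output=True, text=True, timeout=timeout,
    )
    result.check_returncode()  # turn a crashed generation into an exception
    return result.stdout
```

Because the process exits after every request, a crash or memory leak in one generation cannot poison the next one.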
## Options
```
python3 web/ix_server.py --help
--port 8080 # Custom port (default: 9090)
--host 127.0.0.1 # Bind to localhost only
--ix /path/to/inference-x # Custom binary path
--models /path/to/models # Custom model directory (repeatable)
```
## Model auto-detection
IX Web scans these directories for `.gguf` files:
1. `./models/` (repo root)
2. `~/.cache/inference-x/models/`
3. `~/models/`
4. Any path passed via `--models`
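
The scan itself needs nothing beyond the stdlib. Here is an illustrative sketch that mirrors the search order above, with earlier directories winning on duplicate names; it is a sketch of the idea, not the actual `ix_server.py` implementation:

```python
from pathlib import Path

# Mirrors the documented search order; --models paths are appended at the end.
SEARCH_DIRS = [
    Path("models"),
    Path.home() / ".cache" / "inference-x" / "models",
    Path.home() / "models",
]

def detect_models(extra_dirs=()):
    """Map model name -> path for every .gguf found; earlier dirs win ties."""
    found = {}
    for directory in [*SEARCH_DIRS, *map(Path, extra_dirs)]:
        if directory.is_dir():
            for path in sorted(directory.glob("*.gguf")):
                found.setdefault(path.stem, path)
    return found
```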
## API usage
IX Web is OpenAI-compatible. Use any client:
```python
import requests
r = requests.post("http://localhost:9090/v1/chat/completions", json={
    "model": "qwen-2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
})
print(r.json()["choices"][0]["message"]["content"])
```
```bash
curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}]}'
```
## Files
```
web/
├── ix_server.py # HTTP server (Python, 0 dependencies)
├── chat.html # Chat interface (single HTML file)
└── README.md # This file
```
## License
BSL-1.1 — same as Inference-X. Free for all use under $1M revenue.