inference-x/web/README.md
Salka Elmadani ec36668cf5 Inference-X v1.0 — Universal AI Inference Engine
Better output from the same model. Fused computation, adaptive precision,
surgical expert loading. 305 KB, 19 backends, zero dependencies.

https://inference-x.com
2026-02-23 07:10:47 +00:00


# IX Web — Web Interface for Inference-X
IX Web is a self-contained web chat interface for Inference-X. It lets you talk to any AI model running on your own hardware, with a model selector, hardware stats, and an OpenAI-compatible API.
**Zero dependencies.** Pure Python stdlib + one HTML file. No npm, no Node.js, no frameworks.
## Quickstart
```bash
# 1. Build Inference-X (from repo root)
make
# 2. Download a model
./ix download qwen-2.5-3b
# 3. Start IX Web
python3 web/ix_server.py
```
Open http://localhost:9090 — that's it. You have your own AI.
## What you get
- **Chat interface** at `/` — dark theme, model selector, typing indicator, markdown rendering
- **OpenAI-compatible API** at `/v1/chat/completions` — drop-in replacement for any OpenAI client
- **Model list** at `/v1/models` — all detected GGUF models with sizes
- **Hardware stats** at `/health` — CPU, RAM, core count
- **Hot-swap models** — switch between models from the dropdown, no restart needed
## Architecture
```
Browser → ix_server.py (port 9090) → inference-x binary → .gguf model
```
IX Web spawns the IX binary once per request: the process loads the model, generates the completion, and exits. This means:
- **Any silicon** — each request runs through whichever of the IX backends matches your hardware
- **No persistent memory** — each request is an independent process, so no state leaks between chats
- **Any model size** — from 135M to 1T parameters, if you have the RAM
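The spawn-per-request flow can be sketched with `subprocess`. Note the CLI flags below (`--model`, `--max-tokens`) and the function name are illustrative assumptions, not the actual inference-x interface:

```python
import subprocess

def run_once(binary, model_path, prompt, max_tokens=256):
    """Spawn the inference binary for a single request (sketch).

    Each call is fully independent: the process loads the model,
    generates, writes the completion to stdout, and exits.
    NOTE: the flags below are assumptions for illustration only.
    """
    cmd = [binary, "--model", model_path, "--max-tokens", str(max_tokens), prompt]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    result.check_returncode()
    return result.stdout.strip()
```

Because each request is a fresh process, a crash or OOM in one generation cannot affect the next.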
## Options
```
python3 web/ix_server.py --help
--port 8080 # Custom port (default: 9090)
--host 127.0.0.1 # Bind to localhost only
--ix /path/to/inference-x # Custom binary path
--models /path/to/models # Custom model directory (repeatable)
```
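A repeatable flag like `--models` maps naturally onto argparse's `action="append"`. A sketch of the CLI surface above — defaults mirror the README, but the real `ix_server.py` may differ:

```python
import argparse

def build_parser():
    # Sketch of the option surface described in the README;
    # not the actual ix_server.py implementation.
    p = argparse.ArgumentParser(description="IX Web server options (sketch)")
    p.add_argument("--port", type=int, default=9090, help="Listen port")
    p.add_argument("--host", default="127.0.0.1", help="Bind address")
    p.add_argument("--ix", default="./inference-x",
                   help="Path to the inference-x binary")
    # action="append" makes --models repeatable: every occurrence
    # adds another directory to the list.
    p.add_argument("--models", action="append", default=None,
                   help="Extra model directory (may be passed multiple times)")
    return p
```

With `action="append"`, `--models /a --models /b` yields `args.models == ["/a", "/b"]`.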
## Model auto-detection
IX Web scans these directories for `.gguf` files:
1. `./models/` (repo root)
2. `~/.cache/inference-x/models/`
3. `~/models/`
4. Any path passed via `--models`
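The scan above amounts to globbing each candidate directory for `.gguf` files. A minimal sketch with `pathlib` — the real server's scan logic (ordering, recursion, deduplication) may differ:

```python
from pathlib import Path

def find_gguf_models(extra_dirs=()):
    """Collect .gguf files from the README's search locations (sketch)."""
    candidates = [
        Path("models"),                                  # ./models/ under the repo root
        Path.home() / ".cache" / "inference-x" / "models",
        Path.home() / "models",
        *map(Path, extra_dirs),                          # paths passed via --models
    ]
    found = []
    for d in candidates:
        if d.is_dir():
            found.extend(sorted(d.glob("*.gguf")))
    return found
```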
## API usage
IX Web is OpenAI-compatible. Use any client:
```python
import requests

r = requests.post("http://localhost:9090/v1/chat/completions", json={
    "model": "qwen-2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256,
})
print(r.json()["choices"][0]["message"]["content"])
```
```bash
curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}]}'
```
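Since the server is stdlib-only, the client can be too. A minimal `urllib`-based helper — the function names here are illustrative, not part of IX Web; the response shape follows the OpenAI chat schema:

```python
import json
import urllib.request

def chat_request(prompt, model="auto", base="http://localhost:9090"):
    """Build a POST request for /v1/chat/completions with no third-party deps."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kw):
    # Send the request and pull the assistant's text out of the
    # OpenAI-style response: choices[0].message.content.
    with urllib.request.urlopen(chat_request(prompt, **kw)) as r:
        return json.load(r)["choices"][0]["message"]["content"]
```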
## Files
```
web/
├── ix_server.py # HTTP server (Python, 0 dependencies)
├── chat.html # Chat interface (single HTML file)
└── README.md # This file
```
## License
BSL-1.1 — same as Inference-X. Free for all use under $1M revenue.