# IX Web — Web Interface for Inference-X
IX Web is a self-contained web chat interface for Inference-X. It lets you talk to any AI model running on your own hardware, with a model selector, hardware stats, and an OpenAI-compatible API.
**Zero dependencies.** Pure Python stdlib + one HTML file. No npm, no Node.js, no frameworks.
## Quickstart
```bash
# 1. Build Inference-X (from repo root)
make
# 2. Download a model
./ix download qwen-2.5-3b
# 3. Start IX Web
python3 web/ix_server.py
```
Open http://localhost:9090 — that's it. You have your own AI.
## What you get
- **Chat interface** at `/` — dark theme, model selector, typing indicator, markdown rendering
- **OpenAI-compatible API** at `/v1/chat/completions` — drop-in replacement for any OpenAI client
- **Model list** at `/v1/models` — all detected GGUF models with sizes
- **Hardware stats** at `/health` — CPU, RAM, core count
- **Hot-swap models** — switch between models from the dropdown, no restart needed
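
These endpoints are easy to script against. Here is a minimal sketch of reading the model list, assuming `/v1/models` returns the OpenAI-style list shape (`{"object": "list", "data": [{"id": ...}, ...]}`); the exact field names are an assumption, so check your server's actual response:

```python
import json
import urllib.request

def model_ids(payload):
    # Assumed OpenAI-style shape: {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]

def fetch_model_ids(base_url="http://localhost:9090"):
    """Fetch /v1/models and return just the model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))
```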
## Architecture
```
Browser → ix_server.py (port 9090) → inference-x binary → .gguf model
```
IX Web spawns the inference-x binary once per request: the model loads, generates a response, and the process exits. This means:
- **Any silicon** — each request is routed to whichever backend fits your hardware
- **No persistent memory** — each request is independent
- **Any model size** — from 135M to 1T parameters, if you have the RAM
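
The spawn-per-request loop is simple enough to sketch in a few lines of stdlib Python. This is an illustrative version, not the actual `ix_server.py` code, and the flag names are placeholders rather than the real inference-x CLI:

```python
import subprocess

def run_once(binary, model_path, prompt, timeout=120):
    """Spawn the inference binary for a single request and return its stdout.

    Flag names are illustrative placeholders, not the actual inference-x flags.
    """
    result = subprocess.run(
        [binary, "--model", model_path, "--prompt", prompt],
        capture_output=True, text=True, timeout=timeout,
    )
    result.check_returncode()  # turn a crashed generation into an exception
    return result.stdout
```

Because the process exits after every request, a crash or memory leak in one generation cannot poison the next one.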
## Options
```
python3 web/ix_server.py --help
--port 8080 # Custom port (default: 9090)
--host 127.0.0.1 # Bind to localhost only
--ix /path/to/inference-x # Custom binary path
--models /path/to/models # Custom model directory (repeatable)
```
## Model auto-detection
IX Web scans these directories for `.gguf` files:
1. `./models/` (repo root)
2. `~/.cache/inference-x/models/`
3. `~/models/`
4. Any path passed via `--models`
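
The scan itself needs nothing beyond the stdlib. Here is an illustrative sketch that mirrors the search order above, with earlier directories winning on duplicate names; it is a sketch of the idea, not the actual `ix_server.py` implementation:

```python
from pathlib import Path

# Mirrors the documented search order; --models paths are appended at the end.
SEARCH_DIRS = [
    Path("models"),
    Path.home() / ".cache" / "inference-x" / "models",
    Path.home() / "models",
]

def detect_models(extra_dirs=()):
    """Map model name -> path for every .gguf found; earlier dirs win ties."""
    found = {}
    for directory in [*SEARCH_DIRS, *map(Path, extra_dirs)]:
        if directory.is_dir():
            for path in sorted(directory.glob("*.gguf")):
                found.setdefault(path.stem, path)
    return found
```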
## API usage
IX Web is OpenAI-compatible. Use any client:
```python
import requests
r = requests.post("http://localhost:9090/v1/chat/completions", json={
    "model": "qwen-2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
})
print(r.json()["choices"][0]["message"]["content"])
```
```bash
curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}]}'
```
## Files
```
web/
├── ix_server.py # HTTP server (Python, 0 dependencies)
├── chat.html # Chat interface (single HTML file)
└── README.md # This file
```
## License
BSL-1.1 — same as Inference-X. Free for all use under $1M revenue.