Salka Elmadani ec36668cf5 Inference-X v1.0 — Universal AI Inference Engine
Better output from the same model. Fused computation, adaptive precision,
surgical expert loading. 305 KB, 19 backends, zero dependencies.

https://inference-x.com
2026-02-23 07:10:47 +00:00


IX Web — Web Interface for Inference-X

IX Web is a self-contained web chat interface for Inference-X. It lets you talk to any AI model running on your own hardware, with a model selector, hardware stats, and an OpenAI-compatible API.

Zero dependencies. Pure Python stdlib + one HTML file. No npm, no Node.js, no frameworks.

Quickstart

# 1. Build Inference-X (from repo root)
make

# 2. Download a model
./ix download qwen-2.5-3b

# 3. Start IX Web
python3 web/ix_server.py

Open http://localhost:9090 — that's it. You have your own AI.

What you get

  • Chat interface at / — dark theme, model selector, typing indicator, markdown rendering
  • OpenAI-compatible API at /v1/chat/completions — drop-in replacement for any OpenAI client
  • Model list at /v1/models — all detected GGUF models with sizes
  • Hardware stats at /health — CPU, RAM, core count
  • Hot-swap models — switch between models from the dropdown, no restart needed
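
The /health and /v1/models endpoints can be polled with nothing but the Python standard library. A minimal sketch (any response fields beyond the paths listed above are assumptions):

```python
import json
import urllib.request

def get_json(url, timeout=10):
    """GET a URL and decode the JSON body -- stdlib only, no third-party client."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# With IX Web running locally you could then do, e.g.:
#   models = get_json("http://localhost:9090/v1/models")  # detected GGUF models
#   stats  = get_json("http://localhost:9090/health")     # CPU, RAM, core count
```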

Architecture

Browser → ix_server.py (port 9090) → inference-x binary → .gguf model

IX Web spawns the IX binary once per request: the process loads the model, generates, and exits. This means:

  • Any silicon — the IX binary picks the right backend for your hardware
  • No persistent state — each request is independent; nothing is cached between calls
  • Any model size — from 135M to 1T parameters, if you have the RAM
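
The spawn-per-request flow can be sketched with subprocess. This is a simplified stand-in, not the actual ix_server.py code, and the binary's flag names (--model, --prompt) are assumptions:

```python
import subprocess

def run_once(binary, model_path, prompt, timeout=300):
    """Spawn the inference binary for a single request and return its stdout.

    The child process loads the model, generates, and exits, so no state
    survives between requests and hot-swapping models needs no restart.
    """
    result = subprocess.run(
        [binary, "--model", model_path, "--prompt", prompt],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()

# For a smoke test, `echo` can stand in for the real binary: it prints its
# arguments and exits, mimicking a one-shot generation.
```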

Options

python3 web/ix_server.py --help

  --port 8080                # Custom port (default: 9090)
  --host 127.0.0.1           # Bind to localhost only
  --ix /path/to/inference-x  # Custom binary path
  --models /path/to/models   # Custom model directory (repeatable)
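
A repeatable flag like --models is conventionally handled with argparse's append action. A sketch of how such a CLI could be parsed (not the shipped ix_server.py source; the default bind address is an assumption):

```python
import argparse

def build_parser():
    """CLI mirroring the options above; defaults follow the README where stated."""
    p = argparse.ArgumentParser(description="IX Web server")
    p.add_argument("--port", type=int, default=9090, help="listen port")
    p.add_argument("--host", default="0.0.0.0", help="bind address")
    p.add_argument("--ix", default="./inference-x", help="path to the IX binary")
    # action="append" makes the flag repeatable: each use adds one directory.
    p.add_argument("--models", action="append", default=None,
                   help="extra model directory (may be given multiple times)")
    return p

args = build_parser().parse_args(["--port", "8080", "--models", "/a", "--models", "/b"])
```

Each repeated --models occurrence appends one entry, so args.models here is ["/a", "/b"].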

Model auto-detection

IX Web scans these directories for .gguf files:

  1. ./models/ (repo root)
  2. ~/.cache/inference-x/models/
  3. ~/models/
  4. Any path passed via --models
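
The scan order above amounts to a priority search over a few directories. A hypothetical helper, not the shipped implementation:

```python
from pathlib import Path

def find_gguf_models(dirs):
    """Return {model_name: path} for every .gguf file in the given dirs.

    Earlier directories win: a name found in ./models/ is not overridden
    by the same name in ~/models/. Missing directories are skipped.
    """
    found = {}
    for d in dirs:
        d = Path(d).expanduser()
        if not d.is_dir():
            continue
        for f in sorted(d.glob("*.gguf")):
            found.setdefault(f.stem, f)
    return found

SEARCH_DIRS = ["./models", "~/.cache/inference-x/models", "~/models"]
```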

API usage

IX Web is OpenAI-compatible. Use any client:

import requests  # pip install requests

r = requests.post("http://localhost:9090/v1/chat/completions", json={
    "model": "qwen-2.5-3b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 256
})
print(r.json()["choices"][0]["message"]["content"])

Or from the shell:

curl http://localhost:9090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}]}'
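
Since the server is zero-dependency, the client can be too. The same chat request with only the standard library (a sketch; the response shape follows the OpenAI schema used above):

```python
import json
import urllib.request

def build_request(prompt, model="auto", base="http://localhost:9090"):
    """Assemble the POST request for /v1/chat/completions."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    return urllib.request.Request(
        base + "/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kwargs):
    """Send the request and pull the assistant's reply out of the response."""
    with urllib.request.urlopen(build_request(prompt, **kwargs), timeout=300) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]
```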

Files

web/
├── ix_server.py  # HTTP server (Python, 0 dependencies)
├── chat.html     # Chat interface (single HTML file)
└── README.md     # This file

License

BSL-1.1 — same as Inference-X. Free for all use under $1M revenue.