Better output from the same model. Fused computation, adaptive precision, surgical expert loading. 305 KB, 19 backends, zero dependencies. https://inference-x.com
# Examples
## Quick start
```bash
# Build first
make -j$(nproc)

# Hello world
./examples/hello.sh /path/to/model.gguf

# Chat with system prompt
./examples/chat.sh /path/to/model.gguf "You are a desert ecology expert."

# Benchmark
./examples/bench.sh /path/to/model.gguf 10

# Expert profiling (MoE models)
./examples/profile_experts.sh /path/to/kimi-k2.5.gguf expert_data.csv
```
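The profiling run writes per-expert data to the CSV path you pass it. The exact column layout isn't documented here, so the sketch below assumes only that the file has a one-line header; `summarize_experts` is a hypothetical helper, not part of the repo.

```shell
# Hypothetical helper (not in the repo): count data rows in a profiling
# CSV, assuming only that the first line is a header.
summarize_experts() {
  csv=$1
  [ -f "$csv" ] || { echo "no profiling data at $csv" >&2; return 1; }
  echo "expert records: $(($(wc -l < "$csv") - 1))"
}
```

After a profiling run: `summarize_experts expert_data.csv`.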
## Notes
- All scripts take the model path as the first argument
- Chat template is auto-detected from GGUF metadata
- Expert profiling only produces data for MoE models (Kimi K2.5, DeepSeek V3)
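Because every example script follows the same convention (model path first), you can validate the path once and dispatch to any of them. A minimal sketch; `run_example` is a hypothetical wrapper, not something the repo ships:

```shell
# Hypothetical wrapper (not in the repo): validate the model path once,
# then dispatch to any example script with the remaining arguments.
run_example() {
  script=$1; model=$2; shift 2
  [ -f "$model" ] || { echo "error: model not found: $model" >&2; return 1; }
  case "$model" in
    *.gguf) ;;
    *) echo "warning: $model does not end in .gguf" >&2 ;;
  esac
  "./examples/${script}.sh" "$model" "$@"
}
```

Usage, e.g.: `run_example bench /path/to/model.gguf 10`.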