Better output from the same model. Fused computation, adaptive precision, surgical expert loading. 305 KB, 19 backends, zero dependencies. https://inference-x.com
| model | params | quant | hardware | time (ms) | tok/s | tokens | quality |
|---|---|---|---|---|---|---|---|
| SmolLM2-135M | 135M | Q8_0 | EPYC-16T-64GB | 643 | 12.44 | 8 | GARB |
| Llama-3.2-1B | 1B | Q4_K_M | EPYC-16T-64GB | 2702 | 2.96 | 8 | OK |
| Qwen2.5-3B | 3B | Q4_K_M | EPYC-16T-64GB | 5499 | 1.45 | 8 | PASS |
| Llama-3.2-3B | 3B | Q4_K_M | EPYC-16T-64GB | 5336 | 1.49 | 8 | OK |
| Phi-3.5-mini | 3.8B | Q4_K_M | EPYC-16T-64GB | 5700 | 0 | 0 | CRASH |
| Mistral-7B | 7B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| Qwen2.5-7B | 7B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| DeepSeek-R1-7B | 7B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| Llama-3.1-8B | 8B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| Gemma-2-9B | 9B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| DeepSeek-R1-14B | 14B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
| Qwen2.5-14B | 14B | Q4_K_M | EPYC-16T-64GB | 300000 | 0 | 0 | TIMEOUT |
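
The reported tok/s figures appear to follow directly from the tokens and time (ms) columns (throughput = tokens / seconds). Below is a minimal sketch that recomputes a few rows as a sanity check; the function name and row list are illustrative only, not part of the benchmark harness.

```python
# Sketch: recompute throughput from the table's raw columns and compare
# with the reported tok/s values. Row data is copied from the table above.
def tok_per_s(tokens: int, time_ms: int) -> float:
    # Guard against the CRASH/TIMEOUT rows, which report 0 tokens.
    return tokens / (time_ms / 1000.0) if time_ms else 0.0

rows = [
    # (model, tokens, time_ms, reported tok/s)
    ("SmolLM2-135M", 8, 643, 12.44),
    ("Llama-3.2-1B", 8, 2702, 2.96),
    ("Qwen2.5-3B", 8, 5499, 1.45),
]

for name, tokens, time_ms, reported in rows:
    computed = tok_per_s(tokens, time_ms)
    print(f"{name}: computed {computed:.2f} tok/s, reported {reported}")
```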