Better output from the same model. Fused computation, adaptive precision, surgical expert loading. 305 KB, 19 backends, zero dependencies. https://inference-x.com
14 lines
713 B
Plaintext
model,params,quant,hardware,time_ms,tok_s,tokens,quality
SmolLM2-135M,135M,Q8_0,EPYC-16T-64GB,643,12.44,8,GARB
Llama-3.2-1B,1B,Q4_K_M,EPYC-16T-64GB,2702,2.96,8,OK
Qwen2.5-3B,3B,Q4_K_M,EPYC-16T-64GB,5499,1.45,8,PASS
Llama-3.2-3B,3B,Q4_K_M,EPYC-16T-64GB,5336,1.49,8,OK
Phi-3.5-mini,3.8B,Q4_K_M,EPYC-16T-64GB,5700,0,0,CRASH
Mistral-7B,7B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
Qwen2.5-7B,7B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
DeepSeek-R1-7B,7B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
Llama-3.1-8B,8B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
Gemma-2-9B,9B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
DeepSeek-R1-14B,14B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
Qwen2.5-14B,14B,Q4_K_M,EPYC-16T-64GB,300000,0,0,TIMEOUT
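As a sanity check on the table's units: `tok_s` is simply `tokens / (time_ms / 1000)`, and the 300000 ms rows are runs that hit a 5-minute wall-clock limit. A minimal Python sketch (row data copied from the completed runs above; the `throughput` helper is illustrative, not part of any published tooling):

```python
import csv
import io

# Completed runs copied from the benchmark CSV above.
DATA = """model,params,quant,hardware,time_ms,tok_s,tokens,quality
SmolLM2-135M,135M,Q8_0,EPYC-16T-64GB,643,12.44,8,GARB
Llama-3.2-1B,1B,Q4_K_M,EPYC-16T-64GB,2702,2.96,8,OK
Qwen2.5-3B,3B,Q4_K_M,EPYC-16T-64GB,5499,1.45,8,PASS
Llama-3.2-3B,3B,Q4_K_M,EPYC-16T-64GB,5336,1.49,8,OK
"""

def throughput(row):
    """Tokens per second derived from wall-clock generation time."""
    return int(row["tokens"]) / (int(row["time_ms"]) / 1000)

rows = list(csv.DictReader(io.StringIO(DATA)))
for row in rows:
    # Recomputed throughput should match the reported tok_s column
    # (the table truncates/rounds to two decimals).
    assert abs(throughput(row) - float(row["tok_s"])) < 0.01, row["model"]
```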