Run dir: /home/slime/.openclaw/workspace-base/.run/lmstudio-mini-benchmark/20260222-151912-no-thinking
Scoring uses the normalized final answer (thinking text is stripped before comparison).
| Rank | Model | Score | Avg sec/prompt |
|---|---|---|---|
| 1 | qwen/qwen3-vl-8b | 5/5 | 1.23 |
| 2 | openai/gpt-oss-20b | 5/5 | 2.51 |
| 3 | qwen3-coder-next | 5/5 | 9.78 |
| 4 | openai/gpt-oss-120b | 5/5 | 22.61 |
| 5 | google/gemma-3-4b | 4/5 | 0.90 |
| 6 | google/gemma-3-12b | 3/5 | 1.58 |
| 7 | zai-org/glm-4.6v-flash | 1/5 | 2.52 |
| 8 | deepseek/deepseek-r1-0528-qwen3-8b | 0/5 | 2.33 |
| 9 | mistralai/ministral-3-14b-reasoning | 0/5 | 3.14 |
| 10 | zai-org/glm-4.7-flash | 0/5 | 5.27 |
Full per-prompt results: /home/slime/.openclaw/workspace-base/.run/lmstudio-mini-benchmark/20260222-151912-no-thinking/results.json
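For reference, a minimal sketch of what "normalized final answer" scoring could look like. The `<think>...</think>` tag convention, the `normalize_answer`/`score` names, and the exact-match rule are all assumptions for illustration, not necessarily what this harness actually does:

```python
import re


def normalize_answer(raw: str) -> str:
    """Strip reasoning traces and normalize whitespace/case.

    ASSUMPTION: thinking text is delimited by <think>...</think>;
    the real harness may use a different marker or a structured field.
    """
    # Drop any chain-of-thought block from the raw model output.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    # Trim, collapse internal whitespace, and case-fold so that
    # "The Answer is 4." and "the answer is 4." compare equal.
    return " ".join(text.split()).casefold()


def score(raw_output: str, expected: str) -> bool:
    """Exact match on normalized answers (assumed scoring rule)."""
    return normalize_answer(raw_output) == normalize_answer(expected)


if __name__ == "__main__":
    out = "<think>2 + 2... carry the one...</think> The Answer is 4."
    print(score(out, "the answer is 4."))  # True
```

Stripping the thinking text before comparison keeps the score from penalizing models whose reasoning traces surround an otherwise correct final answer.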