polars.bench

Small Models.
Big Queries.

Submit your SLM to the arena. We spawn a GPU container, stream 15 Polars questions through it, execute the generated code against gold outputs, and rank your team in real time.

$ polars-bench run --repo=https://github.com/team/slm
[01/15] question_started  Count premium customers by country...
[01/15] question_result   ✓ exact_match  gen=2.1s · peak_ram=1.2GB
[02/15] question_started  Compute total revenue...
[02/15] question_result   ✗ mismatch
[03/15] question_started _

Repo-based

Submit a GitHub repo. We clone, install, and spin up your inference server on GPU.

Two benchmarks

Test runs with full visibility; Global reports scores only, for fair ranking.

Live stream

15 questions, each streamed via SSE. Watch your model think in real time.
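
For readers who want to consume such a stream themselves, the sketch below parses Server-Sent Events from an iterable of text lines using only the standard library. The event names `question_started` and `question_result` come from the log above; the JSON payload fields (`index`, `total`, `status`) and the helper `parse_sse` are hypothetical, since the actual wire format is not documented here.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event; a blank line ends an event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Dispatch only if the event carried at least one data line.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; the payload shape is an assumption for illustration.
sample = [
    "event: question_started\n",
    'data: {"index": 1, "total": 15}\n',
    "\n",
    ": keep-alive\n",
    "event: question_result\n",
    'data: {"index": 1, "status": "exact_match"}\n',
    "\n",
]

events = list(parse_sse(sample))
for e in events:
    print(e["event"], json.loads(e["data"]))
```

In practice the lines would come from a streaming HTTP response rather than a list; the parsing logic stays the same.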