Models — postl.ai

Language

LLMs · served on orakel & lab

serving

GLM-5.2 · 9.1 t/s

orakel · abliterated · Q8 · ~800 GB

The flagship, uncensored, at full Q8 precision — 800 GB of weights served straight from a terabyte of RAM.

abliterated · Q8 · ik_llama.cpp

serving

Kimi-K2.5 · 12 t/s

orakel · abliterated · Q4_K · ~600 GB

Trillion-parameter-class reasoning, served from RAM on the 8-socket machine.

abliterated · Q4_K · from RAM

serving

Qwen3.5-397B · 14 t/s

lab · Q4_K · ~170 GB · MoE 397B/17B

The daily driver — big total, small active set, interactive speed on the workstation.

MoE · Q4_K · daily driver

bench

MiniMax-M2.7

lab · MoE

Quick, but vanilla-compliant. Benched twice and shelved — speed is not everything.

MoE · evaluating

parked

GLM-5.1

target: orakel · abliterated

Wants to run; waiting on a usable Q4 quant and a fix for the degraded DSA indexer. The hardware is not the bottleneck.

parked · no Q4 yet

parked

DeepSeek-V4-Flash

queued · MoE 284B/13B · abliterated

Top roadmap candidate — smallest active set in its class, fastest thing that fits on paper.

queued · MoE · abliterated

Image

text-to-image · RTX 5090

serving

FLUX.2 · 1024²

FLUX.2 · ~6 s

lab · RTX 5090 · text-to-image

The sharp one — 1024² images, roughly 10 a minute, entirely local.

1024² · ~10 img/min

serving

SDXL · 1024²

SDXL · ~2 s

lab · RTX 5090 · text-to-image

The workhorse — batch 1024² images all day at ~30 a minute.

1024² · ~30 img/min · batch
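The per-image times and the images-per-minute figures on the two cards above are the same number seen from two sides. A minimal sketch of the conversion, assuming images are generated one after another with no overlap between them; the function name is hypothetical and only the ~6 s and ~2 s latencies come from the cards:

```python
def images_per_minute(seconds_per_image: float) -> float:
    """Convert a per-image latency into sustained images per minute."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 60.0 / seconds_per_image


# Approximate latencies quoted on the cards above.
for name, latency_s in {"FLUX.2": 6.0, "SDXL": 2.0}.items():
    print(f"{name}: ~{images_per_minute(latency_s):.0f} img/min")
```

Both quoted latencies are approximate, so the results are rough rates rather than measured throughput.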

Video

text-to-video · RTX 5090

serving

generated · 720p · 5s

Wan 2.2 · ~4 min

lab · RTX 5090 · text-to-video

Quality-first — 720p, 5-second clips. Slow and gorgeous. The pretty one.

720p · 5s clip · quality

serving

generated · 768×512 · 5s

LTX-Video · ~12 s

lab · RTX 5090 · text-to-video

Near real-time — 768×512, 5-second clips in seconds, not minutes. The fast one.

768×512 · 5s clip · fast

The stack

how it all runs

Serving. LLMs run on ik_llama.cpp, builds pinned at the measured sweet spot. Image & video go through a local ComfyUI pipeline on the 5090.

Sizing. The lab workstation (256 GB DDR5 + RTX 5090 + RTX 3090) takes MoE quants up to ~240 GB; anything bigger moves to orakel and its terabyte of RAM.
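The sizing rule can be sketched as a small placement function. This is an illustration only, not the actual tooling: the function and constant names are hypothetical, the ~240 GB ceiling is the approximate figure from the paragraph above, and "a terabyte of RAM" is assumed to mean roughly 1024 GB.

```python
LAB_MAX_GB = 240       # approximate ceiling for MoE quants on the lab workstation
ORAKEL_RAM_GB = 1024   # assumption: "a terabyte of RAM" taken as ~1024 GB


def place(model_gb: float) -> str:
    """Pick a host for a quantized model based on its weight size in GB."""
    if model_gb <= LAB_MAX_GB:
        return "lab"
    if model_gb < ORAKEL_RAM_GB:
        return "orakel"
    return "does not fit"


# Approximate weight sizes as listed on the model cards.
for name, size_gb in [
    ("Qwen3.5-397B (Q4_K)", 170),
    ("Kimi-K2.5 (Q4_K)", 600),
    ("GLM flagship (Q8)", 800),
]:
    print(f"{name}: {place(size_gb)}")
```

The sketch ignores KV cache and context overhead, which in practice eat into both ceilings.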

Selection. Abliterated variants preferred — models that answer the question they were asked. Every candidate gets benchmarked before it gets trusted.

Privacy. Every weight is LUKS-encrypted at rest. Nothing calls a cloud API. No tokens, no frames leave the valley.

Everything that thinks in the attic.
