postl.ai
← the valleylanguageimagevideothe stack

The model fleet

Everything that thinks in the attic.

Text, image and video models — quantized, mostly abliterated, benchmarked, and served entirely on the machines in the fleet. No tokens, no frames leave the valley.

Language

LLMs · served on orakel & lab
serving
GLM-5.29.1 t/s
orakel · abliterated · Q8 · ~800 GB
The flagship, uncensored, at full Q8 precision — 800 GB of weights served straight from a terabyte of RAM.
abliteratedQ8ik_llama.cpp
serving
Kimi-K2.512 t/s
orakel · abliterated · Q4_K · ~600 GB
Trillion-parameter-class reasoning, served from RAM on the 8-socket machine.
abliteratedQ4_Kfrom RAM
serving
Qwen3.5-397B14 t/s
lab · Q4_K · ~170 GB · MoE 397B/17B
The daily driver — big total, small active set, interactive speed on the workstation.
MoEQ4_Kdaily driver
bench
MiniMax-M2.7
lab · MoE
Quick, but vanilla-compliant. Benched twice and shelved — speed is not everything.
MoEevaluating
parked
GLM-5.1
target: orakel · abliterated
Wants to run; waiting on a usable Q4 quant and a fix for the degraded DSA indexer. The hardware is not the bottleneck.
parkedno Q4 yet
parked
DeepSeek-V4-Flash
queued · MoE 284B/13B · abliterated
Top roadmap candidate — smallest active set in its class, fastest thing that fits on paper.
queuedMoEabliterated

Image

text-to-image · RTX 5090
serving
FLUX.2 · 1024²
FLUX.2~6 s
lab · RTX 5090 · text-to-image
The sharp one — 1024² images, roughly 10 a minute, entirely local.
1024²~10 img/min
serving
SDXL · 1024²
SDXL~2 s
lab · RTX 5090 · text-to-image
The workhorse — batch 1024² images all day at ~30 a minute.
1024²~30 img/minbatch

Video

text-to-video · RTX 5090
serving
generated · 720p · 5s
Wan 2.2~4 min
lab · RTX 5090 · text-to-video
Quality-first — 720p, 5-second clips. Slow and gorgeous. The pretty one.
720p5s clipquality
serving
generated · 768×512 · 5s
LTX-Video~12 s
lab · RTX 5090 · text-to-video
Near real-time — 768×512, 5-second clips in seconds, not minutes. The fast one.
768×5125s clipfast

The stack

how it all runs

Serving. LLMs run on ik_llama.cpp, builds pinned at the measured sweet spot. Image & video go through a local ComfyUI pipeline on the 5090.

Sizing. The lab workstation (256 GB DDR5 + RTX 5090 + RTX 3090) takes MoE quants up to ~240 GB; anything bigger moves to orakel and its terabyte of RAM.

Selection. Abliterated variants preferred — models that answer the question they were asked. Every candidate gets benchmarked before it gets trusted.

Privacy. Every weight is LUKS-encrypted at rest. Nothing calls a cloud API. No tokens, no frames leave the valley.