The model fleet
Text, image and video models — quantized, mostly abliterated, benchmarked, and served entirely on the machines in the fleet. No tokens, no frames leave the valley.
Serving. LLMs run on ik_llama.cpp, builds pinned at the measured sweet spot. Image & video go through a local ComfyUI pipeline on the 5090.
Sizing. The lab workstation (256 GB DDR5 + RTX 5090 + RTX 3090) takes MoE quants up to ~240 GB; anything bigger moves to orakel and its terabyte of RAM.
Selection. Abliterated variants preferred — models that answer the question they were asked. Every candidate gets benchmarked before it gets trusted.
Privacy. Every weight is LUKS-encrypted at rest. Nothing calls a cloud API. No tokens, no frames leave the valley.