How to Run a 355B-Parameter MoE Model on Ancient 8-Socket CPU-Only Hardware: A Complete Chronological Optimization Guide for GLM-4.7 on Lenovo System x3950 X6
Hardware Overview
Inference Engine Compilation
First Attempts with BF16 Quantization
Early Q8_0 Tests (Pre-BIOS Optimization)
BIOS Optimization and BLAS Experiments
Batch Size Sweep (with the Optimal 64 Threads)
Testing Experimental Flags (Graph Reuse & RoPE)
Final Linux Tweaks (Including THP, Cache Drop, and BLAS Threading Control)
Final Optimized Setup
Some Real-World Benchmarks (Some More Tweaks)
Lockstep vs