Practical writing on ML infrastructure, SRE, and Kubernetes. Opinions earned from clusters that failed, models that drifted, and runbooks that saved the day.