Interesting papers for model optimizations

Paper: Polar Express

The paper introduces Polar Express, a GPU-friendly polynomial method for computing the matrix polar decomposition, designed for fast convergence and low approximation error. It adapts the polynomial at each iteration and outperforms classical methods in deep learning applications such as Muon, GPT-2 training, and image classification. It is also robust in finite precision and has potential for large-scale, aspect-ratio-optimized, spectrum-aware acceleration.

https://arxiv.org/pdf/2505.16932

Paper: LowRA

Paper: "LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits"
Stanford University (Zhou, Zhang, Kumbong, Olukotun)
arXiv: 2502.08141 (Feb 2025, accepted ICLR 2026)
https://arxiv.org/abs/2502.08141

The problem it solves

QLoRA (what you'd use in the training code above) quantizes the base model to 4-bit but keeps the LoRA adapters themselves in full precision (bf16). LowRA asks: what if we also aggressively quant...