[Submitted on 25 Oct 2025]
Orthogonal Initialization in Transformer Feedforward Networks: A Systematic Study
Abstract: This paper presents a systematic investigation of orthogonal initialization in transformer feedforward networks. Through extensive experiments on the FineWeb benchmark using a Qwen 3 architecture (83M parameters), we demonstrate that careful initialization of feedforward projections can lead to modest improvements in model performance. Our approach achieves a mean validation loss of 4.926 (±0.001) across 5 runs, slightly outperforming the SwiGLU baseline (4.927 ±0.001). While the improvement is small, our analysis provides insights into the role of initialization in transformer optimization and suggests directions for future research.
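The abstract does not say how the orthogonal matrices are built or which feedforward projections receive them. As an illustration only, the following minimal sketch draws a Gaussian matrix and orthonormalizes it with modified Gram-Schmidt, one common way to obtain an orthogonal (or semi-orthogonal, for non-square shapes) weight matrix. The function names `orthogonal_matrix` and `gram`, the sizes `d_model = 8` and `d_ff = 32`, and the choice of pure Python are assumptions made for this example, not details taken from the paper.

```python
import math
import random


def orthogonal_matrix(rows, cols, seed=0):
    """Return a rows x cols matrix as a list of lists.

    If rows <= cols the rows are orthonormal; otherwise the columns are.
    Hypothetical helper for illustration, not the paper's implementation.
    """
    rng = random.Random(seed)
    transpose = rows > cols
    # Build n orthonormal vectors of length d, with n <= d.
    n, d = (cols, rows) if transpose else (rows, cols)

    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Modified Gram-Schmidt: subtract the component along each earlier vector.
        for u in basis:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # resample in the unlikely degenerate case
            basis.append([a / norm for a in v])

    if transpose:
        # basis holds the columns; lay them out as a rows x cols matrix.
        return [[basis[j][i] for j in range(n)] for i in range(d)]
    return basis


def gram(vectors):
    """Matrix of pairwise dot products between the given vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


# Assumed toy sizes; real models use far larger dimensions.
d_model, d_ff = 8, 32
w_up = orthogonal_matrix(d_ff, d_model, seed=1)    # d_ff x d_model, orthonormal columns
w_down = orthogonal_matrix(d_model, d_ff, seed=2)  # d_model x d_ff, orthonormal rows
```

For a matrix built this way, `gram(w_down)` is the identity up to floating-point error, which is the property that distinguishes orthogonal initialization from an unconstrained Gaussian draw. In a PyTorch training setup the same effect would normally come from `torch.nn.init.orthogonal_` rather than a hand-written routine.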
Submission history
[v1] Sat, 25 Oct 2025 17:49 UTC