[Submitted on 26 Oct 2025]
Revisiting SwiGLU: An Empirical Study of Feedforward Networks in Transformers
Abstract: This paper presents a comprehensive empirical investigation of feedforward network variants in transformer language models. Through systematic ablation studies with rigorous statistical analysis, we validate the effectiveness of the SwiGLU architecture while exploring alternative gating mechanisms. Our experiments, conducted across multiple random seeds and model scales, demonstrate that while several theoretically promising modifications show potential, the original SwiGLU implementation remains remarkably robust, achieving a validation loss of 4.918 on the FineWeb dataset. We analyze the mathematical properties that contribute to SwiGLU's success, provide insights into why alternative approaches failed to provide significant improvements, and discuss implications for future architectural innovations.
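For context, the SwiGLU feedforward block discussed above follows the standard formulation FFN(x) = (Swish(xW) ⊙ xV)W₂, where Swish(x) = x·sigmoid(x) and ⊙ is an elementwise product. Below is a minimal NumPy sketch of that computation; the dimensions and random weights are illustrative only and are not taken from the paper.

```python
import numpy as np

def swish(x):
    # Swish / SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W, V, W2):
    # SwiGLU feedforward block: FFN(x) = (Swish(x W) * (x V)) W2,
    # where * is an elementwise (gating) product.
    return (swish(x @ W) * (x @ V)) @ W2

# Hypothetical dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.standard_normal((2, d_model))        # batch of 2 token vectors
W = rng.standard_normal((d_model, d_ff))     # gate projection
V = rng.standard_normal((d_model, d_ff))     # value projection
W2 = rng.standard_normal((d_ff, d_model))    # output projection
y = swiglu_ffn(x, W, V, W2)
print(y.shape)  # (2, 8): output has the same width as the input
```

Note that SwiGLU uses two input projections (W and V), so the hidden width d_ff is typically shrunk relative to a plain ReLU FFN to keep the parameter count comparable.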
Submission history
[v1] Sun, 26 Oct 2025 21:33 UTC