[Submitted on 22 Oct 2025]
Wide Gated Feedforward Networks: An Empirical Study of Complexity in Transformer Architectures
Abstract: We present a systematic investigation of Wide Gated Feedforward Networks (WGFN), examining whether increased architectural complexity in transformer feedforward layers improves performance. Through ablation studies on the FineWeb benchmark using an 83M-parameter Qwen 3 architecture, we find that our approach reaches a validation loss of 5.008, underperforming both the SwiGLU baseline (4.9266) and the best state-of-the-art method (4.793). Our analysis highlights the tradeoffs between architectural complexity and optimization stability in feedforward network design, consistent with recent findings that simpler approaches often outperform more complex ones \cite{tay2021efficient, shazeer2020glu}. The paper includes detailed experimental protocols, ablation studies, and analysis supporting these conclusions.
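For reference, the SwiGLU baseline named above is the gated feedforward variant from Shazeer (2020): FFN(x) = (SiLU(x W_gate) ⊙ x W_up) W_down, with no biases. The sketch below is a minimal, dependency-free illustration of that standard formulation. It is not the authors' code, and it does not attempt to reproduce WGFN, which the abstract does not specify. The function names (`matvec`, `silu`, `swiglu_ffn`) and the plain-list matrix representation are illustrative assumptions; a real model would use batched tensor operations.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (d_in, d_out)


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute the row-vector product x @ w, where w has shape (len(x), d_out)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def silu(z: float) -> float:
    """Swish with beta = 1 (SiLU): z * sigmoid(z), computed without overflow."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for large negative z
    return z * e / (1.0 + e)


def swiglu_ffn(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Hypothetical SwiGLU feedforward: (SiLU(x W_gate) * (x W_up)) W_down.

    w_gate, w_up: shape (d_model, d_ff); w_down: shape (d_ff, d_model).
    """
    gate = [silu(g) for g in matvec(x, w_gate)]  # gating branch
    up = matvec(x, w_up)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(hidden, w_down)                 # project back to d_model


if __name__ == "__main__":
    # Toy example with d_model = 1, d_ff = 1.
    print(swiglu_ffn([1.0], [[1.0]], [[2.0]], [[3.0]]))
```

The gate and up projections share the input but use separate weights, so SwiGLU carries three weight matrices where a classic two-layer feedforward block has two. This is why gated variants are usually compared at a reduced hidden width to keep parameter counts matched.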
Submission history
[v1] Wed, 22 Oct 2025 18:47 UTC