[Submitted on 29 Oct 2025]
Improving Transformer Feedforward Networks with GEGLU Activations
Abstract: This paper presents a comprehensive empirical investigation into activation functions for transformer feedforward networks, focusing on the Gated Gaussian Error Linear Unit (GEGLU). Through systematic ablation studies on a 134M parameter transformer model trained on the FineWeb dataset, we demonstrate that GEGLU achieves a statistically significant 1.09% improvement in validation loss compared to the standard SwiGLU baseline. We further explore polynomial and sparse variants, finding that simpler implementations consistently outperform more complex alternatives. Our results suggest that GEGLU represents a low-risk, high-reward modification for transformer architectures, requiring no additional parameters or computational overhead while providing consistent performance gains. The paper includes detailed statistical analysis, implementation specifics, and a thorough discussion of limitations and future work directions.
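For reference, the following is a minimal PyTorch sketch of a GEGLU feedforward block in the style the abstract describes: a gate stream passed through GELU, multiplied elementwise by a value stream, then projected back to the model dimension. The class and layer names, the fused input projection, and the bias-free linears are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Hypothetical GEGLU feedforward block: GELU(x W) * (x V), then an output projection."""

    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        # Fused projection producing both the gate and value streams in one matmul.
        self.in_proj = nn.Linear(d_model, 2 * hidden_dim, bias=False)
        self.out_proj = nn.Linear(hidden_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, value = self.in_proj(x).chunk(2, dim=-1)
        return self.out_proj(F.gelu(gate) * value)
```

Under this formulation, swapping the SwiGLU baseline for GEGLU amounts to replacing the SiLU gate activation with GELU, which is consistent with the abstract's claim of no change in parameter count or compute.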
Submission history
[v1] Wed, 29 Oct 2025 18:33 UTC