[Submitted on 21 Oct 2025]
GEGLU: A Simple Yet Effective Feedforward Variant for Language Models
Abstract: We present an empirical investigation of feedforward network variants in transformer language models. Through systematic ablation studies, we identify that Gated Gaussian Error Linear Units (GEGLU) provide consistent improvements over standard SwiGLU implementations while maintaining simplicity. Our simplified GEGLU architecture achieves a 0.6% reduction in validation perplexity compared to the baseline and ranks competitively against more complex approaches. The results suggest that careful activation function selection in feedforward networks remains an impactful yet understudied aspect of transformer architecture design.
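For context, the gated feedforward variant the abstract refers to is commonly written as FFN_GEGLU(x) = (GELU(x W_gate) ⊙ x W_up) W_down. The sketch below is a minimal PyTorch rendering under that reading; the class name, dimensions, and bias-free projections are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal GEGLU feedforward sketch (assumed configuration, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Feedforward block with a GELU-gated linear unit:
    FFN(x) = (GELU(x W_gate) * (x W_up)) W_down
    """
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)  # gate branch
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)    # value branch
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of the GELU-activated gate and the linear value branch,
        # followed by the down projection back to the model dimension.
        return self.w_down(F.gelu(self.w_gate(x)) * self.w_up(x))

# Usage with assumed sizes: (batch, sequence, d_model) in and out.
ffn = GEGLUFeedForward(d_model=512, d_hidden=1365)
y = ffn(torch.randn(2, 16, 512))
```

A SwiGLU baseline differs only in the gate activation (SiLU/Swish instead of GELU), which is why the abstract frames the comparison as an activation-function choice within an otherwise fixed gated feedforward structure.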
Submission history
[v1] Tue, 21 Oct 2025 15:39 UTC