[Submitted on 1 Nov 2025]
PolyGLU: A Study of Polynomial Expansions in Transformer Feedforward Networks
Abstract: This paper presents a systematic investigation of polynomial expansions in transformer feedforward networks through the PolyGLU architecture. Building on established gated linear unit (GLU) designs, we rigorously examine the practical challenges of incorporating higher-order polynomial terms. Our experiments demonstrate that fixed polynomial coefficients with L2 normalization achieve stable training, though the final performance (validation loss 5.015) falls short of both the SwiGLU baseline (4.9266) and contemporary approaches. We provide detailed ablation studies, discuss why polynomial expansions underperform other feedforward enhancements, and offer insights for future research directions.
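The abstract describes fixed polynomial coefficients with L2 normalization inside a GLU-style feedforward block but does not include code. The following is a minimal sketch of what such a layer might look like, assuming a SwiGLU-like two-branch structure and a cubic expansion applied to the gate branch; the module name `PolyGLUFFN`, the coefficient values, the expansion degree, and the dimensions are all illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyGLUFFN(nn.Module):
    """Hypothetical GLU-style feedforward block with a fixed polynomial
    expansion on the gate branch (illustrative sketch, not the paper's code)."""

    def __init__(self, d_model: int, d_hidden: int, degree: int = 3):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)
        # Fixed (non-learned) coefficients, per the abstract's "fixed
        # polynomial coefficients"; these particular values are assumptions.
        self.register_buffer(
            "coeffs", torch.tensor([0.0, 1.0, 0.5, 0.25][: degree + 1])
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.w_gate(x)
        # Polynomial expansion sum_k c_k * g^k, then L2-normalize the result
        # so higher-order terms do not destabilize training.
        powers = torch.stack([g ** k for k in range(len(self.coeffs))], dim=-1)
        gate = (powers * self.coeffs).sum(dim=-1)
        gate = F.normalize(gate, p=2, dim=-1)  # L2 normalization
        return self.w_down(gate * self.w_up(x))
```

Under these assumptions the layer is a drop-in replacement for a SwiGLU feedforward block, e.g. `PolyGLUFFN(512, 2048)(torch.randn(2, 16, 512))` returns a tensor of shape `(2, 16, 512)`.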
Submission history
[v1] Sat, 1 Nov 2025 18:17 UTC