[Submitted on 30 Oct 2025]
Polynomial Gated Units in Transformer Feedforward Networks: An Empirical Study of Performance and Limitations
Abstract: This paper presents a comprehensive investigation of polynomial gating mechanisms in transformer feedforward networks. We introduce PolyGLU, a novel variant of gated linear units that employs learnable polynomial transformations, and evaluate it through extensive experiments on the FineWeb benchmark. Although our method (validation loss 5.169) underperforms the SwiGLU baseline (validation loss 4.927), we provide detailed ablation studies analyzing initialization strategies, polynomial degrees, and training dynamics. Our findings suggest that polynomial gating offers theoretical advantages in expressivity, but practical challenges in optimization and initialization limit its effectiveness compared to established approaches. We identify specific failure modes and propose directions for future research on alternative gating functions.
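To make the comparison concrete, the following is a minimal sketch of a gated feedforward block in which the gate function can be swapped between SiLU (as in SwiGLU) and a polynomial. The abstract does not specify the exact form of PolyGLU, so the polynomial gate here, along with the names `poly_gate`, `glu_ffn`, `W_gate`, `W_up`, and `W_down`, is an illustrative assumption and not the paper's definition.

```python
import math

def silu(z):
    # SiLU (swish) activation used as the gate in SwiGLU: z * sigmoid(z)
    return z / (1.0 + math.exp(-z))

def poly_gate(z, coeffs):
    # Hypothetical polynomial gate: sum_k coeffs[k] * z**k (Horner's rule).
    # In a trained model the coefficients would be learnable parameters;
    # this exact form is an assumption, not taken from the paper.
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * z + c
    return acc

def matvec(W, x):
    # Plain matrix-vector product on nested lists
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def glu_ffn(x, W_gate, W_up, W_down, gate_fn):
    # Gated feedforward block: W_down @ (gate_fn(W_gate @ x) * (W_up @ x))
    g = [gate_fn(z) for z in matvec(W_gate, x)]
    u = matvec(W_up, x)
    h = [gi * ui for gi, ui in zip(g, u)]
    return matvec(W_down, h)

# Usage: same weights, two different gates
x = [1.0, 2.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]
W_down = [[1.0, 1.0]]
swiglu_out = glu_ffn(x, I2, I2, W_down, silu)
poly_out = glu_ffn(x, I2, I2, W_down, lambda z: poly_gate(z, [0.0, 1.0]))
```

With coefficients `[0.0, 1.0]` the polynomial gate reduces to the identity, so the block becomes a bilinear unit; higher-degree coefficients would add curvature to the gate, which is where initialization choices such as those the abstract mentions would come into play.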
Submission history
[v1] Thu, 30 Oct 2025 00:59 UTC