[Submitted on 29 Oct 2025]
Understanding Polynomial-Gated Feedforward Networks: A Study of Negative Results in Transformer Architectures
Abstract: This paper presents a detailed investigation of Polynomial-Gated Feedforward Networks (PGFN), a novel variant of gated linear units that incorporates learnable polynomial activation functions. While theoretically motivated by the potential of polynomial compositions to capture higher-order interactions, our comprehensive evaluation on the FineWeb dataset reveals that PGFN underperforms established baselines, achieving a validation loss of 4.976 compared to 4.9266 for the SwiGLU baseline. We provide a thorough analysis of this negative result, examining architectural considerations, training dynamics, and potential failure modes. Our work contributes valuable empirical evidence about the challenges of integrating polynomial activations in transformer feedforward networks.
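The abstract describes PGFN as a gated linear unit whose gate activation is a learnable polynomial rather than SiLU (as in SwiGLU). The paper's exact formulation is not given here, so the following is a minimal NumPy sketch under that assumption: the function names (`polynomial_gate`, `pgfn_forward`) and the coefficient parameterization are illustrative, not taken from the paper.

```python
import numpy as np

def polynomial_gate(z, coeffs):
    # Hypothetical learnable polynomial activation:
    # a_0 + a_1*z + a_2*z^2 + ... (coeffs would be trained parameters)
    return sum(a * z**k for k, a in enumerate(coeffs))

def pgfn_forward(x, W_gate, W_up, W_down, coeffs):
    # GLU-style feedforward block: the polynomial gate replaces the
    # SiLU activation used in a SwiGLU block (biases omitted for brevity).
    gate = polynomial_gate(x @ W_gate, coeffs)
    return (gate * (x @ W_up)) @ W_down

# Shape-check with random weights (no training implied).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # (batch, d_model)
W_gate = rng.normal(size=(8, 16))      # d_model -> d_ff
W_up = rng.normal(size=(8, 16))
W_down = rng.normal(size=(16, 8))      # d_ff -> d_model
coeffs = [0.0, 1.0, 0.5]               # example quadratic gate
out = pgfn_forward(x, W_gate, W_up, W_down, coeffs)
```

With SwiGLU, the `gate` line would instead be `z * sigmoid(z)`; the sketch only swaps in the polynomial, leaving the rest of the block unchanged.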
Submission history
[v1] Wed, 29 Oct 2025 00:53 UTC