[Submitted on 2 Nov 2025]
Polynomial-Gated Feedforward Networks: A Theoretical and Empirical Study
Abstract: We present a systematic investigation of polynomial-gated feedforward networks (PGFNs) in transformer architectures. Building on recent theoretical work on polynomial activation functions \cite{aardxiv2411.03884} and vocabulary-space analysis of feedforward layers \cite{aardxiv2203.14680}, we develop a stable implementation of polynomial gating that maintains the computational profile of standard feedforward networks. While our experiments show only a marginal improvement (validation loss 4.926 vs. 4.9266 for the SwiGLU baseline), the primary contribution is a thorough analysis of polynomial activations in transformer feedforward layers, including stability considerations and initialization strategies. We discuss why more complex approaches such as parallel pathways \cite{aardxiv2510.00077} achieve better results, and we suggest directions for future work that combines polynomial activations with architectural innovations.
Submission history
[v1] Sun, 2 Nov 2025 02:28 UTC