[Submitted on 29 Oct 2025]
Polynomial-Activated Feedforward Networks: A Systematic Study of Dynamic Polynomial Gating in Transformers
Abstract: This paper presents a comprehensive investigation of Polynomial-Activated Feedforward Networks (PAFN), examining both the theoretical foundations and empirical performance of dynamic polynomial gating in transformer architectures. We introduce a carefully designed architecture featuring parallel coefficient networks with constrained initialization, achieving a 1.1% improvement over SwiGLU (4.871 vs. 4.9266) on the AardAct benchmark. Through extensive ablation studies and computational analysis, we demonstrate that polynomial activations offer a favorable trade-off between expressiveness and training stability. Our implementation adds minimal computational overhead (< 5%) while providing consistent improvements across multiple random seeds (p < 0.05). The paper includes detailed architectural specifications, complete training protocols, and an expanded discussion of limitations to facilitate reproducibility and future research.
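The 1.1% figure quoted in the abstract can be checked from the two scores it reports. The sketch below is only an arithmetic check, not the authors' evaluation code. It assumes the AardAct score is a lower-is-better quantity (the abstract does not name the metric) and that the improvement is measured relative to the SwiGLU baseline. The function name `relative_improvement` is hypothetical.

```python
def relative_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction of `candidate` relative to `baseline`.

    Assumption: lower scores are better, so a positive return value
    means the candidate improves on the baseline.
    """
    return (baseline - candidate) / baseline


# Scores as quoted in the abstract.
swiglu_score = 4.9266
pafn_score = 4.871

rel = relative_improvement(swiglu_score, pafn_score)
print(f"absolute gap: {swiglu_score - pafn_score:.4f}")
print(f"relative improvement: {rel:.2%}")
```

Under these assumptions the absolute gap is 0.0556 and the relative reduction is about 1.13%, which agrees with the rounded 1.1% stated in the abstract.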
Submission history
[v1] Wed, 29 Oct 2025 05:09 UTC