[Submitted on 31 Oct 2025]
PolyGate: Enhanced Transformer Feedforward Networks through Polynomial Composition and Expanded Gating
Abstract: We introduce PolyGate, a novel activation function that combines polynomial composition with expanded gating ranges to enhance transformer feedforward networks. Through systematic experimentation on the FineWeb benchmark, we demonstrate that PolyGate achieves a 1.4% improvement in validation loss (4.857 vs 4.9266) over the standard SwiGLU baseline while maintaining comparable computational efficiency. Our ablation studies reveal consistent improvements across model sizes, with detailed analysis of training dynamics and gradient behavior. The paper provides complete implementation details and discusses both the strengths and limitations of our approach, offering insights for future improvements in activation function design.
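The abstract does not give PolyGate's exact formulation, so the following is only a minimal sketch of what a gated feedforward block with polynomial composition and an expanded gating range might look like, shown next to the SwiGLU baseline it is compared against. The class names (`SwiGLU`, `PolyGateFFN`), the learned polynomial coefficients, and the `eps`-stretched sigmoid gate are all assumptions for illustration, not the paper's definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Standard SwiGLU feedforward block: W_o( SiLU(x W_g) * (x W_v) )."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_value = nn.Linear(d_model, d_hidden, bias=False)
        self.w_out = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_out(F.silu(self.w_gate(x)) * self.w_value(x))


class PolyGateFFN(nn.Module):
    """Hypothetical PolyGate-style block (assumed form, not the paper's).

    The gate path is passed through a learned low-degree polynomial before a
    sigmoid whose output range is stretched from [0, 1] to [-eps, 1 + eps],
    i.e. an "expanded gating range" that can mildly suppress or overshoot.
    """

    def __init__(self, d_model: int, d_hidden: int, degree: int = 3, eps: float = 0.25):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_value = nn.Linear(d_model, d_hidden, bias=False)
        self.w_out = nn.Linear(d_hidden, d_model, bias=False)
        self.eps = eps
        # Learnable polynomial coefficients c_0 ... c_degree (assumed),
        # initialized near the identity map p(z) = z.
        self.coeffs = nn.Parameter(torch.zeros(degree + 1))
        with torch.no_grad():
            self.coeffs[1] = 1.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.w_gate(x)
        # Polynomial composition: p(z) = c_0 + c_1 z + c_2 z^2 + ...
        p = sum(c * z**i for i, c in enumerate(self.coeffs))
        # Expanded gating range: sigmoid rescaled to (-eps, 1 + eps).
        gate = (1 + 2 * self.eps) * torch.sigmoid(p) - self.eps
        return self.w_out(gate * self.w_value(x))


if __name__ == "__main__":
    # Quick shape check on a toy batch: (batch, seq, d_model).
    ffn = PolyGateFFN(d_model=512, d_hidden=2048)
    y = ffn(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```

In this sketch the only differences from SwiGLU are the learned polynomial applied to the gate pre-activation and the stretched gating range; the parameter count of the three projection matrices is unchanged, which is consistent with the abstract's claim of comparable computational efficiency.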
Submission history
[v1] Fri, 31 Oct 2025 11:20 UTC