[Submitted on 29 Oct 2025]
Polynomial-SiLU Hybrid Activation: A Stable and Expressive Alternative for Transformer Feedforward Networks
Abstract: We introduce a novel activation function for transformer feedforward networks that combines polynomial expansions with the widely used SiLU activation. Our Polynomial-SiLU Hybrid (PSH) learns to dynamically mix polynomial terms with SiLU through a constrained normalization scheme and an adaptive mixing coefficient. In extensive experiments on the FineWeb dataset with a 134M-parameter Qwen architecture, we demonstrate that PSH achieves consistent improvements while maintaining training stability. The method is a simple but effective enhancement to standard feedforward layers.
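A minimal sketch of how such a hybrid activation could look in PyTorch, for illustration only: the polynomial degree, the softmax normalization of the coefficients, and the sigmoid-gated mixing scalar are assumptions on our part, not the paper's exact constrained normalization scheme.

    # Hypothetical sketch of a Polynomial-SiLU Hybrid (PSH) activation.
    # Degree, coefficient normalization, and mixing gate are assumed details.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PolynomialSiLUHybrid(nn.Module):
        def __init__(self, degree: int = 3):
            super().__init__()
            # Learnable coefficients for polynomial terms x^1 .. x^degree.
            self.poly_coeffs = nn.Parameter(torch.zeros(degree))
            # Learnable logit for the adaptive mixing coefficient alpha.
            self.mix_logit = nn.Parameter(torch.zeros(1))
            self.degree = degree

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Constrain the polynomial coefficients to sum to one so the
            # polynomial branch stays bounded relative to the input scale
            # (assumed normalization scheme).
            coeffs = F.softmax(self.poly_coeffs, dim=0)
            powers = torch.stack([x ** (i + 1) for i in range(self.degree)], dim=-1)
            poly = (powers * coeffs).sum(dim=-1)
            # Adaptively blend the polynomial branch with SiLU.
            alpha = torch.sigmoid(self.mix_logit)
            return alpha * poly + (1 - alpha) * F.silu(x)

Such a module would slot into a transformer feedforward block in place of the usual SiLU nonlinearity, with alpha initialized near 0.5 so training starts close to an even blend of the two branches.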
Submission history
[v1] Wed, 29 Oct 2025 10:45 UTC