[Submitted on 4 Nov 2025]
PolySoft: Stable Polynomial Activations for Transformer Feedforward Networks
Abstract: We present PolySoft, a novel polynomial activation function designed specifically for transformer feedforward networks. PolySoft combines the theoretical benefits of polynomial expansions with the training stability of smooth nonlinearities through three key innovations: (1) a softplus-based polynomial approximation that prevents gradient explosion, (2) learned scaling factors that adapt the nonlinearity strength dynamically, and (3) polynomial coefficients bounded via sigmoid constraints. Experiments on the FineWeb dataset show that PolySoft performs comparably to SwiGLU (5.034 vs. 4.927 validation loss) while offering superior gradient properties and mathematical tractability. Ablation studies validate the importance of each design choice, particularly the softplus scaling and the coefficient bounding. While PolySoft does not outperform the baseline, it provides a stable foundation for future research on polynomial activations in transformers.
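The abstract does not give PolySoft's exact functional form, but the three listed ingredients suggest a shape like the sketch below. This is a speculative PyTorch illustration under stated assumptions, not the authors' implementation: the degree-3 default, the per-power coefficient scheme, and all parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolySoft(nn.Module):
    """Hypothetical sketch of a PolySoft-style activation.

    Combines the three ingredients named in the abstract:
    (1) a softplus-based term to keep gradients bounded,
    (2) a learned scale that modulates nonlinearity strength,
    (3) polynomial coefficients bounded to (0, 1) via a sigmoid.
    The paper's exact functional form may differ.
    """

    def __init__(self, degree: int = 3):
        super().__init__()
        # Unconstrained parameters; a sigmoid maps them into (0, 1)
        # so the polynomial coefficients stay bounded (innovation 3).
        self.raw_coeffs = nn.Parameter(torch.zeros(degree))
        # Learned scale controlling nonlinearity strength (innovation 2).
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softplus compresses the scaled input before the polynomial is
        # applied, avoiding the exploding gradients of raw powers of x
        # (innovation 1).
        z = F.softplus(self.scale * x)
        coeffs = torch.sigmoid(self.raw_coeffs)  # each bounded in (0, 1)
        out = torch.zeros_like(x)
        for k, c in enumerate(coeffs, start=1):
            out = out + c * z.pow(k)  # sum of c_k * softplus(s*x)^k
        return out
```

Under this reading, the module would drop in wherever the feedforward nonlinearity sits, e.g. `nn.Sequential(nn.Linear(d, 4 * d), PolySoft(), nn.Linear(4 * d, d))`, with the sigmoid bounding keeping every polynomial term's contribution finite regardless of how the raw parameters drift during training.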
Submission history
[v1] Tue, 4 Nov 2025 02:45 UTC