[Submitted on 2 Nov 2025]
Systematic Analysis of Sparse Polynomial Activations in Transformer Feedforward Networks
Abstract: This paper presents a thorough investigation of sparse polynomial activations for transformer feedforward networks. In our evaluation, they perform comparably to, but slightly worse than, the SwiGLU baseline (validation loss of 4.956 vs. 4.9266), and extensive ablation studies reveal important trade-offs in activation function design.
Submission history
[v1] Sun, 2 Nov 2025 19:47 UTC