[Submitted on 4 Nov 2025]
When Rational Meets Polynomial: A Systematic Study of Combined Activation Functions in Transformer Feedforward Networks
Abstract: This paper presents a systematic study of combining rational and polynomial activation functions in transformer feedforward networks. Although each activation type has shown promise individually, our evaluation finds that their combination underperforms standard SwiGLU by about 8% in validation loss (5.319 vs. 4.927). Through ablation studies and gradient analysis, we identify interference effects and optimization challenges as the key failure modes. Our work provides concrete insights into the challenges of composing activation functions in transformer architectures and establishes guidelines for future research in this direction.
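As a quick check on the headline figure, the reported gap can be reproduced from the two validation losses quoted in the abstract. The sketch below assumes the percentage is measured relative to the SwiGLU baseline; the abstract does not state the denominator, and the variable names are illustrative only.

```python
# Validation losses quoted in the abstract.
combined_loss = 5.319  # combined rational + polynomial activation
swiglu_loss = 4.927    # standard SwiGLU baseline

# Assumption: the gap is expressed relative to the SwiGLU baseline.
absolute_gap = combined_loss - swiglu_loss
relative_gap = absolute_gap / swiglu_loss

print(f"absolute gap: {absolute_gap:.3f}")
print(f"relative gap: {relative_gap:.1%}")
```

Under that assumption the relative gap rounds to 8%, which is consistent with the stated figure.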
Submission history
[v1] Tue, 4 Nov 2025 11:36 UTC