[Submitted on 21 Oct 2025]
Rethinking Position-Aware Polynomial Activations: A Comprehensive Study of an Initially Promising Approach
Abstract: This paper presents a thorough investigation of Position-Aware Polynomial Activations (PAPA) for transformer feedforward networks. Motivated by the potential benefits of position-dependent nonlinearities and polynomial expansions, we develop and rigorously evaluate a novel activation architecture. Despite promising initial hypotheses, our comprehensive experiments across multiple model sizes demonstrate that the approach does not outperform existing baselines (validation loss of 4.948 vs. 4.927 for SwiGLU). Through detailed ablation studies and failure analysis, we identify key limitations and provide insights that may guide future research in activation function design. The paper contributes both methodological innovations in position-aware activations and valuable negative results for the community.
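The abstract does not spell out the PAPA formulation, so the following PyTorch sketch is only an illustration of what a position-aware polynomial activation could look like: each sequence position p learns its own coefficient vector c[p] and applies the polynomial sum_k c[p, k] * x^k elementwise. The class name PositionAwarePolynomialActivation, the per-position coefficient table, and the near-identity initialization are all assumptions made for this example, not the paper's exact method.

```python
import torch
import torch.nn as nn


class PositionAwarePolynomialActivation(nn.Module):
    """Illustrative sketch (assumed, not the paper's formulation):
    position p applies its own degree-d polynomial sum_k c[p, k] * x**k
    elementwise to the hidden activations."""

    def __init__(self, max_seq_len: int, degree: int = 3):
        super().__init__()
        # One coefficient vector per position, initialized near the
        # identity map (c1 = 1, all other coefficients 0) so training
        # starts close to a plain linear activation.
        coeffs = torch.zeros(max_seq_len, degree + 1)
        coeffs[:, 1] = 1.0
        self.coeffs = nn.Parameter(coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden)
        seq_len = x.size(1)
        c = self.coeffs[:seq_len]  # (seq_len, degree + 1)
        # Stack the powers x**0 .. x**degree: (batch, seq_len, hidden, degree + 1)
        powers = torch.stack([x ** k for k in range(c.size(-1))], dim=-1)
        # Broadcast each position's coefficients over batch and hidden dims.
        return (powers * c[None, :, None, :]).sum(dim=-1)


if __name__ == "__main__":
    act = PositionAwarePolynomialActivation(max_seq_len=128, degree=3)
    h = torch.randn(2, 128, 64)
    print(act(h).shape)  # torch.Size([2, 128, 64])
```

Under these assumptions, the module is a drop-in replacement for the nonlinearity between the two linear layers of a transformer feedforward block, which is presumably where the paper compares it against SwiGLU.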
Submission history
[v1] Tue, 21 Oct 2025 08:52 UTC