[Submitted on 29 Oct 2025]
Revisiting Polynomial Components in Transformer Feedforward Networks: A Constrained Dynamic Approach
Abstract: We present a constrained dynamic polynomial approach for transformer feedforward networks that builds on the established SwiGLU architecture. Although our method yields only a modest improvement (a 0.7% reduction in validation loss) on the FineWeb dataset, we provide a comprehensive analysis of its limitations, its computational trade-offs, and its position relative to contemporary approaches. The paper includes detailed ablation studies and implementation specifics. It also discusses why constrained polynomial components may still offer benefits in certain scenarios despite their small empirical gains. Our analysis suggests that these benefits come primarily from improved stability during early training rather than from better asymptotic performance.
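For reference, the standard SwiGLU feedforward block that the method builds on computes (SiLU(xW) ⊙ xV)W2, where SiLU is Swish with β = 1 and biases are omitted. The sketch below is a minimal NumPy version of that baseline only. It does not implement the paper's constrained dynamic polynomial component, whose formulation is not given in the abstract. The function names, tensor shapes, and random initialization are illustrative assumptions. Passing an identity activation gives the bilinear GLU variant, which is a purely quadratic (degree-2 polynomial) function of the input and is therefore a natural reference point when discussing polynomial components.

```python
import numpy as np


def silu(z: np.ndarray) -> np.ndarray:
    """Swish with beta = 1 (SiLU): z * sigmoid(z), written as z / (1 + exp(-z))."""
    return z / (1.0 + np.exp(-z))


def glu_ffn(x, w_gate, w_up, w_down, activation=silu):
    """Bias-free gated feedforward block: (activation(x @ W) * (x @ V)) @ W2.

    With activation=silu this is the standard SwiGLU baseline; with an identity
    activation it is the bilinear GLU variant. The polynomial extension described
    in the paper is NOT implemented here.

    Shapes (illustrative): x (tokens, d_model); w_gate and w_up (d_model, d_ff);
    w_down (d_ff, d_model).
    """
    gate = activation(x @ w_gate)  # gating branch
    up = x @ w_up                  # linear branch
    return (gate * up) @ w_down    # elementwise product, then project back to d_model


# Hypothetical toy sizes and scaled Gaussian weights, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_ff, tokens = 8, 16, 4
x = rng.standard_normal((tokens, d_model))
w_gate = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
w_up = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
w_down = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)

y = glu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (4, 8)
```

Because the bilinear variant is homogeneous of degree 2, doubling the input multiplies its output by four. SwiGLU keeps that multiplicative interaction between the two branches while adding a smooth nonlinearity on the gate.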
Submission history
[v1] Wed, 29 Oct 2025 20:11 UTC