[Submitted on 31 Oct 2025]
Adaptive Multi-Path Gating: A Systematic Study of Parallel Activation Pathways in Transformer Feedforward Networks
Abstract: We present an empirical investigation of Adaptive Multi-Path Gating (AMPG) for transformer feedforward networks. In experiments on the FineWeb benchmark using a Qwen 3 architecture (134M parameters), AMPG achieves a statistically significant improvement in validation loss over the SwiGLU baseline (4.840 $\pm$ 0.002 vs 4.927 $\pm$ 0.003, p $<$ 0.01), while using more memory (41.4GB vs 31.5GB). Our analysis suggests that combining SiLU, GELU, and parametric activation pathways with learned blending weights provides more flexible nonlinear transformations. The paper includes implementation details, statistical analysis of results across 5 independent runs, and a discussion of limitations and future work directions.
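As a rough illustration of the idea described in the abstract, the following is a minimal sketch of a gate that blends SiLU, GELU, and a parametric pathway with learned weights. The abstract does not specify the exact form, so several choices here are assumptions made for illustration only: the parametric pathway is taken to be a Swish with a learnable slope `beta`, the blending weights are taken to be a softmax over three logits, and all function names (`blended_activation`, `gated_ffn_hidden`) are hypothetical. The paper's actual AMPG formulation may differ.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def silu(x):
    # SiLU (Swish with slope 1): x * sigmoid(x).
    return x * sigmoid(x)


def gelu(x):
    # Exact GELU using the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def parametric_swish(x, beta):
    # ASSUMPTION: the "parametric" pathway is modelled as x * sigmoid(beta * x).
    return x * sigmoid(beta * x)


def softmax(logits):
    # Normalizes blend logits into non-negative weights that sum to 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blended_activation(x, blend_logits, beta):
    # ASSUMPTION: pathways are mixed by a softmax over three learned logits.
    w_silu, w_gelu, w_param = softmax(blend_logits)
    return (
        w_silu * silu(x)
        + w_gelu * gelu(x)
        + w_param * parametric_swish(x, beta)
    )


def gated_ffn_hidden(gate, up, blend_logits, beta=1.0):
    # GLU-style gating: activation(gate) * up, elementwise.
    # `gate` and `up` stand in for the two input projections of the FFN.
    return [
        blended_activation(g, blend_logits, beta) * u
        for g, u in zip(gate, up)
    ]
```

Under these assumptions, pushing all blending weight onto the SiLU pathway recovers the SwiGLU-style gate used as the baseline, so the blended gate can be read as a superset of it. Evaluating three pathways per element instead of one is also consistent with the higher memory figure reported for AMPG, although the abstract does not attribute the difference to a specific cause.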
Submission history
[v1] Fri, 31 Oct 2025 02:56 UTC