[Submitted on 30 Oct 2025]
Revisiting Adaptive Spatial Gating with Expanded Ranges: A Thorough Analysis of Feedforward Network Variants
Abstract: Modern transformer architectures rely heavily on feedforward networks with gating mechanisms, yet the design space of these components remains underexplored. We present a comprehensive study of Adaptive Spatial Gating with Expanded Ranges (ASGER), analyzing both its theoretical foundations and its empirical performance. Although ASGER's expanded gating range ($[-\alpha, 1+\alpha]$) and spatial interaction components have promising theoretical properties, our evaluation shows that ASGER underperforms standard SwiGLU by 0.15 in validation loss (5.08 vs. 4.93) on language modeling tasks. Through detailed ablation studies and a comparison with 10 alternative architectures from recent literature, we identify key limitations in current approaches to gating mechanism design. The work contributes negative results along with insights into the relationship between gating flexibility, spatial interactions, and model performance in transformer feedforward networks.
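To make the expanded gating range concrete, here is a minimal NumPy sketch that contrasts a standard SwiGLU feedforward block with one possible gate whose output lies in $[-\alpha, 1+\alpha]$. The abstract does not give ASGER's formula, so the affine rescaling of a sigmoid used here is an assumption, as are the names `expanded_gate`, `expanded_gate_ffn`, and the weight arguments. The spatial interaction component is omitted entirely.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def expanded_gate(x, alpha=0.1):
    """Hypothetical expanded-range gate (an assumption, not the paper's formula).

    Rescales a sigmoid affinely so its outputs lie between -alpha and
    1 + alpha. With alpha = 0 it reduces to the plain sigmoid.
    """
    return (1.0 + 2.0 * alpha) * sigmoid(x) - alpha


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Standard SwiGLU block: (SiLU(x W_gate) * (x W_up)) W_down."""
    g = x @ w_gate
    return (g * sigmoid(g) * (x @ w_up)) @ w_down


def expanded_gate_ffn(x, w_gate, w_up, w_down, alpha=0.1):
    """Illustrative gated block that uses the assumed expanded-range gate."""
    g = x @ w_gate
    return (expanded_gate(g, alpha) * (x @ w_up)) @ w_down


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(2, 4))        # (tokens, d_model)
    w_gate = rng.normal(size=(4, 8))   # (d_model, d_hidden)
    w_up = rng.normal(size=(4, 8))
    w_down = rng.normal(size=(8, 4))   # (d_hidden, d_model)
    print(swiglu_ffn(x, w_gate, w_up, w_down).shape)
    print(expanded_gate_ffn(x, w_gate, w_up, w_down, alpha=0.1).shape)
```

Under this assumed form, a negative gate value flips the sign of the up-projected features and a value above 1 amplifies them, which is the extra flexibility an expanded range offers over a gate confined to $[0, 1]$.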
Submission history
[v1] Thu, 30 Oct 2025 10:39 UTC