[Submitted on 26 Oct 2025]
Adaptive Gated Feedforward Networks: Analysis of a Constrained Approach
Abstract: We present an analysis of Adaptive Gated Feedforward Networks (AGFN), a variant of gated linear units with layer-specific temperature scaling and learned output ranges. Although AGFN showed promise in initial ablation studies, the final implementation reached a validation loss of 4.931 on the FineWeb benchmark, slightly worse than the SwiGLU baseline (4.9266). This paper examines the architectural choices, presents ablation results, and analyzes why the constraints may have limited the approach's effectiveness compared to other gating variants.
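To make the comparison concrete, the sketch below contrasts a standard SwiGLU feedforward block with one possible reading of "temperature scaling" and "learned output ranges". The abstract does not give the AGFN formulation, so everything specific here is an assumption for illustration: the function name `agfn_like_ffn`, the choice to divide the gate pre-activation by a per-layer `temperature`, and the choice to bound the output with `out_scale * tanh(...)` are all hypothetical and may differ from the paper's actual design.

```python
import math

def silu(x, temperature=1.0):
    # SiLU applied to x / temperature; temperature = 1 is the standard SiLU.
    z = x / temperature
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    # Dense matrix-vector product on nested Python lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    # Baseline SwiGLU block: W_down @ (silu(W_gate @ x) * (W_up @ x)).
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def agfn_like_ffn(x, W_gate, W_up, W_down, temperature, out_scale):
    # HYPOTHETICAL reading of AGFN, not the paper's confirmed definition.
    # Assumption 1: a per-layer temperature rescales the gate pre-activation.
    gate = [silu(g, temperature) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    # Assumption 2: the "learned output range" is modeled as a tanh bound,
    # which keeps every output strictly inside (-out_scale, out_scale).
    return [out_scale * math.tanh(y) for y in matvec(W_down, hidden)]

# Toy weights: model width 2, hidden width 3.
x = [1.0, -2.0]
W_gate = [[0.5, -0.1], [0.3, 0.8], [-0.7, 0.2]]
W_up = [[0.2, 0.4], [-0.6, 0.1], [0.9, -0.3]]
W_down = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]

baseline = swiglu_ffn(x, W_gate, W_up, W_down)
constrained = agfn_like_ffn(x, W_gate, W_up, W_down, temperature=1.0, out_scale=2.0)
```

Under these assumptions, a bounded output is one plausible kind of constraint: with `temperature=1.0` the block reduces to the SwiGLU output passed through a squashing function, so it cannot represent outputs outside the learned range.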
Submission history
[v1] Sun, 26 Oct 2025 09:41 UTC