[Submitted on 31 Oct 2025]
Dynamic Range Gated MLP: A Learnable Sigmoid Transformation for Transformer Feedforward Networks
Abstract: We present Dynamic Range Gated MLP (DRG-MLP), a modification to the standard transformer feedforward network that introduces learnable parameters to dynamically adjust the range of sigmoid gating. On the FineWeb dataset with a Qwen 3 architecture, DRG-MLP reached a validation loss of 5.186, which is worse than the SwiGLU baseline's 4.927. The primary contribution lies instead in the systematic analysis of learnable range adaptation in activation functions. We provide ablation studies examining initialization schemes, regularization effects, and training dynamics. Although DRG-MLP does not surpass state-of-the-art methods, our work offers insights into the challenges of adaptive gating mechanisms and establishes baseline performance for future research in this direction.
Submission history
[v1] Fri, 31 Oct 2025 18:08 UTC