[Submitted on 3 Nov 2025]
Dynamic Gated Gaussian Linear Units: Improving Transformer Feedforward Layers through Learnable Temperature Scaling
Abstract: We present Dynamic Gated Gaussian Linear Units (DynGEGLU), a novel modification to transformer feedforward layers that introduces learnable per-neuron temperature parameters into the gating mechanism. Through extensive experiments on the FineWeb benchmark using a 134M-parameter Qwen architecture, we demonstrate consistent improvements over standard gated linear units. Our method achieves a validation loss of 4.892 (a 0.7% improvement over the SwiGLU baseline) while maintaining training stability. We provide a comprehensive analysis of the learned temperature distributions and their impact on model performance. The approach adds minimal computational overhead and offers a simple yet effective way to enhance feedforward layers in transformer architectures.
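The core idea admits a compact sketch. The following is a minimal, hypothetical PyTorch implementation of a GEGLU feedforward block with per-neuron learnable temperature, assuming the temperature divides the gate pre-activation before the GELU; the module structure and all names (gate_proj, up_proj, down_proj, log_temp) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynGEGLU(nn.Module):
    """Sketch of a GEGLU feedforward block with a learnable
    per-neuron temperature on the gate pre-activation.
    Hypothetical reconstruction, not the authors' code.
    """

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff, bias=False)
        self.up_proj = nn.Linear(d_model, d_ff, bias=False)
        self.down_proj = nn.Linear(d_ff, d_model, bias=False)
        # One temperature per hidden neuron, parameterized in log space
        # so it stays positive; initialized to log(1) = 0, which
        # recovers standard GEGLU at initialization.
        self.log_temp = nn.Parameter(torch.zeros(d_ff))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        temp = torch.exp(self.log_temp)
        # Temperature sharpens (temp < 1) or smooths (temp > 1)
        # each neuron's gate independently.
        gate = F.gelu(self.gate_proj(x) / temp)
        return self.down_proj(gate * self.up_proj(x))


# Usage example with illustrative dimensions:
ffn = DynGEGLU(d_model=768, d_ff=2048)
y = ffn(torch.randn(2, 16, 768))  # (batch, seq_len, d_model)
```

Parameterizing the temperature in log space is one common way to enforce positivity without clamping; the paper may use a different constraint. The extra cost is a single elementwise division per gate, consistent with the abstract's claim of minimal overhead.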
Submission history
[v1] Mon, 3 Nov 2025 02:47 UTC