[Submitted on 28 Oct 2025]
Adaptive Threshold Gating in Transformer Feedforward Networks
Abstract: This paper investigates Adaptive Threshold Gating (ATG) for transformer feedforward networks. Our experiments show ATG achieves a validation loss of 4.966, slightly underperforming the SwiGLU baseline of 4.9266. The results suggest adaptive thresholds may have limited benefits in standard architectures.
Submission history
[v1] Tue, 28 Oct 2025 18:00 UTC