[Submitted on 1 Nov 2025]
Re-examining Gated Feedforward Networks
Abstract: This paper investigates dynamic scaling modifications to gated feedforward networks in transformers. Our modified architecture reaches a validation loss of 5.239, worse than the SwiGLU baseline's 4.927. Although these straightforward modifications fail to improve on the baseline, the result offers insights into the robustness of feedforward design.
Submission history
[v1] Sat, 1 Nov 2025 23:33 UTC