[Submitted on 23 Oct 2025]
Analysis of Dynamic Activation Weighting in Transformer Networks
Abstract: This paper investigates dynamic activation weighting in transformer feedforward networks. We evaluate a dual-pathway architecture combining SiLU and GELU activations with learned weights. Experiments on an 83M-parameter model show our approach reaches a validation loss of 5.124, underperforming the SwiGLU baseline (4.927) while using more memory. The results suggest current implementations of dynamic weighting may not outperform simpler approaches.
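The abstract describes a dual-pathway feedforward block that mixes SiLU and GELU activations via learned weights. The paper's exact parameterization is not given here, so the following is a minimal NumPy sketch under one plausible assumption: a single learned logit `alpha` is passed through a sigmoid to produce a convex mixing weight between the two activation pathways. The function names and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def dual_pathway_ffn(x, W_in, W_out, alpha):
    """Hypothetical dual-pathway feedforward block (assumption, not the
    paper's code): a learned logit `alpha` is squashed by a sigmoid and
    used as a convex weight between SiLU and GELU activations of the
    same hidden projection."""
    h = x @ W_in
    w = 1.0 / (1.0 + np.exp(-alpha))          # learned mixing weight in (0, 1)
    mixed = w * silu(h) + (1.0 - w) * gelu(h)  # weighted activation blend
    return mixed @ W_out
```

Under this parameterization, the block carries both activation outputs through the hidden layer, which is consistent with the abstract's note that the approach uses more memory than a single-activation SwiGLU baseline.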
Submission history
[v1] Thu, 23 Oct 2025 18:35 UTC