[Submitted on 29 Oct 2025]
Adaptive Activation Blending in Transformer Feedforward Networks
Abstract: This paper investigates an adaptive activation function approach for transformer feedforward networks. We propose dynamically blending SiLU and GELU activations through per-neuron learned weights, combined with a residual connection around the feedforward path. Our method achieves performance comparable to a SwiGLU baseline (loss of 4.929 vs. 4.9266), but statistical analysis shows no significant improvement (p > 0.05). The results suggest that simple activation blending may not provide advantages over established approaches in standard transformer architectures. We analyze the training dynamics, computational overhead, and learned blending behavior to provide insights into this outcome.
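A minimal sketch of the blending mechanism described in the abstract, assuming a PyTorch implementation; the module name AdaptiveBlendFFN, the sigmoid parameterization of the per-neuron weights, and the layer sizes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveBlendFFN(nn.Module):
    """Feedforward block that blends SiLU and GELU per hidden neuron.

    A sketch of the paper's idea: each hidden unit learns a weight
    alpha in (0, 1) that mixes the two activations, and the block
    keeps a residual connection around the feedforward path.
    """

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.fc_in = nn.Linear(d_model, d_hidden)
        self.fc_out = nn.Linear(d_hidden, d_model)
        # One learnable blending logit per hidden neuron; sigmoid keeps
        # the mixing weight in (0, 1). Zero init gives alpha = 0.5,
        # i.e., an equal SiLU/GELU mix at the start of training.
        self.blend_logits = nn.Parameter(torch.zeros(d_hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc_in(x)
        alpha = torch.sigmoid(self.blend_logits)            # per-neuron weights
        act = alpha * F.silu(h) + (1.0 - alpha) * F.gelu(h)  # blended activation
        return x + self.fc_out(act)                          # residual connection
```

Under these assumptions the block is a drop-in replacement for a standard FFN, e.g. `y = AdaptiveBlendFFN(512, 2048)(torch.randn(2, 16, 512))`; the extra cost over a single activation is one sigmoid plus one additional elementwise activation and blend, consistent with the overhead analysis the abstract mentions.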
Submission history
[v1] Wed, 29 Oct 2025 16:13 UTC