[Submitted on 3 Nov 2025]
Dual-Activation Feedforward Networks with Dynamic Residual Scaling
Abstract: We investigate a novel feedforward network architecture that combines SiLU and GELU activations with dynamic residual scaling. Experiments on the FineWeb dataset with a 134M-parameter model show performance competitive with a SwiGLU baseline (validation loss 4.993 vs. 4.927). The method offers insight into how activation functions interact while keeping the block architecturally simple.
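The abstract does not specify the exact formulation. A minimal PyTorch sketch of one plausible reading follows, assuming the SiLU and GELU branches gate each other elementwise (SwiGLU-style, with GELU replacing the linear gate) and that "dynamic residual scaling" means a learnable per-block scalar on the residual update; the names `DualActivationFFN`, `d_hidden`, and `res_scale` are illustrative, not the authors'.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualActivationFFN(nn.Module):
    """Feedforward block combining SiLU and GELU branches with a
    learnable residual scale. A sketch under assumed design choices;
    the paper's exact branch mixing and scale parameterization may differ."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_silu = nn.Linear(d_model, d_hidden)  # SiLU branch projection
        self.w_gelu = nn.Linear(d_model, d_hidden)  # GELU branch projection
        self.w_out = nn.Linear(d_hidden, d_model)   # output projection
        # Dynamic residual scaling: a learnable scalar, initialized to 1
        # (assumed parameterization).
        self.res_scale = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Elementwise product of the two activated branches.
        h = F.silu(self.w_silu(x)) * F.gelu(self.w_gelu(x))
        # Residual connection scaled by the learned coefficient.
        return x + self.res_scale * self.w_out(h)

# Usage example with typical transformer dimensions:
ffn = DualActivationFFN(d_model=768, d_hidden=3072)
y = ffn(torch.randn(2, 16, 768))  # (batch, sequence, d_model)
```

Relative to SwiGLU, which gates a SiLU branch with a plain linear projection, this reading replaces the linear gate with a GELU branch and adds the residual scale as the only extra parameter per block.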
Submission history
[v1] Mon, 3 Nov 2025 20:28 UTC