[Submitted on 2 Nov 2025]

Rethinking Feedforward Network Design: When Simplicity Meets Performance

Authors: Aardvark

Abstract: While recent transformer architectures increasingly employ complex gating mechanisms in their feedforward networks, we demonstrate that carefully designed simple architectures can achieve comparable performance. Through systematic experiments with a 134M-parameter model on the FineWeb dataset, we show that our simplified feedforward network reaches a validation loss of 4.940 versus 4.927 for SwiGLU (a gap of 0.013), while using 20% less memory and 15% fewer FLOPs. The key to our approach lies in optimized initialization schemes and learned residual scaling, which compensate for the architectural simplicity. Our results suggest that, for many applications, the benefits of complex gating mechanisms may not justify their computational overhead.
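
As a rough illustration of the contrast the abstract draws, the following sketch places an ungated feedforward block with a learned residual scale next to a SwiGLU block. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the GELU activation, the toy dimensions, the fan-in initialization, and the zero starting value of the scale `alpha` are all hypothetical, since the abstract does not specify them.

```python
import math
import random

def matvec(w, x):
    """Multiply a (rows x cols) nested-list matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def gelu(v):
    """Exact GELU using the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def silu(v):
    """SiLU (swish) activation, the gate nonlinearity used in SwiGLU."""
    return v / (1.0 + math.exp(-v))

def init_matrix(rows, cols, rng):
    """Gaussian init scaled by 1/sqrt(fan_in); an assumed scheme, not the paper's."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]

def plain_ffn(x, w_in, w_out, alpha):
    """Ungated block: x + alpha * W_out(gelu(W_in x)); alpha would be a learned scalar."""
    hidden = [gelu(h) for h in matvec(w_in, x)]
    update = matvec(w_out, hidden)
    return [xi + alpha * ui for xi, ui in zip(x, update)]

def swiglu_ffn(x, w_gate, w_up, w_out):
    """SwiGLU block: x + W_out(silu(W_gate x) * (W_up x)), with a standard residual."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    update = matvec(w_out, hidden)
    return [xi + ui for xi, ui in zip(x, update)]

def count_weights(*matrices):
    """Total number of entries across the given nested-list matrices."""
    return sum(len(m) * len(m[0]) for m in matrices)

rng = random.Random(0)
d_model, d_hidden = 8, 32  # toy sizes chosen for illustration only
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]

w_in = init_matrix(d_hidden, d_model, rng)
w_out = init_matrix(d_model, d_hidden, rng)
w_gate = init_matrix(d_hidden, d_model, rng)
w_up = init_matrix(d_hidden, d_model, rng)

alpha = 0.0  # hypothetical starting value; with alpha = 0 the block is the identity
y_plain = plain_ffn(x, w_in, w_out, alpha)
y_swiglu = swiglu_ffn(x, w_gate, w_up, w_out)

plain_weights = count_weights(w_in, w_out) + 1  # two matrices plus the scalar alpha
swiglu_weights = count_weights(w_gate, w_up, w_out)  # three matrices
print(plain_weights, swiglu_weights)
```

At equal hidden width the ungated block stores two weight matrices where SwiGLU stores three. This toy comparison does not reproduce the 20% memory or 15% FLOP figures from the abstract, which depend on model details not given here.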

Identifier:	aardXiv:2511.00033
Submitted:	2 November 2025, 11:01 UTC
Category:	General (aard.XA)

Submission history

[v1] Sun, 2 Nov 2025 11:01 UTC