[Submitted on 25 Oct 2025]
Quadratic Gated Feedforward Networks: Exploring Quadratic Interactions in Transformer Layers
Abstract: We investigate Quadratic Gated Feedforward Networks (QGFN), a variant of the Transformer feedforward layer that combines gated linear units with element-wise quadratic feature interactions. While modern architectures predominantly use gated linear units (GLUs), we explore whether carefully designed quadratic interactions can provide complementary benefits. Our method adds a parallel quadratic pathway that is combined with the standard GLU pathway through a learned scalar mixing coefficient. On the FineWeb dataset with a Qwen-style architecture, QGFN reached a validation loss of 4.940 versus 4.927 for the SwiGLU baseline: comparable, but slightly worse (by 0.013). The quadratic pathway increased memory usage by approximately 30% while providing limited empirical benefit in our experiments. We analyze the tradeoffs of this approach and discuss implications for future architecture design, noting in particular that quadratic interactions may require different formulations to yield significant improvements.
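The abstract does not give the exact form of the quadratic pathway, so the following is only a minimal sketch of one plausible reading, in dependency-free Python. Assumptions (not from the source): the quadratic pathway is the element-wise product of two separate linear projections of the input, (W_a x) * (W_b x), which is degree-2 in x; it is added to the SwiGLU hidden state as h = silu(W_gate x) * (W_up x) + alpha * quad; and both pathways share one down projection. The class name `QGFNSketch`, the weight names, and the choice to start `alpha` at 0 (so the layer begins as plain SwiGLU) are hypothetical illustrations. Training is not shown; `alpha` would be a learned parameter.

```python
import math
import random


def silu(v):
    """Numerically stable SiLU (Swish): v * sigmoid(v)."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialized matrix, purely for illustration."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class QGFNSketch:
    """Hypothetical QGFN-style feedforward layer (not the authors' code)."""

    def __init__(self, d_model, d_hidden, alpha=0.0, seed=0):
        rng = random.Random(seed)
        # Standard SwiGLU projections.
        self.w_gate = rand_matrix(d_hidden, d_model, rng)
        self.w_up = rand_matrix(d_hidden, d_model, rng)
        # Assumed extra projections for the quadratic pathway; these extra
        # weights and activations are where added memory cost would come from.
        self.w_quad_a = rand_matrix(d_hidden, d_model, rng)
        self.w_quad_b = rand_matrix(d_hidden, d_model, rng)
        # Shared down projection back to the model dimension.
        self.w_down = rand_matrix(d_model, d_hidden, rng)
        # Scalar mixing coefficient (learned in training, fixed here).
        self.alpha = alpha

    def glu_hidden(self, x):
        """SwiGLU hidden state: silu(W_gate x) * (W_up x)."""
        gate = matvec(self.w_gate, x)
        up = matvec(self.w_up, x)
        return [silu(g) * u for g, u in zip(gate, up)]

    def quad_hidden(self, x):
        """Assumed quadratic pathway: (W_a x) * (W_b x), degree 2 in x."""
        a = matvec(self.w_quad_a, x)
        b = matvec(self.w_quad_b, x)
        return [ai * bi for ai, bi in zip(a, b)]

    def __call__(self, x):
        # Mix the two pathways with the scalar alpha, then project down.
        h = [g + self.alpha * q
             for g, q in zip(self.glu_hidden(x), self.quad_hidden(x))]
        return matvec(self.w_down, h)


if __name__ == "__main__":
    layer = QGFNSketch(d_model=4, d_hidden=8, alpha=0.0)
    y = layer([0.5, -1.0, 0.25, 2.0])
    print(len(y))  # 4
```

With `alpha = 0` the layer reduces exactly to the SwiGLU baseline, which gives the optimizer a safe starting point: the quadratic pathway only contributes if increasing `alpha` lowers the loss. This is one design option among several; the paper may initialize or place the mixing coefficient differently.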
Submission history
[v1] Sat, 25 Oct 2025 22:28 UTC