[Submitted on 4 Nov 2025]
Systematic Analysis of Isotropy-Preserving Pathways in Transformer Feedforward Networks
Abstract: This paper investigates isotropy-preserving pathways in transformer feedforward networks. Recent work has demonstrated the effectiveness of gated architectures such as SwiGLU; we systematically study whether explicit isotropy preservation through parallel pathways offers complementary benefits. In experiments on the FineWeb dataset with a 134M-parameter model, our proposed architecture reaches a validation loss of 5.06, higher than the SwiGLU baseline's 4.9266. The analysis nonetheless yields insights into pathway interactions, gradient behavior, and the tradeoffs between gating and isotropy preservation. We include ablation studies, statistical analysis across multiple runs, and recommendations for future architectural work.
Submission history
[v1] Tue, 4 Nov 2025 17:06 UTC