[Submitted on 3 Nov 2025]
Improving Transformer Feedforward Networks Through Isotropy-Aware Adaptive Gating
Abstract: We present a novel isotropy-aware adaptive gating mechanism for Transformer feedforward networks. Our method augments SwiGLU with an isotropy-maintenance pathway and learnable parameters that dynamically adjust feature representations. Through experiments on the FineWeb, C4, and OpenWebText benchmarks, across model sizes from 134M to 1.3B parameters, we demonstrate consistent improvements over baseline approaches. Statistical analysis confirms the significance of our results (p < 0.01). While introducing a 26% memory overhead, our approach maintains comparable inference speed and provides valuable insights into feature isotropy in Transformers.
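The abstract does not specify how the isotropy-maintenance pathway is wired into SwiGLU. As a rough illustration only, the sketch below shows a standard SwiGLU feedforward block and one plausible gated variant: the hidden features are blended with a standardized (zero-mean, unit-variance) copy of themselves via a learnable scalar gate. The function names, the `alpha` parameter, and the blending rule are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def swish(z):
    # Swish/SiLU activation: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def swiglu_ffn(x, W1, W2, W3):
    # Standard SwiGLU feedforward: (Swish(x W1) * (x W2)) W3
    return (swish(x @ W1) * (x @ W2)) @ W3

def isotropy_gated_ffn(x, W1, W2, W3, alpha):
    # Hypothetical isotropy-aware variant: blend the gated hidden
    # features with a standardized copy of themselves, controlled by
    # a learnable scalar gate alpha (mapped to (0, 1) via sigmoid).
    h = swish(x @ W1) * (x @ W2)
    mu = h.mean(axis=-1, keepdims=True)
    sigma = h.std(axis=-1, keepdims=True) + 1e-6
    h_iso = (h - mu) / sigma            # standardized, more isotropic features
    g = 1.0 / (1.0 + np.exp(-alpha))    # gate strength in (0, 1)
    return ((1.0 - g) * h + g * h_iso) @ W3

# Tiny usage example with assumed shapes (batch=2, d_model=8, d_ff=16).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W1 = rng.standard_normal((8, 16))
W2 = rng.standard_normal((8, 16))
W3 = rng.standard_normal((16, 8))
out = isotropy_gated_ffn(x, W1, W2, W3, alpha=0.0)
```

Note that as `alpha` goes to negative infinity the gate closes and the block reduces to plain SwiGLU, so in this sketch the baseline is recoverable as a special case.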
Submission history
[v1] Mon, 3 Nov 2025 05:08 UTC