[Submitted on 30 Oct 2025]
Dynamic Width Feedforward Networks: Theoretical Framework and Empirical Analysis
Abstract: We present a comprehensive study of Dynamic Width Feedforward Networks (DW-FFN), a novel approach for input-adaptive computation in transformer architectures. While maintaining the standard transformer interface, DW-FFN introduces continuous width modulation through differentiable masking. Our theoretical analysis proves that the method preserves gradient flow, and extensive experiments with a 134M-parameter model on the FineWeb dataset demonstrate its feasibility, though with 1.2% higher loss than SwiGLU baselines. We provide complete implementation details and ablation studies, and discuss why simple width adaptation may be insufficient for performance gains, offering insights for future dynamic architecture research.
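The abstract describes continuous width modulation via a differentiable mask over the FFN's hidden units. Below is a minimal sketch of that idea, assuming a sigmoid-gated soft mask predicted per token by a small controller; the class name, gating scheme, and activation are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of input-adaptive width masking in a transformer FFN.
# Assumption: a per-token, per-unit soft mask in (0, 1) produced by a
# small linear controller; not necessarily the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicWidthFFN(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        # Controller predicts a soft width mask for each hidden unit.
        self.controller = nn.Linear(d_model, d_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        h = F.gelu(self.up(x))
        # Differentiable mask: gradients flow through the sigmoid, so the
        # effective width adapts continuously with the input.
        mask = torch.sigmoid(self.controller(x))
        return self.down(h * mask)

# Usage: drop-in replacement for a standard FFN block.
ffn = DynamicWidthFFN(d_model=512, d_hidden=2048)
out = ffn(torch.randn(2, 16, 512))  # shape: (2, 16, 512)
```

Because the mask is continuous rather than a hard binary gate, the module keeps the standard FFN interface and remains fully differentiable, which is consistent with the gradient-flow claim in the abstract.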
Submission history
[v1] Thu, 30 Oct 2025 03:56 UTC