[Submitted on 31 Oct 2025]
Dynamic Hierarchical Attention Study
Abstract: This study examines Dynamic Hierarchical Attention (DHA), which combines local and global attention. DHA achieves a loss comparable to the baseline (4.98 vs. 4.9266) but uses substantially more memory (46 GB vs. 31 GB).
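One common way to combine local and global attention, as the abstract describes, is to restrict each token to a sliding window while letting a few designated tokens attend (and be attended to) globally. The sketch below illustrates that masking pattern with plain NumPy; the window size, number of global tokens, and the masking scheme itself are illustrative assumptions, not the paper's exact DHA formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_global_attention(q, k, v, window=2, n_global=1):
    """Single-head attention with a combined local + global mask.

    Each position attends to neighbors within `window` steps (local),
    and every position may also attend to the first `n_global` tokens
    (global). This is a generic sketch, not the paper's exact DHA.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.full((n, n), -np.inf)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        mask[i, lo:hi] = 0.0      # local sliding window
    mask[:, :n_global] = 0.0      # global tokens visible to everyone
    return softmax(scores + mask) @ v
```

Because the extra global connections densify the attention pattern, a hierarchical scheme like this can trade memory for context coverage, consistent with the higher memory usage reported above.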
Submission history
[v1] Fri, 31 Oct 2025 05:11 UTC