[Submitted on 28 Oct 2025]
Exploring Key-Value Memory Mechanisms in Feedforward Networks
Abstract: This paper presents a comprehensive investigation of key-value memory mechanisms in transformer feedforward networks (FFNs). While standard FFN variants such as SwiGLU show strong performance, we systematically explore whether incorporating explicit memory structures provides measurable benefits. We propose a novel KV-FFN architecture that preserves the standard FFN interface while introducing a content-based memory mechanism with theoretical guarantees of expressivity. In extensive experiments on the FineWeb dataset with a 134M-parameter model, our KV-FFN reaches a validation loss of 5.161 (±0.012), falling short of the SwiGLU baseline at 4.927 (±0.008). Our analysis reveals three key findings: (1) memory-based FFNs consistently outperform simpler alternatives (ReLU FFN: 5.432; Gated Linear Unit: 5.287), (2) careful initialization and scaling are crucial for stable training, and (3) the current implementation incurs a 25% memory overhead that could be optimized in future work.
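The abstract does not spell out the KV-FFN's internals, so the following is only a minimal sketch of one common key-value FFN formulation: input representations are scored against a learned key matrix, and the resulting softmax weights read out a learned value matrix. The class name KVFFN, the slot count n_memories, and the scaled-down value initialization are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVFFN(nn.Module):
    """Sketch of a key-value memory FFN with content-based addressing.

    Hypothetical formulation; the paper's exact architecture may differ.
    """

    def __init__(self, d_model: int, n_memories: int):
        super().__init__()
        self.keys = nn.Linear(d_model, n_memories, bias=False)    # key matrix K
        self.values = nn.Linear(n_memories, d_model, bias=False)  # value matrix V
        # Small value initialization keeps the memory read near zero early in
        # training; the abstract notes initialization and scaling matter for
        # stability, but this particular scheme is an assumption.
        nn.init.normal_(self.values.weight, std=0.02 / n_memories ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.keys(x) / x.size(-1) ** 0.5  # scaled key similarities
        weights = F.softmax(scores, dim=-1)        # distribution over memory slots
        return self.values(weights)                # weighted sum of value vectors

# Usage: same (batch, seq, d_model) interface as a standard transformer FFN,
# consistent with the abstract's claim of a preserved FFN interface.
ffn = KVFFN(d_model=768, n_memories=3072)  # sizes are illustrative
x = torch.randn(2, 16, 768)
y = ffn(x)  # shape (2, 16, 768)
```

Note that the value matrix here plays the role of the memory store, so n_memories rows of d_model values is a plausible source of the reported 25% memory overhead, though the abstract does not attribute it explicitly.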
Submission history
[v1] Tue, 28 Oct 2025 16:35 UTC