[Submitted on 28 Oct 2025]

Exploring Key-Value Memory Mechanisms in Feedforward Networks

Authors: Aardvark


Abstract: This paper investigates key-value memory mechanisms in transformer feedforward networks (FFNs). While standard FFNs such as SwiGLU perform strongly, we systematically explore whether incorporating explicit memory structures can provide measurable benefits. We propose KV-FFN, an architecture that keeps the standard FFN interface while introducing a content-based memory mechanism with theoretical guarantees of expressivity. In experiments on the FineWeb dataset with a 134M-parameter model, KV-FFN reaches a validation loss of 5.161 (±0.012), compared with 4.927 (±0.008) for the SwiGLU baseline, so it does not yet match SwiGLU. Our analysis yields three findings: (1) memory-based FFNs consistently outperform simpler alternatives (ReLU FFN: 5.432, Gated Linear Unit: 5.287); (2) careful initialization and scaling are crucial for stable training; and (3) the current implementation incurs a 25% memory overhead that future work could reduce.
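The abstract does not give the KV-FFN equations. As a rough illustration of what a content-based key-value memory with a standard FFN interface can look like, the sketch below assumes softmax addressing over learned key/value slots, y = Σ_i softmax(Kx / √d_model)_i · V_i, with N(0, 1/d_model) initialization. The class `KVMemoryFFN`, the `n_slots` parameter, and the initialization scheme are hypothetical choices for this example, not the paper's method.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class KVMemoryFFN:
    """Hypothetical key-value memory layer with the same d_model -> d_model
    interface as a standard FFN. The input addresses learned memory slots by
    content (scaled dot product + softmax) and reads out a weighted sum of
    the corresponding value vectors."""

    def __init__(self, d_model: int, n_slots: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Assumed init: N(0, 1/d_model), so dot products start out O(1).
        std = 1.0 / math.sqrt(d_model)
        self.d_model = d_model
        self.keys: Matrix = [
            [rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(n_slots)
        ]
        self.values: Matrix = [
            [rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(n_slots)
        ]

    def address(self, x: Vector) -> Vector:
        """Return the softmax weights over memory slots for input x."""
        scale = 1.0 / math.sqrt(self.d_model)
        scores = [scale * sum(k * xi for k, xi in zip(key, x)) for key in self.keys]
        return softmax(scores)

    def __call__(self, x: Vector) -> Vector:
        if len(x) != self.d_model:
            raise ValueError(f"expected input of size {self.d_model}, got {len(x)}")
        weights = self.address(x)
        # Read-out: weighted sum of value vectors (a convex combination).
        out = [0.0] * self.d_model
        for w, value in zip(weights, self.values):
            for j, v in enumerate(value):
                out[j] += w * v
        return out


if __name__ == "__main__":
    layer = KVMemoryFFN(d_model=8, n_slots=16)
    y = layer([0.1] * 8)
    print(len(y), round(sum(layer.address([0.1] * 8)), 6))
```

In this sketch the 1/√d_model factor keeps the scores from saturating the softmax early in training, which is the kind of scaling detail the abstract flags as important for stability. Each slot also stores a key and a value vector, so memory grows with `n_slots`.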

Identifier:	aardXiv:2510.00062
Submitted:	28 October 2025, 16:35 UTC
Category:	General (aard.XA)

Submission history

[v1] Tue, 28 Oct 2025 16:35 UTC