I am a second-year Master’s student at Shanghai Jiao Tong University (SJTU), where I also completed my B.E. degree. I am fortunate to be supervised by Prof. Zhouhan Lin. I’ve had internships at Microsoft and Shanghai AI Lab.
🔍 Research Interests
My core research areas include:
Next-Gen LLM Architectures: Exploring new architectures, with a particular focus on latent memory mechanisms.
Continual Learning: Exploring context-to-weight mechanisms that enable continual learning in LLMs.
🤝 Let's Connect
I am always open to academic discussions or potential collaborations. Please feel free to reach out!
News
🎓 Actively seeking job opportunities — graduating in March 2027. Looking for internship and full-time positions on any LLM foundation-model team. Reach out if interested!
Jan 17, 2026
One paper (MLP Memory) was accepted at ICLR 2026 🥳
Dec 01, 2025
I will be presenting Memory Decoder at NeurIPS in San Diego. Come have a chat with me 🙌
Sep 17, 2025
One paper (Memory Decoder) was accepted at NeurIPS 2025 🥳
Large Language Models (LLMs) excel at general language tasks but struggle with domain adaptation. Domain Adaptive Pretraining (DAPT) is costly and suffers from catastrophic forgetting, while Retrieval-Augmented Generation (RAG) introduces substantial inference latency. We propose Memory Decoder, a pretrained, plug-and-play memory module that enables efficient domain adaptation without modifying the original model’s parameters.
@inproceedings{cao2025memorydecoder,
  title     = {Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models},
  author    = {Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025}
}
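The "plug-and-play" idea above can be illustrated with a minimal sketch: at decoding time, the frozen base LM's next-token distribution is mixed with the memory module's distribution. This is an illustrative interpolation only, not the paper's exact method; the weight `lam` and the toy logits are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mix_distributions(base_logits, memory_logits, lam=0.3):
    """Interpolate next-token distributions from a frozen base LM and a
    plug-and-play memory module. `lam` is a hypothetical mixing weight;
    the base model's parameters are never modified."""
    p = softmax(np.asarray(base_logits, dtype=float))
    q = softmax(np.asarray(memory_logits, dtype=float))
    return (1.0 - lam) * p + lam * q
```

Because the memory module only contributes an output distribution, the same pretrained memory could in principle be attached to different base models that share a tokenizer.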
We present MLP Memory, a lightweight parametric module that pretrains an MLP to imitate a kNN retriever’s behavior on the entire pretraining dataset. This creates a differentiable memory component that internalizes retrieval patterns without explicit document access, achieving 17.5% and 24.1% scaling gains on WikiText-103 and Web datasets, respectively.
@inproceedings{wei2026mlpmemory,
  title     = {MLP Memory: A Retriever-Pretrained Memory for Large Language Models},
  author    = {Wei, Rubin and Cao, Jiaqi and Wang, Jiarui and Kai, Jushi and Guo, Qipeng and Zhou, Bowen and Lin, Zhouhan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
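The training signal described above — an MLP imitating a kNN retriever's next-token distribution — can be sketched as a KL objective. Everything here is a toy illustration: the layer sizes, the ReLU MLP, and the random retriever target are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: hidden dim 8, vocab 16, one hidden layer of 32.
H, V = 8, 16
W1 = rng.normal(scale=0.1, size=(H, 32))
W2 = rng.normal(scale=0.1, size=(32, V))

def mlp_memory(h):
    """Map an LM hidden state to a next-token distribution (ReLU MLP)."""
    return softmax(np.maximum(h @ W1, 0.0) @ W2)

def kl_to_retriever(h, retriever_probs, eps=1e-9):
    """Training loss: KL(retriever || MLP). Minimizing this teaches the
    MLP to reproduce the kNN retriever's distribution, so at inference
    no datastore or document access is needed."""
    p = np.clip(retriever_probs, eps, 1.0)
    q = np.clip(mlp_memory(h), eps, 1.0)
    return float((p * (np.log(p) - np.log(q))).sum())
```

Once trained, `mlp_memory` is fully differentiable and incurs only an MLP forward pass, which is where the efficiency over explicit retrieval comes from.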
Go confidently in the direction of your dreams. Live the life you have imagined.