I am a second-year Master’s student at Shanghai Jiao Tong University (SJTU), where I also completed my B.E. degree. I am fortunate to be supervised by Prof. Zhouhan Lin. I’ve had internships at Microsoft and Shanghai AI Lab.
🔍 Research Interests
My core research areas include:
Next-Gen LLM Architectures: Exploring new architectures, with a particular focus on latent memory mechanisms.
Continual Learning: Exploring context-to-weight mechanisms that enable continual learning in LLMs.
🤝 Let's Connect
I am always open to academic discussions or potential collaborations. Please feel free to reach out!
News
🎓 Actively seeking job opportunities — graduating in March 2027. Looking for internship and full-time positions on any LLM foundation-model team. Reach out if interested!
Jan 17, 2026
One paper (MLP Memory) was accepted at ICLR 2026 🥳
Dec 01, 2025
I will be presenting Memory Decoder at NeurIPS in San Diego. Come have a chat with me 🙌
Sep 17, 2025
One paper (Memory Decoder) was accepted at NeurIPS 2025 🥳
Large Language Models (LLMs) excel at general language tasks but struggle with domain adaptation. Domain Adaptive Pretraining (DAPT) is costly and suffers from catastrophic forgetting, while Retrieval-Augmented Generation (RAG) introduces substantial inference latency. We propose Memory Decoder, a pretrained, plug-and-play memory module that enables efficient domain adaptation without modifying the original model’s parameters.
@inproceedings{cao2025memorydecoder,
  title     = {Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models},
  author    = {Cao, Jiaqi and Wang, Jiarui and Wei, Rubin and Guo, Qipeng and Chen, Kai and Zhou, Bowen and Lin, Zhouhan},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025}
}
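The "plug-and-play" idea above can be illustrated with a minimal sketch: at decoding time, the frozen base LM's next-token distribution is mixed with the memory module's distribution. This is an illustrative interpolation only, not the paper's exact method; the weight `lam` and the toy logits are assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mix_distributions(base_logits, memory_logits, lam=0.3):
    """Interpolate next-token distributions from a frozen base LM and a
    plug-and-play memory module. `lam` is a hypothetical mixing weight;
    the base model's parameters are never modified."""
    p = softmax(np.asarray(base_logits, dtype=float))
    q = softmax(np.asarray(memory_logits, dtype=float))
    return (1.0 - lam) * p + lam * q
```

Because the memory module only contributes an output distribution, the same pretrained memory could in principle be attached to different base models that share a tokenizer.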
We present MLP Memory, a lightweight parametric module that pretrains an MLP to imitate a kNN retriever’s behavior on the entire pretraining dataset. This creates a differentiable memory component that internalizes retrieval patterns without explicit document access, achieving 17.5% and 24.1% scaling gains on WikiText-103 and Web datasets, respectively.
@inproceedings{wei2026mlpmemory,
  title     = {MLP Memory: A Retriever-Pretrained Memory for Large Language Models},
  author    = {Wei, Rubin and Cao, Jiaqi and Wang, Jiarui and Kai, Jushi and Guo, Qipeng and Zhou, Bowen and Lin, Zhouhan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026}
}
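The training signal described above — an MLP imitating a kNN retriever's next-token distribution — can be sketched as a KL objective. Everything here is a toy illustration: the layer sizes, the ReLU MLP, and the random retriever target are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: hidden dim 8, vocab 16, one hidden layer of 32.
H, V = 8, 16
W1 = rng.normal(scale=0.1, size=(H, 32))
W2 = rng.normal(scale=0.1, size=(32, V))

def mlp_memory(h):
    """Map an LM hidden state to a next-token distribution (ReLU MLP)."""
    return softmax(np.maximum(h @ W1, 0.0) @ W2)

def kl_to_retriever(h, retriever_probs, eps=1e-9):
    """Training loss: KL(retriever || MLP). Minimizing this teaches the
    MLP to reproduce the kNN retriever's distribution, so at inference
    no datastore or document access is needed."""
    p = np.clip(retriever_probs, eps, 1.0)
    q = np.clip(mlp_memory(h), eps, 1.0)
    return float((p * (np.log(p) - np.log(q))).sum())
```

Once trained, `mlp_memory` is fully differentiable and incurs only an MLP forward pass, which is where the efficiency over explicit retrieval comes from.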
Go confidently in the direction of your dreams. Live the life you have imagined.