Dive into Claude Code - The Design Space of AI Agent Systems
Paper information. Title: Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems. Authors: Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen (VI...
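Cardinality Estimation for High-Dimensional Similarity Queries: An Approach Based on Adaptive Bucket Probing. Title: Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing. Authors: Zhonghan Chen, Qintian Guo, Ruiy...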
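Q&C: When Quantization Meets Cache, Combining the Two Techniques for Efficient Generation. Title: Q&C: When Quantization Meets Cache in Efficient Generation. Authors: Xin Ding, Xin Li, Haotong Qin, Zhibo Chen (University of Science and Technology of China...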
RRAttention: Dynamic Block-Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference. Authors: Siran Liu¹,², Guoxia Wang¹, Sa Wang¹,²*, Jinle Zeng¹, HaoYang Xie¹, Siyu Lou¹, JiaBin Yang¹, DianHai Yu¹,², Ha...
SWARM: Co-Activation-Aware KVCache Offloading Across Multiple SSDs. Authors: Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren (Tsinghua University). arXiv: 2603.17803. Published: FAST 2026 (File and Storage Technologies). Abstract: Key-value (KV...
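Temporally Extended Mixture-of-Experts Models. Authors: Zeyu Shen, Peter Henderson (Princeton University). Abstract: Mixture-of-Experts (MoE) models are currently popular for scaling capacity at a fixed inference speed, but they switch experts at almost every token. Once a model exceeds the available GPU memory, this frequent switching makes offloading...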
Paper information. Title: On the Theoretical Limitations of Embedding-Based Retrieval. Conference: ICLR 2026. Authors: Orion Weller (Google DeepMind & Johns Hopkins University), ...