Cardinality Estimation for High-Dimensional Similarity Queries - PVLDB 2020
Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing. Authors: Zhonghan Chen, Qintian Guo, Ruiy...
Paper information. Title: Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems. Authors: Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen (VI...
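Q&C: When Quantization Meets Cache (Combining Techniques for Efficient Generation). Title: Q&C: When Quantization Meets Cache in Efficient Generation. Authors: Xin Ding, Xin Li, Haotong Qin, Zhibo Chen (University of Science and Technology of China...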
RRAttention: Dynamic Block-Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference. Authors: Siran Liu¹,², Guoxia Wang¹, Sa Wang¹,²*, Jinle Zeng¹, HaoYang Xie¹, Siyu Lou¹, JiaBin Yang¹, DianHai Yu¹,², Ha...
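Temporally Extended Mixture-of-Experts Models. Authors: Zeyu Shen, Peter Henderson (Princeton University). Abstract: Mixture-of-Experts (MoE) models are now widely used to scale capacity at a fixed inference speed, but they switch experts at almost every token. Once a model exceeds the available GPU memory, this frequent switching makes offloading...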
SWARM: Co-Activation-Aware KVCache Offloading Across Multiple SSDs. Authors: Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren (Tsinghua University). arXiv: 2603.17803. Venue: FAST 2026 (File and Storage Technologies). Abstract: Key-value (KV...
Paper information. Title: On the Theoretical Limitations of Embedding-Based Retrieval. Venue: ICLR 2026. Authors: Orion Weller (Google DeepMind & Johns Hopkins University), ...
DeepSeek-AI research@deepseek.com. Paper: DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence. Model weights: https://huggingface.co/collectio...
Gated DeltaNet architecture flowchart: graph TD %% Define styles classDef storage fill:#f9f,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5; classDef core fill:#ff...
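Original link: Zhihu - Guanlan. Author: Guanlan, founder and CEO of Runta, building Agent-native Infra. 🔑 Key takeaways on the storage I/O bottleneck: this article comes from the AgenticOS Workshop at ASPLOS 2026, and its central finding is striking: LLM inference time accounts for only 30%~40% of end-to-end latency...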