Wang Er's Digital Garden

April 29, 2026

Cardinality Estimation for High-Dimensional Similarity Queries - PVLDB 2020

Cardinality Estimation for High Dimensional Similarity Queries with Adaptive Bucket Probing. Authors: Zhonghan Chen, Qintian Guo, Ruiy...

#AI-generated #paper-translation

April 29, 2026

Dive into Claude Code - The Design Space of AI Agent Systems

Paper information. Title: Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems. Authors: Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen (VI...

#AI-generated #paper-translation

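April 29, 2026

Q&C - Efficient Generation by Combining Quantization and Caching

Q&C: When Quantization Meets Cache, Combining Techniques for Efficient Generation. Title: Q&C: When Quantization Meets Cache in Efficient Generation. Authors: Xin Ding, Xin Li, Haotong Qin, Zhibo Chen (University of Science and Technology of China...

#AI-generated #paper-translation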

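April 29, 2026

RRAttention: Dynamic Block Sparse Attention

RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference. Authors: Siran Liu¹,², Guoxia Wang¹, Sa Wang¹,²*, Jinle Zeng¹, HaoYang Xie¹, Siyu Lou¹, JiaBin Yang¹, DianHai Yu¹,², Ha...

#AI-generated #paper-translation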

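April 29, 2026

Temporally Extended Mixture-of-Experts Models

Temporally Extended Mixture-of-Experts Models. Authors: Zeyu Shen, Peter Henderson (Princeton University). Abstract: Mixture-of-Experts (MoE) models are now widely used to scale capacity at a fixed inference speed, but they switch experts at almost every token. Once the model exceeds the available GPU memory, this frequent switching makes offloading...

#AI-generated #paper-translation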

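April 29, 2026

SWARM - Co-Activation-Aware KVCache Offloading Across Multiple SSDs

SWARM: Co-Activation-Aware KVCache Offloading Across Multiple SSDs. Authors: Tuowei Wang, Liyun Chu, Ruwen Fan, Ju Ren (Tsinghua University). arXiv: 2603.17803. Published at: FAST 2026 (File and Storage Technologies). Abstract: Key-value (KV...

#AI-generated #paper-translation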

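April 29, 2026

An Analysis of the Theoretical Limitations of Vector Retrieval - ICLR 2026

Paper information. Title: On the Theoretical Limitations of Embedding-Based Retrieval. Venue: ICLR 2026. Authors: Orion Weller (Google DeepMind & Johns Hopkins University), ...

#AI-generated #paper-translation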

April 24, 2026

DeepSeek V4 Pro Technical Report Explained

DeepSeek-AI research@deepseek.com. Original paper: DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence. Model weights: https://huggingface.co/collectio...

#AI-generated #paper-explainer #DeepSeek #MoE #long-context

April 7, 2026

Qwen3.5 Gated DeltaNet: Core Mechanisms Explained

Gated DeltaNet architecture flowchart: graph TD %% Define styles classDef storage fill:#f9f,stroke:#333,stroke-width:2px,stroke-dasharray: 5 5; classDef core fill:#ff...

#AI-generated #Qwen #DeltaNet #linear-attention #LLM-architecture

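April 4, 2026

A $40,000 H100 Is Waiting on Disk I/O: The Real Bottleneck for Agents Is Not Inference

Original link: Zhihu - Guanlan. Author: Guanlan, founder and CEO of Runta, building agent-native infra. 🔑 Storage I/O bottleneck: key takeaways. This article comes from the AgenticOS Workshop at ASPLOS 2026, and its core finding is striking: LLM inference time accounts for only 30%~40% of end-to-end latency...

#AI-generated #storage #IO #performance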