issue/834: add paged attention for nvidia gpu #836
base: main
Conversation
```cpp
 * This function initializes a descriptor that holds all the metadata needed
 * for the paged attention computation.
 *
 * @param handle The handle to the InfiniOP library context.
```
Please annotate each tensor's shape and meaning in the description.
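For reference, a sketch of what the requested annotations might look like; every parameter name below other than `handle` is an assumption inferred from the rest of the diff:

```cpp
/**
 * This function initializes a descriptor that holds all the metadata needed
 * for the paged attention computation.
 *
 * @param handle            The handle to the InfiniOP library context.
 * @param q_desc            Query tensor,      shape [nseq, nh, dh].
 * @param k_cache_desc      Paged key cache,   shape [num_blocks, nkvh, block_size, dh].
 * @param v_cache_desc      Paged value cache, shape [num_blocks, nkvh, block_size, dh].
 * @param block_tables_desc Per-sequence block table, shape [nseq, max_blocks_per_seq].
 * @param seq_lens_desc     Current length of each sequence, shape [nseq].
 */
```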
```cpp
// ----- Input Tensors -----
const Tdata *k_ptr,              // Pointer to the source Keys, shape [ntok, nkvh, dh]
const Tdata *v_ptr,              // Pointer to the source Values, shape [ntok, nkvh, dh]
const int32_t *slot_mapping_ptr, // Pointer to the slot mapping, shape [ntok]
```
Change this to int64, or better yet, use a template.
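One way to follow this suggestion, sketched with a hypothetical kernel name and a template parameter for the index type; the contiguous [ntok, nkvh, dh] layout is an assumption taken from the comments above:

```cpp
#include <cstdint>

// Sketch: template the index type instead of hard-coding int32_t, so an
// int64_t slot mapping works without duplicating the kernel.
// Grid: one block per token; threads stride over that token's elements.
template <typename Tdata, typename Tindex>
__global__ void storeKVCacheKernel(
    Tdata *k_cache, Tdata *v_cache,
    const Tdata *k_ptr,             // shape [ntok, nkvh, dh]
    const Tdata *v_ptr,             // shape [ntok, nkvh, dh]
    const Tindex *slot_mapping_ptr, // shape [ntok]; Tindex = int32_t or int64_t
    int row_elems) {                // nkvh * dh, elements per token
    const Tindex slot = slot_mapping_ptr[blockIdx.x];
    if (slot < 0) return;           // padding token: nothing to store
    for (int i = threadIdx.x; i < row_elems; i += blockDim.x) {
        k_cache[static_cast<int64_t>(slot) * row_elems + i] =
            k_ptr[static_cast<int64_t>(blockIdx.x) * row_elems + i];
        v_cache[static_cast<int64_t>(slot) * row_elems + i] =
            v_ptr[static_cast<int64_t>(blockIdx.x) * row_elems + i];
    }
}
```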
```cpp
const Tdata *q_,
const Tdata *k_cache_,
const Tdata *v_cache_,
const int32_t *block_tables_,
```
Use a template or a larger data type; int32 overflows easily.
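The overflow risk is in the offset arithmetic: with a large cache, a product like `block_number * kv_block_stride` can exceed `INT32_MAX`. A minimal sketch of the fix, with all names illustrative rather than taken from the patch:

```cpp
#include <cstdint>

// Sketch: widen block-table reads to 64-bit before computing element
// offsets, so block_number * kv_block_stride cannot overflow int32.
__device__ int64_t kvBlockOffset(
    const int32_t *block_tables, // shape [nseq, max_blocks_per_seq]
    int seq_idx, int block_idx, int max_blocks_per_seq,
    int64_t kv_block_stride,     // elements between physical blocks
    int64_t kv_head_stride,      // elements between kv heads
    int kv_head_idx) {
    const int64_t block_number =
        block_tables[seq_idx * max_blocks_per_seq + block_idx];
    return block_number * kv_block_stride + kv_head_idx * kv_head_stride;
}
```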
```cpp
float scale) {

    auto dtype = q_desc->dtype();
    CHECK_DTYPE(dtype, INFINI_DTYPE_F16, INFINI_DTYPE_BF16, INFINI_DTYPE_F32);
```
The dtype checks for block_tables_desc and seq_lens_desc are missing.
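A minimal sketch of the missing checks, mirroring the `CHECK_DTYPE` call above; the assumption is that these descriptors hold integer indices/lengths and that the library defines `INFINI_DTYPE_I32`/`INFINI_DTYPE_I64`:

```cpp
// Sketch: validate the index tensors alongside the float dtype check.
CHECK_DTYPE(block_tables_desc->dtype(), INFINI_DTYPE_I32, INFINI_DTYPE_I64);
CHECK_DTYPE(seq_lens_desc->dtype(), INFINI_DTYPE_I32, INFINI_DTYPE_I64);
```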
```cpp
void *stream_) const {
    cudaStream_t stream = (cudaStream_t)stream_;
    if (_opaque->internal->maxThreadsPerBlock() == CUDA_BLOCK_SIZE_1024) {
        if (_info.head_size == 128) {
```
If the computation places requirements on the head dim, check them when creating the operator descriptor.
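A sketch of such a check in the descriptor's create path; the supported set `{64, 128, 256}` and the status code are assumptions, not values taken from the patch:

```cpp
// Sketch: reject unsupported head sizes at descriptor-creation time,
// so callers fail early instead of hitting an unhandled dispatch branch.
if (head_size != 64 && head_size != 128 && head_size != 256) {
    return INFINI_STATUS_BAD_PARAM; // assumed status code for invalid params
}
```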
```cpp
const int seq_idx = blockIdx.y;
const int head_idx = blockIdx.x;
const int num_heads = gridDim.x;
const ptrdiff_t o_stride = q_stride / 3; // qkv
```
This also needs to be changed, right?
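Reading the remark as pointing at the hard-coded `q_stride / 3`, which only holds when Q, K, and V are fused in one buffer, one hedged alternative is to pass the output stride explicitly; the kernel name and surrounding parameters below are illustrative:

```cpp
#include <cstddef>

// Sketch: take o_stride as a kernel argument rather than deriving it
// from q_stride, so non-fused Q/K/V layouts also work.
template <typename Tdata>
__global__ void pagedAttentionKernel(
    Tdata *out,         // shape [nseq, nh, dh]
    const Tdata *q,
    ptrdiff_t q_stride, // elements between query tokens
    ptrdiff_t o_stride, // elements between output tokens, passed explicitly
    int head_size) {
    const int seq_idx = blockIdx.y;
    const int head_idx = blockIdx.x;
    Tdata *out_row = out + seq_idx * o_stride
                   + static_cast<ptrdiff_t>(head_idx) * head_size;
    // ... attention result for (seq_idx, head_idx) is written to out_row ...
}
```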
Force-pushed from fb9bd9c to 9695b7f.
Python test screenshot:
