@EnricoDeg
Proposed changes

Supported type combinations with BQuant=e8m0:

  • A=bf16
  • B=bf16,bf8,fp4
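
For readers unfamiliar with e8m0: it is the exponent-only scale format used in microscaling (8 exponent bits, no sign, no mantissa, bias 127, with 0xFF reserved for NaN). The following is a minimal standalone sketch of decoding such a scale. The function name `e8m0_to_float` is hypothetical and is not the CK Tile API.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of CK Tile): decode an e8m0 scale.
// The 8 bits are a biased exponent, so the value is 2^(bits - 127).
// The encoding 0xFF is reserved for NaN.
inline float e8m0_to_float(std::uint8_t bits)
{
    if(bits == 0xFF)
        return std::numeric_limits<float>::quiet_NaN();
    // ldexp(1.0f, e) computes 1.0 * 2^e exactly for this exponent range.
    return std::ldexp(1.0f, static_cast<int>(bits) - 127);
}
```

Because the scale is always a power of two, applying it to a bf16 or bf8 value only changes the exponent, barring overflow or underflow.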

Summary:

  • remove usage of pk_fp4_raw_t: this is consistent with other implementations and avoids handling the packed size explicitly. In general, the raw type should not be used, because CK Tile handles PackedSize internally, so using the raw type adds unnecessary complexity to the implementation
  • handle microscaling by checking whether the BQuant type is e8m0 (the previous implementation was inconsistent)
  • add support for scaling instructions in DequantPack8
  • mx pipeline:
    • extend existing pipeline to support different B types
    • add support for scaling and casting either before writing to LDS or after reading from LDS (the user can select this in the Problem)
  • block gemm:
    • mx pipeline is now using block gemm BQuant
    • block gemm BQuant can now load from LDS, apply the scale, and then call the block gemm universal operator. This adds new functionality and removes code duplication
  • warp gemm:
    • add a case to support 128-bit ds_read/write for both A and B when A=16bit and B=8bit
  • add examples and tests: some tests for bf16/fp4 already existed but were removed during a previous test refactoring. I added them back, along with other relevant tests for the new type combinations
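
To illustrate what dequantizing a packed fp4 B operand with e8m0 scales involves, here is a minimal host-side sketch in plain C++. It is not the CK Tile implementation: the names `fp4_e2m1_to_float` and `dequant_packed_fp4` are hypothetical, `float` stands in for bf16, and the nibble order (low nibble first) and the one-scale-per-group layout are assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magnitudes representable by fp4 in the e2m1 encoding (3 low bits).
constexpr float kE2M1Table[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Hypothetical helper: decode one 4-bit e2m1 value; bit 3 is the sign.
inline float fp4_e2m1_to_float(std::uint8_t nibble)
{
    const float mag = kE2M1Table[nibble & 0x7];
    return (nibble & 0x8) ? -mag : mag;
}

// Hypothetical helper: dequantize packed fp4 data (two values per byte).
// Assumptions: the low nibble holds the even-indexed element, and
// scales[i / group_size] is the e8m0 scale of element i. The NaN scale
// encoding (0xFF) is not handled in this sketch.
std::vector<float> dequant_packed_fp4(const std::vector<std::uint8_t>& packed,
                                      const std::vector<std::uint8_t>& scales,
                                      std::size_t group_size)
{
    std::vector<float> out(packed.size() * 2);
    for(std::size_t i = 0; i < out.size(); ++i)
    {
        const std::uint8_t byte   = packed[i / 2];
        const std::uint8_t nibble = (i % 2 == 0) ? (byte & 0xF) : (byte >> 4);
        // e8m0 scale: 2^(bits - 127).
        const float scale =
            std::ldexp(1.0f, static_cast<int>(scales[i / group_size]) - 127);
        out[i] = fp4_e2m1_to_float(nibble) * scale;
    }
    return out;
}
```

The index arithmetic `i / 2` is the kind of explicit packed-size handling that the raw type forces on the caller, which is the complexity the summary above aims to remove by letting CK Tile account for PackedSize.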

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which helps the maintainers understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

@EnricoDeg EnricoDeg requested review from a team and tenpercent as code owners January 30, 2026 13:59
@EnricoDeg EnricoDeg marked this pull request as draft January 30, 2026 13:59
@EnricoDeg EnricoDeg changed the title from "[CK_TILE] Extend support for mixed precision microscaling BQuant" to "[CK_TILE] Extend support of mixed precision microscaling BQuant" Jan 30, 2026
@EnricoDeg EnricoDeg changed the title from "[CK_TILE] Extend support of mixed precision microscaling BQuant" to "[CK_TILE] Extend support of mix precision microscaling BQuant" Jan 30, 2026
@EnricoDeg EnricoDeg self-assigned this Jan 30, 2026
@EnricoDeg EnricoDeg force-pushed the streamhpc/mix_prec_microscaling_bquant branch from 3e1b273 to 95c082d Compare January 30, 2026 16:28