
danielvegamyhre (Contributor) commented Dec 17, 2025

Summary

  • Support RCEIL in the triton_to_mxfp8_dim0 kernel using inline PTX, for consistency with the other kernels. We are moving toward RCEIL as the default because it gives better accuracy and performance.
  • After this change, every quantization op used in the MoE training code has an option that supports RCEIL.
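To make the difference concrete, here is a minimal pure-Python sketch of how FLOOR and RCEIL pick the shared power-of-two exponent for one 32-element block. It assumes float8_e4m3fn elements (largest finite value 448) and the E8M0 exponent range of -127..127. The helper names (`scale_exp_floor`, `scale_exp_rceil`) are hypothetical and this is not the Triton kernel's code: FLOOR subtracts the e4m3 max exponent (8) from floor(log2(amax)), while RCEIL rounds amax / 448 up to the next power of two.

```python
import math

F8E4M3_MAX = 448.0       # largest finite float8_e4m3fn value
F8E4M3_MAX_POW2 = 8      # floor(log2(448))
E8M0_BIAS = 127          # E8M0 stores exponents -127..127 as bytes 0..254


def _floor_log2(x: float) -> int:
    # Exact floor(log2(x)) for x > 0: frexp returns x = m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(x)
    return e - 1


def _ceil_log2(x: float) -> int:
    # Exact ceil(log2(x)) for x > 0: only exact powers of two have m == 0.5.
    m, e = math.frexp(x)
    return e - 1 if m == 0.5 else e


def _clamp_exp(e: int) -> int:
    # Keep the exponent inside the finite E8M0 range.
    return max(-E8M0_BIAS, min(E8M0_BIAS, e))


def scale_exp_floor(amax: float) -> int:
    """Hypothetical FLOOR helper: floor(log2(amax)) minus the e4m3 max exponent. Assumes amax > 0."""
    return _clamp_exp(_floor_log2(amax) - F8E4M3_MAX_POW2)


def scale_exp_rceil(amax: float) -> int:
    """Hypothetical RCEIL helper: round amax / 448 up to the next power of two. Assumes amax > 0."""
    return _clamp_exp(_ceil_log2(amax / F8E4M3_MAX))


for amax in (1.0, 448.0, 500.0):
    ef, er = scale_exp_floor(amax), scale_exp_rceil(amax)
    print(f"amax={amax}: FLOOR 2**{ef} -> {amax / 2**ef}, RCEIL 2**{er} -> {amax / 2**er}")
```

For amax = 500, FLOOR picks 2**0 and leaves the block maximum at 500, which exceeds 448 and would saturate on the cast to e4m3; RCEIL picks 2**1 and maps it to 250. For amax = 1.0 and 448.0 both modes agree.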

Reference:

__device__ __forceinline__ e8m0_t float_to_e8m0(float val) {
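The referenced CUDA `float_to_e8m0` function converts a float scale into an E8M0 byte; its body is not reproduced here. Below is a hypothetical pure-Python sketch of a round-up (RCEIL-style) float32-to-E8M0 conversion done on the raw bits, assuming a positive finite input and saturation to 0xFE because E8M0 reserves 0xFF for NaN. It relies on E8M0 using the same exponent bias (127) as float32: the E8M0 byte is the float32 exponent field, bumped by one whenever the mantissa is nonzero.

```python
import struct

E8M0_NAN = 0xFF          # E8M0 reserves 0xFF for NaN
E8M0_MAX_FINITE = 0xFE   # largest finite E8M0 byte, i.e. 2**127


def float_to_e8m0_roundup(val: float) -> int:
    """Round a positive finite float32 up to a power of two and return its E8M0 byte.

    Hypothetical helper: a Python illustration, not the referenced CUDA code.
    """
    # Reinterpret as float32 bits (struct first rounds the Python float to float32).
    bits = struct.unpack("<I", struct.pack("<f", val))[0]
    exp_field = (bits >> 23) & 0xFF   # float32 exponent field, bias 127 like E8M0
    mantissa = bits & 0x7FFFFF
    if mantissa != 0:
        exp_field += 1                # any fractional mantissa rounds up to the next power of two
    return min(exp_field, E8M0_MAX_FINITE)  # saturate to the largest finite value instead of NaN


def e8m0_to_float(byte: int) -> float:
    """Decode an E8M0 byte back to the power of two it represents."""
    if byte == E8M0_NAN:
        return float("nan")
    return 2.0 ** (byte - 127)


for v in (0.25, 1.0, 1.5, 500.0 / 448.0):
    b = float_to_e8m0_roundup(v)
    print(f"{v} -> 0x{b:02X} -> {e8m0_to_float(b)}")
```

Exact powers of two such as 0.25 and 1.0 keep their exponent, while 1.5 rounds up to 0x80, i.e. 2.0.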

Tests

  • pytest test/prototype/mx_formats/test_kernels.py -k dim0
  • pytest test/prototype/mx_formats/test_mx_linear.py

Benchmarks

[danvm@devgpu031.atn1 ~/ao (rceil2)]$ cd /home/danvm/ao && PYTHONPATH=/home/danvm/ao:$PYTHONPATH conda run -n torch2 CUDA_VISIBLE_DEVICES=7 python benchmarks/mx_formats/cast_bench.py --mode dim0_mxfp8_triton_rceil
M 16384 K 16384 BLOCK_SIZE 32
GPU: NVIDIA B200
torch version: 2.11.0.dev20251216+cu128
triton version: 3.6.0
mode: dim0_mxfp8_triton_rceil
time_us 139.45600390434265
mem_bw_gbps 5834.779093183679
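As a sanity check on the numbers above, the reported mem_bw_gbps matches a byte count of one bf16 read per input element, plus one fp8 byte per element and one E8M0 scale byte per 32-element block written, divided by time_us. The bf16 input dtype and this exact byte accounting are assumptions inferred from the numbers, not read from cast_bench.py.

```python
# Values copied from the benchmark output above.
M, K = 16384, 16384
BLOCK_SIZE = 32
time_us = 139.45600390434265

# Assumed byte accounting: read bf16 input (2 bytes/elem), write fp8 data (1 byte/elem)
# and one E8M0 scale byte per BLOCK_SIZE-element block.
bytes_read = M * K * 2
bytes_written = M * K * 1 + (M * K // BLOCK_SIZE) * 1
mem_bw_gbps = (bytes_read + bytes_written) / (time_us * 1e-6) / 1e9  # GB = 1e9 bytes
print(f"mem_bw_gbps {mem_bw_gbps:.1f}")  # prints mem_bw_gbps 5834.8
```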

pytorch-bot commented Dec 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3498

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit dc219c2 with merge base b9e5780:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Dec 17, 2025
@danielvegamyhre added the mx and topic: improvement labels Dec 17, 2025
@danielvegamyhre changed the title [mxfp8] support RCEIL in triton_to_mxfp8_dim0 kernel to [mxfp8] support RCEIL in triton_to_mxfp8_dim0 kernel with inline PTX Dec 17, 2025
@danielvegamyhre added the topic: new feature label and removed the topic: improvement label Dec 18, 2025
danielvegamyhre (Contributor, Author) commented Dec 18, 2025

@drisspg CI is green now if you want to take another look (the only failing test is unrelated).

I had to require sm100 for some tests, namely those for the Triton dim0 and dim1 kernels in both RCEIL and FLOOR modes. Both kernels share the "calculate scale" Triton JIT function, and now that it contains inline PTX, compiling it at all requires sm100.
