[mxfp8] support RCEIL in triton_to_mxfp8_dim0 kernel with inline PTX #3498
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3498
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit dc219c2 with merge base b9e5780: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
b37e86a to 15356b8
@drisspg CI is green now if you want to take another look (the only failing test is unrelated). I had to require sm100 for some tests, namely the Triton dim0 and dim1 kernels, for both RCEIL and floor: they share the "calculate scale" Triton JIT function, and compiling it at all with the inline PTX now requires sm100.
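The sm100 gating described in the comment above could look roughly like the sketch below. The helper name `is_sm_at_least` is hypothetical and is not taken from this PR; the sketch also assumes that "requires sm100" means compute capability 10.0 or newer, which the comment does not spell out.

```python
# Hypothetical helper (not from this PR): decide whether a device's
# compute capability meets a minimum, e.g. sm100 -> (10, 0).
def is_sm_at_least(capability, major, minor=0):
    """Return True if `capability` (major, minor) is >= (major, minor)."""
    return tuple(capability) >= (major, minor)


# Assumed usage in a test module (requires PyTorch and pytest, so it is
# left as a comment here):
#
#   import pytest, torch
#   cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else (0, 0)
#
#   @pytest.mark.skipif(not is_sm_at_least(cap, 10), reason="inline PTX needs sm100")
#   def test_triton_mxfp8_dim0(): ...

if __name__ == "__main__":
    print(is_sm_at_least((10, 0), 10))  # a capability 10.0 device
    print(is_sm_at_least((9, 0), 10))   # a capability 9.0 device
```

Because the floor and RCEIL paths share one JIT-compiled scale function, a skip like this would have to cover both modes, which matches what the comment reports.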
Summary
Reference:
ao/torchao/csrc/cuda/mx_kernels/mxfp8_quantize.cuh
Line 211 in 8555713
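For readers unfamiliar with the two scale modes named in this PR, here is a minimal pure-Python sketch of how a floor-style and an RCEIL-style power-of-two block scale can differ. It is an illustration under stated assumptions, not the Triton kernel or the referenced CUDA code: it assumes the element type is float8_e4m3fn (maximum finite value 448), that floor mode uses floor(log2(amax)) minus the element's maximum exponent, and that RCEIL rounds amax / 448 up to the next power of two. All function names are hypothetical.

```python
import math

F8E4M3_MAX = 448.0   # largest finite float8_e4m3fn value (1.75 * 2**8)
F8E4M3_EMAX = 8      # exponent of that largest value
E8M0_BIAS = 127      # exponent bias of the 8-bit E8M0 scale format


def floor_scale_exponent(amax):
    """Assumed floor mode: floor(log2(amax)) - emax of the element type."""
    return math.floor(math.log2(amax)) - F8E4M3_EMAX


def rceil_scale_exponent(amax):
    """Assumed RCEIL mode: round amax / max_normal up to a power of two."""
    return math.ceil(math.log2(amax / F8E4M3_MAX))


def to_e8m0(exponent):
    """Bias and clamp an unbiased exponent into the E8M0 range 0..254."""
    return max(0, min(254, exponent + E8M0_BIAS))


if __name__ == "__main__":
    amax = 500.0  # block maximum; assumed > 0
    for name, fn in [("floor", floor_scale_exponent), ("rceil", rceil_scale_exponent)]:
        e = fn(amax)
        scaled = amax / 2.0 ** e
        print(name, e, to_e8m0(e), scaled, scaled <= F8E4M3_MAX)
```

With a block maximum of 500, the floor-style exponent leaves the scaled maximum above 448, so it would saturate, while the RCEIL-style exponent keeps it in range. That is the practical motivation usually given for rounding the scale up.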
Tests
pytest test/prototype/mx_formats/test_kernels.py -k dim0
pytest test/prototype/mx_formats/test_mx_linear.py
Benchmarks