Comparing changes
base repository: Vmjkom/TransformerEngine
base: dev
head repository: ROCm/TransformerEngine
compare: dev
- 10 commits
- 27 files changed
- 6 contributors
Commits on Jan 26, 2026
[CI] Automate AITER prebuilt upload flow with GitHub actions (ROCm#412)
* CI: add manual aiter prebuilt upload flow. Add a workflow_dispatch GHA to build/upload aiter prebuilts using a chosen image and GPU arch list, reusing the shell script. Make ci/aiter_upload.sh handle build + package + upload with optional env-based upload, respecting the GPU_ARCHS input and defaulting to gfx942;gfx950. Strip upload/packaging logic out of the CMake helper so normal builds only download/use prebuilts.
* Addressed reviews: move the aiter upload helper to .github/scripts, add a copyright header, and use a temp gitconfig for safe.directory/commit lookup; set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT; add functionality to verify the remote SHA after upload; trim workflow diagnostics/cleanup, use a --rm container, and pass the GPU_ARCHS input directly.
* Addressed comments
* Copyright update
Commit: 39af98f
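ROCm#412 makes the upload script respect a GPU_ARCHS input and fall back to gfx942;gfx950 when none is given. The real logic lives in a shell script that is not reproduced here; the following is a minimal Python sketch of the same defaulting rule, assuming a semicolon-separated list. The function name resolve_gpu_archs and the gfx-name check are hypothetical illustrations, not part of the repository.

```python
import os
import re
from typing import Mapping, Optional

# Default arch list as stated in the ROCm#412 commit message.
DEFAULT_GPU_ARCHS = "gfx942;gfx950"

# Hypothetical sanity check: AMD GPU targets look like "gfx" followed by hex digits.
_GFX_NAME = re.compile(r"gfx[0-9a-f]+")


def resolve_gpu_archs(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the target architectures, falling back to the default list.

    An unset or blank GPU_ARCHS selects DEFAULT_GPU_ARCHS. Empty entries
    (for example from a trailing ';') are dropped.
    """
    env = os.environ if env is None else env
    raw = env.get("GPU_ARCHS", "").strip() or DEFAULT_GPU_ARCHS
    archs = [a.strip() for a in raw.split(";") if a.strip()]
    for arch in archs:
        if not _GFX_NAME.fullmatch(arch):
            raise ValueError(f"unexpected GPU arch name: {arch!r}")
    return archs
```

Passing the environment as a mapping keeps the rule testable without mutating os.environ; for example, resolve_gpu_archs({}) returns ["gfx942", "gfx950"].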
CI: Serialize core sgpu test (ROCm#426)
* Run core with all GPUs
* Change n_parallel_jobs number to use all GPUs
Commit: 05707a3
Commits on Jan 27, 2026
HOTFIX Update JAX MNIST example's load_dataset function call (ROCm#431)
* Update MNIST dataset loading to use 'ylecun/mnist' instead of 'mnist' for both training and testing datasets.
* Update dataset loading in encoder tests to use 'nyu-mll/glue' for both training and validation datasets.

Co-authored-by: sugovind <sugovind@amd.com>
Commit: aa8faa9
Commit: 2652462
Commits on Jan 28, 2026
Fixed setting safe.directory in aiter-prebuilt-upload.yml (ROCm#429)
* Fixed setting safe.directory for aiter upload
* Added EOF to docker exec
* Fixed dubious ownership
* Added container cleanup
Commit: 5042c37
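Since Git 2.35.2, git refuses to operate in a repository owned by a different user (the "dubious ownership" error) unless the path is listed in safe.directory. This typically happens when a host checkout is mounted into a container running as another UID. ROCm#412 mentions using a temporary gitconfig for safe.directory and the commit lookup; the sketch below shows one way to do that from Python, assuming git 2.32 or newer (for the GIT_CONFIG_GLOBAL environment variable). The helper name git_head_sha is hypothetical, and the actual workflow steps are not reproduced here.

```python
import os
import subprocess
import tempfile


def git_head_sha(repo: str) -> str:
    """Return HEAD's commit SHA for `repo` without touching ~/.gitconfig.

    Assumes git >= 2.32: GIT_CONFIG_GLOBAL redirects the "global" config,
    so safe.directory is recorded in a throwaway file instead.
    """
    repo = os.path.abspath(repo)
    # Create an empty temporary file to act as the global git config.
    with tempfile.NamedTemporaryFile("w", suffix=".gitconfig", delete=False) as cfg:
        cfg_path = cfg.name
    try:
        env = dict(os.environ, GIT_CONFIG_GLOBAL=cfg_path)
        # safe.directory is not read from the repository's own .git/config,
        # so it has to go into a global (here: temporary) config file.
        subprocess.run(
            ["git", "config", "--global", "--add", "safe.directory", repo],
            env=env, check=True,
        )
        out = subprocess.run(
            ["git", "-C", repo, "rev-parse", "HEAD"],
            env=env, check=True, capture_output=True, text=True,
        )
        return out.stdout.strip()
    finally:
        # Remove the temporary config whether or not the lookup succeeded.
        os.remove(cfg_path)
```

Scoping the override to a temporary file keeps the container's user config unchanged, so nothing needs to be undone during container cleanup.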
Commit: c0acdb9
Commits on Jan 29, 2026
Clean up PyTorch FA tests (ROCm#427)
* Clean up testing of outdated behavior
* Minimize cumulative diff from upstream
Commit: 3d1e089
Commit: 8dab156
Commit: aec00a7
Commits on Jan 30, 2026
Enable MXFP8 support in TE JAX integration (ROCm#424)
* [CI] Skipped test_gpt_full_activation_recompute tests for gfx950
* [CI] Skipped unsupported test_basic_linear_quantized tests on gfx950
* [CI] Fixed test_numerics, test_norms, test_fused_optimizer failures for gfx950 CI enablement
* [CI] Disabled gfx950 support until FP8 GEMM layout coverage is verified with hipBLASLt
* [CI] [gfx950] Disable cudaGraph for GEMM and grouped GEMM
* Addressed reviews
* [CI] Add MI355 nodes to GitHub Actions workflow
* [CI] Update Docker image
* [CI] Add MI355 runner matrix and keep matrix legs independent
* Skip unstable GEMM tests on gfx950
* Addressed reviews
* Guard gfx950 TN skip by ROCm version and adjust MXFP8 Dq test size
* Removed ROCm 7.2 guards
* Reverted ROCm 7.2 guards
* Corrected normalization scale_inv padding removal
* Updated cast behavior, added safeguards around MXFP8 GEMM
* Add partial progress
* Improved guards on LayerNormMLP tests
* Remove swizzle in JAX GEMM primitive
* Added unique factors for sharding scales, added xfail to test
* Removed old code
* Added bias parameterization
* Refactored test guard, added bias guard in hipblasgemm
* PR comments addressed
* Minor typo
* Updated test per PR comments
* Improve test to cover padded scale_inv for MXFP8 GEMM
* Address PR review comments
* Added arch-specific guard for FP8 GEMM config
* Reformat inline comment
* Minor code reformat
* Remove debug statement, reformat code
* Formatting
* Reverted unnecessary Shardy changes

Co-authored-by: Veera Rajasekhar Reddy Gopu <veerarajasekharreddy.gopu@amd.com>
Commit: fc2caf5
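ROCm#424 touches MXFP8 GEMM and the padding of scale_inv tensors. As background, the OCP Microscaling (MX) format stores one shared power-of-two scale (an E8M0 exponent) per block of 32 consecutive elements, with the elements themselves in FP8. The NumPy sketch below computes per-block exponents with the OCP MX recipe, floor(log2(amax)) - emax_elem with emax_elem = 8 for E4M3, and then pads the resulting scale tensor. It is not TransformerEngine's kernel: its rounding may differ, and the 128 x 4 padding alignment and the zero fill value are assumptions chosen for illustration.

```python
import numpy as np

MX_BLOCK = 32    # elements that share one scale (OCP MX spec)
E4M3_EMAX = 8    # exponent of the largest normal E4M3 value: 448 = 1.75 * 2**8
E8M0_BIAS = 127  # E8M0 encodes 2**(e - 127); e = 255 is reserved for NaN


def mxfp8_block_exponents(x: np.ndarray) -> np.ndarray:
    """Unbiased shared exponent for each 32-element block along the last axis."""
    rows, cols = x.shape
    if cols % MX_BLOCK:
        raise ValueError(f"last dimension {cols} is not a multiple of {MX_BLOCK}")
    blocks = np.abs(x.astype(np.float32)).reshape(rows, cols // MX_BLOCK, MX_BLOCK)
    amax = blocks.max(axis=-1)
    with np.errstate(divide="ignore"):  # log2(0) = -inf for all-zero blocks
        exp = np.floor(np.log2(amax)) - E4M3_EMAX
    # All-zero blocks have no meaningful exponent; use the smallest one.
    exp = np.where(amax > 0, exp, -E8M0_BIAS)
    return np.clip(exp, -E8M0_BIAS, E8M0_BIAS).astype(np.int32)


def padded_scale_inv(x: np.ndarray, row_align: int = 128, col_align: int = 4) -> np.ndarray:
    """Biased E8M0 scales (uint8), zero-padded up to the given alignment.

    Dequantization multiplies each FP8 element by 2**(scale - 127). After
    dividing by the block scale, elements can still exceed the E4M3 maximum
    (448) and must be saturated when cast; that step is not shown.
    """
    exp = mxfp8_block_exponents(x)
    rows, n_blocks = exp.shape
    pad_rows = -rows % row_align    # rows needed to reach a multiple of row_align
    pad_cols = -n_blocks % col_align
    biased = (exp + E8M0_BIAS).astype(np.uint8)
    return np.pad(biased, ((0, pad_rows), (0, pad_cols)))
```

For an input of shape (100, 96), this produces a 100 x 3 exponent grid that is padded to 128 x 4. The padded entries correspond to no data, which is why tests that exercise padded scale_inv need to make sure those entries are ignored.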
GitHub could not render this comparison: it took too long to generate, possibly because the diff is too large.
To view the comparison locally, run:
git diff dev...dev
Because the base (Vmjkom/TransformerEngine) and head (ROCm/TransformerEngine) are different repositories, both dev branches must first be fetched as separate remotes; the command then takes the form git diff <base-remote>/dev...<head-remote>/dev.