
Comparing changes

base repository: Vmjkom/TransformerEngine
base: dev
head repository: ROCm/TransformerEngine
compare: dev
  • 10 commits
  • 27 files changed
  • 6 contributors

Commits on Jan 26, 2026

  1. [CI] Automate AITER prebuilt upload flow with GitHub actions (ROCm#412)

    * CI: add manual aiter prebuilt upload flow
    
    Add a workflow_dispatch GHA to build/upload aiter prebuilts using a chosen image and GPU arch list, reusing the shell script.
    
    Make ci/aiter_upload.sh handle build + package + upload with optional env-based upload, respecting GPU_ARCHS input and defaulting to gfx942;gfx950.
    
    Strip upload/packaging logic out of the CMake helper so normal builds only download/use prebuilts.
    
    * Addressed reviews
    
    Move aiter upload helper to .github/scripts, add copyright header, and use a temp gitconfig for safe.directory/commit lookup
    
    Set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT and add verification of the remote SHA after upload
    
    Trim workflow diagnostics/cleanup, use --rm container, pass GPU_ARCHS input directly
    
    * Addressed comments
    
    * Copyright update
    VeeraRajasekhar authored Jan 26, 2026
    39af98f
  2. CI: Serialize core sgpu test (ROCm#426)

    * Run core with all GPUs
    
    * Change n_parallel_jobs number to use all GPUs
    leo-amd authored Jan 26, 2026
    05707a3
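
The GPU_ARCHS handling described in ROCm#412 (use the supplied arch list, otherwise fall back to gfx942;gfx950) could look roughly like the sketch below. The real helper is a shell script; this Python version, including the name `resolve_gpu_archs`, is purely illustrative and not taken from the repository.

```python
import os

# Default stated in the commit message: "defaulting to gfx942;gfx950".
DEFAULT_GPU_ARCHS = "gfx942;gfx950"


def resolve_gpu_archs(env=None):
    """Return the list of GPU archs to build for.

    Reads a semicolon-separated GPU_ARCHS value from `env` (os.environ by
    default) and falls back to the default when it is unset or blank.
    The function name and return type are hypothetical.
    """
    env = os.environ if env is None else env
    raw = env.get("GPU_ARCHS", "").strip() or DEFAULT_GPU_ARCHS
    # Drop empty entries so a trailing ';' does not yield an empty arch.
    return [arch.strip() for arch in raw.split(";") if arch.strip()]
```

Passing the environment as a parameter keeps the fallback logic easy to exercise without mutating the process environment.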

Commits on Jan 27, 2026

  1. HOTFIX Update JAX MNIST example's load_dataset function call (ROCm#431)

    * Update MNIST dataset loading to use 'ylecun/mnist' instead of 'mnist' for both training and testing datasets.
    
    * Update dataset loading in encoder tests to use 'nyu-mll/glue' for both training and validation datasets.
    
    ---------
    
    Co-authored-by: sugovind <sugovind@amd.com>
    sudhu2k and sugovind authored Jan 27, 2026
    aa8faa9
  2. 2652462
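
The dataset rename in ROCm#431 amounts to passing namespaced Hugging Face Hub identifiers to `load_dataset`. A minimal sketch follows, assuming the Hugging Face `datasets` package is installed. The helper names are hypothetical, and the legacy key "glue" is an assumption (the commit only names the new identifier 'nyu-mll/glue').

```python
from typing import Optional

# Legacy name -> namespaced Hub identifier.
# "mnist" -> "ylecun/mnist" is stated in the commit message; the "glue"
# key is an assumption about the previous name.
HUB_DATASET_IDS = {
    "mnist": "ylecun/mnist",
    "glue": "nyu-mll/glue",
}


def resolve_dataset_id(name: str) -> str:
    """Map a legacy dataset name to its namespaced id (hypothetical helper)."""
    return HUB_DATASET_IDS.get(name, name)


def load_split(name: str, split: str, config: Optional[str] = None):
    """Load one split of a dataset via its namespaced identifier."""
    # Imported lazily so the mapping can be used without `datasets` installed.
    from datasets import load_dataset

    dataset_id = resolve_dataset_id(name)
    if config is None:
        return load_dataset(dataset_id, split=split)
    return load_dataset(dataset_id, config, split=split)
```

For example, `load_split("mnist", "train")` would request `ylecun/mnist`. GLUE additionally needs a task configuration, which the commit message does not specify.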

Commits on Jan 28, 2026

  1. Fixed setting safe.directory in aiter-prebuilt-upload.yml (ROCm#429)

    * Fixed setting safe.directory for aiter upload
    
    * Added EOF to docker exec
    
    * Fixed dubious ownership
    
    * Added container cleanup
    VeeraRajasekhar authored Jan 28, 2026
    5042c37
  2. c0acdb9
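
The safe.directory fixes in ROCm#429 (and the "temp gitconfig" mentioned in ROCm#412) address Git's "dubious ownership" check, which rejects repositories owned by a different user than the one running Git, as often happens inside containers. One way to approach this without touching the real global config is sketched below. It assumes Git 2.32 or newer, which honors the `GIT_CONFIG_GLOBAL` environment variable. The helper name is hypothetical and this is not the workflow's actual code.

```python
import os
import tempfile
from pathlib import Path


def make_safe_directory_env(repo_dir, base_env=None):
    """Create a throwaway global gitconfig that trusts `repo_dir`.

    Returns (env, config_path). `env` is a copy of the base environment
    with GIT_CONFIG_GLOBAL pointing at the temporary file; the caller is
    responsible for deleting `config_path` afterwards.
    """
    repo = str(Path(repo_dir).resolve())
    # Escape backslashes and double quotes for a quoted git-config value.
    escaped = repo.replace("\\", "\\\\").replace('"', '\\"')
    fd, config_path = tempfile.mkstemp(prefix="gitconfig-", suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(f'[safe]\n\tdirectory = "{escaped}"\n')
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_CONFIG_GLOBAL"] = config_path
    return env, config_path
```

The returned environment can then be passed to a Git call such as `subprocess.run(["git", "-C", repo_dir, "rev-parse", "HEAD"], env=env)` to look up the commit, after which the temporary file is removed.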

Commits on Jan 29, 2026

  1. Clean up PyTorch FA tests (ROCm#427)

    * Clean up testing of outdated behavior
    
    * Minimize cumulative diff from upstream
    Micky774 authored Jan 29, 2026
    3d1e089
  2. 8dab156
  3. aec00a7

Commits on Jan 30, 2026

  1. Enable MXFP8 support in TE JAX integration (ROCm#424)

    * [CI] Skipped test_gpt_full_activation_recompute tests for gfx950
    
    * [CI] Skipped unsupported test_basic_linear_quantized tests on gfx950
    
    * [CI] Fixed test_numerics, test_norms, test_fused_optimizer failures for gfx950 ci enablement
    
    * [CI] Disabled gfx950 support until FP8 GEMM layout coverage is verified with hipblaslt
    
    * [CI] [gfx950] Disable cudaGraph for gemm and grouped-gemm
    
    * Addressed reviews
    
    * [CI] Add MI355 nodes to github actions workflow
    
    * [CI] Update docker image
    
    * [CI] add MI355 runner matrix and keep matrix legs independent
    
    * Skip unstable Gemm tests on gfx950
    
    * Addressed reviews
    
    * Guard gfx950 TN skip by ROCm version and adjust MXFP8 Dq test size
    
    * Removed ROCM7.2 guards
    
    * Reverted ROCM7.2 guards
    
    * Corrected Normalization scale_inv padding removal
    
    * Updated cast behavior, added safeguards around MXFP8 GEMM
    
    * Add partial progress
    
    * Improved guards on LayerNormMLP tests
    
    * Remove swizzle in JAX GEMM primitive
    
    * Added unique factors for sharding scales, added xfail to test
    
    * Removed old code
    
    * Added bias parameterization
    
    * Refactored test guard, added bias guard in hipblasgemm
    
    * PR comments addressed
    
    * Minor typo
    
    * Updated test per PR comments
    
    * Improve test to cover padded scale_inv for mxfp8 gemm
    
    * Address PR review comments
    
    * Added arch specific guard for FP8 GEMM config
    
    * Reformat inline comment
    
    * Minor code reformat
    
    * Remove debug statement, reformat code
    
    * Formatting
    
    * Reverted unnecessary shardy changes
    
    ---------
    
    Co-authored-by: Veera Rajasekhar Reddy Gopu <veerarajasekharreddy.gopu@amd.com>
    Micky774 and VeeraRajasekhar authored Jan 30, 2026
    fc2caf5
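
Several items in ROCm#424 gate tests on the GPU architecture and the ROCm version ("Guard gfx950 TN skip by ROCm version"). A guard of that kind might be structured as in the sketch below. The function names are hypothetical, and both the 7.2 threshold and the direction of the comparison are assumptions: the log only shows that ROCm 7.2 guards were added, removed, and then restored.

```python
def parse_rocm_version(version: str) -> tuple:
    """Parse 'major.minor[.patch]' into a (major, minor) tuple."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def should_skip_tn_gemm(gpu_arch: str, rocm_version: str,
                        first_supported=(7, 2)) -> bool:
    """Decide whether a TN-layout GEMM test should be skipped.

    Assumption: the test is skipped only on gfx950 and only for ROCm
    versions older than `first_supported`.
    """
    if gpu_arch != "gfx950":
        return False
    return parse_rocm_version(rocm_version) < first_supported
```

In a pytest suite, the result would typically feed a `pytest.mark.skipif` condition so the skip reason is reported alongside the test.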