Comparing changes

* CI: add manual aiter prebuilt upload flow
  Add a workflow_dispatch GHA to build/upload aiter prebuilts using a chosen image and GPU arch list, reusing the shell script. Make ci/aiter_upload.sh handle build + package + upload with optional env-based upload, respecting GPU_ARCHS input and defaulting to gfx942;gfx950. Strip upload/packaging logic out of the CMake helper so normal builds only download/use prebuilts.
* Addressed reviews
  Move aiter upload helper to .github/scripts, add copyright header, and use a temp gitconfig for safe.directory/commit lookup. Set CK_TILE_FLOAT_TO_BFLOAT16_DEFAULT, added functionality to verify remote SHA after upload. Trim workflow diagnostics/cleanup, use --rm container, pass GPU_ARCHS input directly.
* Addressed comments
* Copyright update
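The GPU_ARCHS handling described above (use the input when given, otherwise fall back to gfx942;gfx950) could look roughly like the sketch below. This is an illustration only: the actual logic lives in a shell script whose contents are not shown here, and the function name `resolve_gpu_archs` and the whitespace handling are assumptions.

```python
import os

# Default architecture list named in the commit message.
DEFAULT_GPU_ARCHS = "gfx942;gfx950"


def resolve_gpu_archs(env=None):
    """Return the GPU architectures to build for, as a list of strings.

    Hypothetical helper: reads a semicolon-separated GPU_ARCHS value from
    the given mapping (os.environ by default) and falls back to the default
    when the variable is unset or blank.
    """
    env = os.environ if env is None else env
    raw = env.get("GPU_ARCHS", "").strip() or DEFAULT_GPU_ARCHS
    # Drop empty entries so a trailing ';' does not yield a bogus arch.
    return [arch.strip() for arch in raw.split(";") if arch.strip()]
```

Passing the mapping explicitly keeps the helper easy to exercise without touching the real environment, for example `resolve_gpu_archs({"GPU_ARCHS": "gfx950"})`.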

* Run core with all GPUs
* Change n_parallel_jobs number to use all GPUs

* Update MNIST dataset loading to use 'ylecun/mnist' instead of 'mnist' for both training and testing datasets.
* Update dataset loading in encoder tests to use 'nyu-mll/glue' for both training and validation datasets.

---------

Co-authored-by: sugovind <sugovind@amd.com>
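The dataset rename above amounts to replacing bare dataset names with namespaced identifiers. A minimal sketch of that mapping follows. The helper name `resolve_dataset_id` is hypothetical, and the old encoder-test identifier is assumed to have been the bare name 'glue', which the commit message does not state.

```python
# Hypothetical mapping from bare dataset names to namespaced identifiers.
NAMESPACED_DATASET_IDS = {
    "mnist": "ylecun/mnist",  # old and new names both appear in the commit message
    "glue": "nyu-mll/glue",   # new name is from the commit message; old name is assumed
}


def resolve_dataset_id(name):
    """Return the namespaced identifier for a dataset name.

    Names that already contain a namespace ('owner/name') and names without
    a known mapping are returned unchanged.
    """
    if "/" in name:
        return name
    return NAMESPACED_DATASET_IDS.get(name, name)
```

The resolved identifier would then be handed to whatever dataset loader the tests use; the commit message does not name that loader, so it is left out of the sketch.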

* Fixed setting safe.directory for aiter upload
* Added EOF to docker exec
* Fixed dubious ownership
* Added container cleanup

…kages (ROCm#433)

* Clean up testing of outdated behavior
* Minimize cumulative diff from upstream

* [CI] Skipped test_gpt_full_activation_recompute tests for gfx950
* [CI] Skipped unsupported test_basic_linear_quantized tests on gfx950
* [CI] Fixed test_numerics, test_norms, test_fused_optimizer failures for gfx950 CI enablement
* [CI] Disabled gfx950 support until FP8 GEMM layout coverage is verified with hipblaslt
* [CI] [gfx950] Disable cudaGraph for gemm and grouped-gemm
* Addressed reviews
* [CI] Add MI355 nodes to github actions workflow
* [CI] Update docker image
* [CI] Add MI355 runner matrix and keep matrix legs independent
* Skip unstable Gemm tests on gfx950
* Addressed reviews
* Guard gfx950 TN skip by ROCm version and adjust MXFP8 Dq test size
* Removed ROCM7.2 guards
* Reverted ROCM7.2 guards
* Corrected Normalization scale_inv padding removal
* Updated cast behavior, added safeguards around MXFP8 GEMM
* Add partial progress
* Improved guards on LayerNormMLP tests
* Remove swizzle in JAX GEMM primitive
* Added unique factors for sharding scales, added xfail to test
* Removed old code
* Added bias parameterization
* Refactored test guard, added bias guard in hipblasgemm
* PR comments addressed
* Minor typo
* Updated test per PR comments
* Improve test to cover padded scale_inv for mxfp8 gemm
* Address PR review comments
* Added arch specific guard for FP8 GEMM config
* Reformat inline comment
* Minor code reformat
* Remove debug statement, reformat code
* Formatting
* Reverted unnecessary shardy changes

---------

Co-authored-by: Veera Rajasekhar Reddy Gopu <veerarajasekharreddy.gopu@amd.com>

Commits on Jan 26, 2026

Commits on Jan 27, 2026

Commits on Jan 28, 2026

Commits on Jan 29, 2026

Commits on Jan 30, 2026
