Tags: ROCm/TransformerEngine

v2.2_rocm

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Release v2.2 cherrypicks and bugfixes megatron lm (#362)

* Ensure weight transpose is valid for FP8 training (#1596) (#276)

* Update usage of weightmat before saving for backward

* Added keep_fp8_weight_transpose_cache checks while updating transpose in fwd pass (#298)

* Added keep_fp8_weight_transpose_cache checks while updating transpose

* Added unittest for the fix

* Added comment for the unit test

* Fixed comment

* Reverted test for single iteration, added assert statements to check for transpose cache, Modified docstring

* Fixed test_numerics spacing

* Added HIP Guards

* Addressed PR Comments, and moved assertion statements under fp8 check

* Reverting assertion to fix the dev ticket

* Removed spacing

---------

Co-authored-by: Sudharshan Govindan <sugovind@amd.com>

* Bug fix for get_fp8_metas

* Added keep_fp8_transpose_cache fix for base.py

* added _fp8_metas check for None

* Added comment

---------

Co-authored-by: Sudharshan Govindan <sugovind@amd.com>

v2.1_rocm

Fix for datasets version in JAX examples (#228)

(cherry picked from commit cc96041)

v1.14_rocm

Verified

[CI] deprecate praxis installation and tests

- Removed praxis installation and related test setup from `ci/jax.sh`
- Installed `flax>=0.7.1`, with typing_extensions>=4.12.2

v1.13_rocm

Verified

[CI] deprecate praxis installation and tests

- Removed praxis installation and related test setup from `ci/jax.sh`
- Installed `flax>=0.7.1`, with typing_extensions>=4.12.2

v1.9_rocm

Verified

[ROCm] backport rmsnorm triton kernels into rocm v1.9 (#169)

* [ROCm] backport rmsnorm triton kernels into rocm v1.9

* [ROCm] use single worker for CI

v1.12_rocm

IFU release v1.12

v1.11_rocm

Verified

[PyTorch] Drop FA as an installation requirement (#1226) (#125)

Upstream cherry-pick 161b1d9 + partially e762592

Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>