
Conversation

@liangel-02
Contributor

As per the title.

Tests

In torchao:
python test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_pin_memory

In diffusers:
python -m pytest tests/quantization/torchao/test_torchao.py -k test_torch_compile_with_group_offload_leaf -s
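
For context, here is a minimal sketch of the user-level behavior these tests exercise. The layer shape is made up, it assumes Int8WeightOnlyConfig routes to the new Int8Tensor subclass in this setup, and pin_memory needs a CUDA-enabled build:

```python
# Minimal sketch (not the actual test code): quantize a linear layer on CPU,
# then pin its weight so it can be copied to the GPU asynchronously, which is
# what diffusers' group offloading does when use_stream=True.
import torch
from torchao.quantization import quantize_, Int8WeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(256, 1536, dtype=torch.bfloat16))
quantize_(model, Int8WeightOnlyConfig())  # weight becomes a quantized tensor subclass

weight = model[0].weight
print(weight.is_pinned())      # previously raised NotImplementedError for the subclass
pinned = weight.pin_memory()   # page-locked copy, ready for non_blocking H2D copies
print(pinned.is_pinned())      # True on a CUDA-enabled build
```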

No longer seeing the following error for use_stream=True:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.is_pinned', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), kwarg_types={}
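
The error above goes away once the subclass answers aten.is_pinned (and aten.pin_memory) in its dispatch. A generic illustration of that pattern, not the actual torchao implementation (torchao subclasses typically register ops through their implements decorator rather than a raw __torch_dispatch__ branch):

```python
# Toy wrapper subclass that forwards is_pinned/pin_memory to its inner tensor,
# which is the general shape of the fix for errors like the one above.
import torch

aten = torch.ops.aten

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, qdata):
        return torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=qdata.dtype, device=qdata.device
        )

    def __init__(self, qdata):
        self.qdata = qdata  # plain storage tensor in a real quantized subclass

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is aten.is_pinned.default:
            # Report pinned-ness of the underlying storage.
            return args[0].qdata.is_pinned()
        if func is aten.pin_memory.default:
            # Pin the underlying storage and re-wrap it.
            return cls(args[0].qdata.pin_memory())
        raise NotImplementedError(f"{func} is not handled in this sketch")
```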

However, still seeing the known Dynamo error:

torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function linear>(*(FakeTensor(..., device='cuda:0', size=(s65, 256), dtype=torch.bfloat16), Int8Tensor(act_quant_kwargs=None, qdata=FakeTensor(..., size=(1536, 256), dtype=torch.int8), scale=FakeTensor(..., size=(1536, 1), dtype=torch.bfloat16), act_scale=None, block_size=[1, 256], shape=torch.Size([1536, 256]), device=cpu, dtype=torch.bfloat16), Parameter(FakeTensor(..., device='cuda:0', size=(1536,), dtype=torch.bfloat16, requires_grad=True))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.mm.default, found two different devices cuda:0, cpu')

from user code:
      File "/home/liangel/local/diffusers/src/diffusers/hooks/hooks.py", line 189, in torch_dynamo_resume_in_new_forward_at_188
      output = function_reference.forward(*args, **kwargs)
      File "/home/liangel/local/pytorch/torch/nn/modules/linear.py", line 134, in forward
          return F.linear(input, self.weight, self.bias)

../pytorch/torch/_subclasses/fake_tensor.py:987: TorchRuntimeError
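
The failure appears to come from a device mismatch that fake-tensor propagation can't resolve: under leaf-level group offloading the quantized weight still reports device=cpu while the activation is on cuda:0, so aten.mm has no single device to propagate. Eager PyTorch rejects the same mixed-device call; a trivial illustration with made-up shapes (needs a GPU):

```python
# Not the Dynamo failure itself, just the underlying inconsistency:
# linear/mm requires all operands to live on one device.
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    x = torch.randn(2, 256, dtype=torch.bfloat16, device="cuda")
    w = torch.randn(1536, 256, dtype=torch.bfloat16)  # left on CPU, like the offloaded weight above
    b = torch.randn(1536, dtype=torch.bfloat16, device="cuda")
    try:
        F.linear(x, w, b)
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ..."
```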

@pytorch-bot

pytorch-bot bot commented Dec 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3489

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5bb5818 with merge base ff6d9e2:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label on Dec 15, 2025
@liangel-02 force-pushed the int8_pin_mem branch 2 times, most recently from ab9f34d to 4133727, on December 15, 2025 at 17:21
@liangel-02 marked this pull request as ready for review on December 15, 2025 at 17:22
@liangel-02 added the topic: new feature label on Dec 15, 2025
Contributor

@jerryzh168 left a comment


Looks great, thanks!

cc @sayakpaul, please take a look as well and let us know if the current test is enough to cover the pin_memory functionality.

@jerryzh168
Contributor

For the known Dynamo error, what's the plan to resolve it?

Contributor

@sayakpaul left a comment


This is perfect! Thanks!

I guess float already supports this?

@jerryzh168
Contributor

Float8Tensor doesn't support this yet; I think we should follow up with that as well, @liangel-02.
