
Conversation

@liangel-02
Contributor

As per the title.

Tests

In torchao:
python test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_pin_memory

In diffusers:
python -m pytest tests/quantization/torchao/test_torchao.py -k test_torch_compile_with_group_offload_leaf -s
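
For context, here is a minimal sketch of the user-level behavior these tests exercise. The layer shape is made up, it assumes Int8WeightOnlyConfig routes to the new Int8Tensor subclass in this setup, and pin_memory needs a CUDA-enabled build:

```python
# Minimal sketch (not the actual test code): quantize a linear layer on CPU,
# then pin its weight so it can be copied to the GPU asynchronously, which is
# what diffusers' group offloading does when use_stream=True.
import torch
from torchao.quantization import quantize_, Int8WeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(256, 1536, dtype=torch.bfloat16))
quantize_(model, Int8WeightOnlyConfig())  # weight becomes a quantized tensor subclass

weight = model[0].weight
print(weight.is_pinned())      # previously raised NotImplementedError for the subclass
pinned = weight.pin_memory()   # page-locked copy, ready for non_blocking H2D copies
print(pinned.is_pinned())      # True on a CUDA-enabled build
```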

No longer seeing the following error for use_stream=True:

NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.is_pinned', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), kwarg_types={}
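
The error above goes away once the subclass answers aten.is_pinned (and aten.pin_memory) in its dispatch. A generic illustration of that pattern, not the actual torchao implementation (torchao subclasses typically register ops through their implements decorator rather than a raw __torch_dispatch__ branch):

```python
# Toy wrapper subclass that forwards is_pinned/pin_memory to its inner tensor,
# which is the general shape of the fix for errors like the one above.
import torch

aten = torch.ops.aten

class WrapperTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, qdata):
        return torch.Tensor._make_wrapper_subclass(
            cls, qdata.shape, dtype=qdata.dtype, device=qdata.device
        )

    def __init__(self, qdata):
        self.qdata = qdata  # plain storage tensor in a real quantized subclass

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is aten.is_pinned.default:
            # Report pinned-ness of the underlying storage.
            return args[0].qdata.is_pinned()
        if func is aten.pin_memory.default:
            # Pin the underlying storage and re-wrap it.
            return cls(args[0].qdata.pin_memory())
        raise NotImplementedError(f"{func} is not handled in this sketch")
```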

However, still seeing the known Dynamo error:

torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function linear>(*(FakeTensor(..., device='cuda:0', size=(s65, 256), dtype=torch.bfloat16), Int8Tensor(act_quant_kwargs=None, qdata=FakeTensor(..., size=(1536, 256), dtype=torch.int8), scale=FakeTensor(..., size=(1536, 1), dtype=torch.bfloat16), act_scale=None, block_size=[1, 256], shape=torch.Size([1536, 256]), device=cpu, dtype=torch.bfloat16), Parameter(FakeTensor(..., device='cuda:0', size=(1536,), dtype=torch.bfloat16, requires_grad=True))), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.mm.default, found two different devices cuda:0, cpu')

from user code:
      File "/home/liangel/local/diffusers/src/diffusers/hooks/hooks.py", line 189, in torch_dynamo_resume_in_new_forward_at_188
      output = function_reference.forward(*args, **kwargs)
      File "/home/liangel/local/pytorch/torch/nn/modules/linear.py", line 134, in forward
          return F.linear(input, self.weight, self.bias)

../pytorch/torch/_subclasses/fake_tensor.py:987: TorchRuntimeError
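
The failure appears to come from a device mismatch that fake-tensor propagation can't resolve: under leaf-level group offloading the quantized weight still reports device=cpu while the activation is on cuda:0, so aten.mm has no single device to propagate. Eager PyTorch rejects the same mixed-device call; a trivial illustration with made-up shapes (needs a GPU):

```python
# Not the Dynamo failure itself, just the underlying inconsistency:
# linear/mm requires all operands to live on one device.
import torch
import torch.nn.functional as F

if torch.cuda.is_available():
    x = torch.randn(2, 256, dtype=torch.bfloat16, device="cuda")
    w = torch.randn(1536, 256, dtype=torch.bfloat16)  # left on CPU, like the offloaded weight above
    b = torch.randn(1536, dtype=torch.bfloat16, device="cuda")
    try:
        F.linear(x, w, b)
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ..."
```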

@pytorch-bot

pytorch-bot bot commented Dec 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3489

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5bb5818 with merge base ff6d9e2:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label on Dec 15, 2025
@liangel-02 force-pushed the int8_pin_mem branch 2 times, most recently from ab9f34d to 4133727, on December 15, 2025 at 17:21
@liangel-02 marked this pull request as ready for review on December 15, 2025 at 17:22
@liangel-02 added the topic: new feature label on Dec 15, 2025
Contributor

@jerryzh168 left a comment


Looks great, thanks!

cc @sayakpaul, please take a look as well and let us know if the current test is enough to cover the pin_memory functionality.

@jerryzh168
Contributor

For the known Dynamo error, what's the plan to resolve it?

Contributor

@sayakpaul left a comment


This is perfect! Thanks!

I guess float already supports this?

@jerryzh168
Contributor

Float8Tensor doesn't support this yet; I think we should follow up with that as well, @liangel-02.
