Begin simplifying CrossAttention so that it works better on the Apple Neural Engine #691
MatthewWaller wants to merge 8 commits into huggingface:main
Conversation
Update repo with main
The documentation is not available anymore as the PR was closed or merged.
hidden_states = self.reshape_batch_dim_to_heads(hidden_states)
return hidden_states

batch_size, sequence_length, heads, last_dim = query.shape
attn = torch.einsum("bjhd,bihd->bhji", query, key)
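For reference, a minimal sketch of what the quoted einsum computes when query and key keep the (batch, seq, heads, head_dim) layout; the `scale` argument here is an assumption standing in for the module's usual 1/sqrt(head_dim) factor:

```python
import torch

def attention_scores(query, key, scale):
    # query: (batch, seq_q, heads, head_dim)
    # key:   (batch, seq_k, heads, head_dim)
    # "bjhd,bihd->bhji" contracts over head_dim, yielding per-head scores
    # of shape (batch, heads, seq_q, seq_k) without first folding the
    # heads into the batch dimension (no reshape_heads_to_batch_dim).
    return torch.einsum("bjhd,bihd->bhji", query, key) * scale
```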
I think we moved away from einsum for speed and ONNX-compatibility cc @NouamaneTazi @anton-l no?
In my experiments, einsum is equivalent to matmul in terms of speed, and both support jitting.
I believe we moved away from it because of some MPS compatibility issues. cc @pcuenca
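As a rough way to check that claim, here is a small CPU timing sketch (shapes are illustrative, not taken from the model) comparing the einsum against an equivalent transpose-plus-matmul:

```python
import time
import torch

b, s, h, d = 2, 1024, 8, 40
q = torch.randn(b, s, h, d)
k = torch.randn(b, s, h, d)

def timeit(fn, iters=20):
    fn()  # warm-up run
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# einsum path: contract over head_dim directly
einsum_t = timeit(lambda: torch.einsum("bjhd,bihd->bhji", q, k))
# matmul path: move heads next to batch, then do a batched matmul
matmul_t = timeit(lambda: torch.matmul(q.permute(0, 2, 1, 3), k.permute(0, 2, 3, 1)))
print(f"einsum {einsum_t * 1e3:.2f} ms vs matmul {matmul_t * 1e3:.2f} ms")
```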
Yeah, this is going to take more investigation. More experimenting has revealed that this may not be the exact pain point for the ANE. I know that einsum can cause problems in certain cases; only two einsum variants were natively supported by coremltools, for instance, but this one should work without a problem. Since I haven't been able to fully diagnose where the hang-up is, I'll put this PR on ice.
Hi folks,
This is to address this issue.
I converted this CrossAttention portion with coremltools, and it does in fact remove about 4 reshape operations and a few transposes, getting down to 4 transposes and 4 reshapes remaining.
Unfortunately, it seems that is still too many to compile on the ANE.
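For anyone who wants to reproduce that op count, here is a rough sketch of how the score einsum alone could be converted and its emitted ops inspected; the wrapper module, the shapes, and the `milinternal` inspection path are assumptions and may vary with the coremltools version:

```python
import torch
import coremltools as ct

# Hypothetical standalone wrapper around just the score einsum, so the
# converter's output for this piece can be inspected in isolation.
class ScoreBlock(torch.nn.Module):
    def forward(self, q, k):
        return torch.einsum("bjhd,bihd->bhji", q, k)

example = (torch.randn(1, 64, 8, 40), torch.randn(1, 64, 8, 40))
traced = torch.jit.trace(ScoreBlock().eval(), example)

# convert_to="milinternal" returns the intermediate MIL program, whose
# printout lists the reshape / transpose / matmul ops the converter produced.
prog = ct.convert(
    traced,
    inputs=[ct.TensorType(name="q", shape=example[0].shape),
            ct.TensorType(name="k", shape=example[1].shape)],
    convert_to="milinternal",
)
print(prog)
```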
Any ideas about what else I could do to simplify this? I took a stab at using another einsum for the attn and value matmul, but I don't think I was doing it correctly.
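One possible way to write that second product as an einsum, given the (batch, heads, seq_q, seq_k) score layout above; this is an untested sketch, not something verified on the PR branch or on the ANE:

```python
import torch

def attention_output(attn, value):
    # attn:  (batch, heads, seq_q, seq_k) -- softmaxed scores from the
    #        "bjhd,bihd->bhji" einsum (j = query index, i = key index)
    # value: (batch, seq_k, heads, head_dim)
    # Contracting over the key index i keeps the output in the
    # (batch, seq_q, heads, head_dim) layout, so no extra transpose is
    # needed before the final merge of heads and head_dim.
    return torch.einsum("bhji,bihd->bjhd", attn, value)
```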