@dependabot dependabot bot commented on behalf of github May 23, 2024

Bumps auto-gptq from 0.5.0 to 0.7.1.

Release notes

Sourced from auto-gptq's releases.

v0.7.1: patch release

Support loading sharded quantized checkpoints

Sharded checkpoints can now be loaded in the from_quantized method.
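As a rough illustration of what a sharded checkpoint looks like on disk, the sketch below assumes the common Hugging Face convention of an index JSON file whose `weight_map` entry maps each tensor name to the shard file that stores it. The file names, tensor names, and the helpers `shards_to_load` and `tensors_by_shard` are hypothetical and are not taken from the AutoGPTQ source; the release notes only say that from_quantized can now load such checkpoints, not how it does so internally.

```python
import json

# Hypothetical index for a two-shard checkpoint (all names are illustrative only).
INDEX_JSON = """
{
  "metadata": {"total_size": 123456},
  "weight_map": {
    "model.layers.0.self_attn.q_proj.qweight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.scales": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.qweight": "model-00002-of-00002.safetensors"
  }
}
"""


def shards_to_load(index_text):
    """Return the distinct shard file names referenced by the index, sorted."""
    weight_map = json.loads(index_text)["weight_map"]
    return sorted(set(weight_map.values()))


def tensors_by_shard(index_text):
    """Group tensor names by the shard file that stores them."""
    grouped = {}
    for name, shard in json.loads(index_text)["weight_map"].items():
        grouped.setdefault(shard, []).append(name)
    return grouped


if __name__ == "__main__":
    # A loader would open each shard once and read the tensors listed for it.
    for shard, names in tensors_by_shard(INDEX_JSON).items():
        print(shard, len(names))
```

A loader that follows this convention only needs the index to know which files to open, which is why a sharded checkpoint can be consumed through the same entry point as a single-file one.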

Gemma GPTQ quantization

Gemma models can now be quantized with AutoGPTQ.

Other changes and fixes

Full Changelog: AutoGPTQ/AutoGPTQ@v0.7.0...v0.7.1

v0.7.0: Marlin int4*fp16 kernel, AWQ checkpoints loading

Marlin efficient int4*fp16 kernel on Ampere GPUs, AWQ checkpoints loading

@​efrantar, GPTQ author, released Marlin, an optimized CUDA kernel for Ampere GPUs for int4*fp16 matrix multiplication, with per-group symmetric quantization support (without act-order), which significantly outperforms other existing kernels when using batching.
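To make "per-group symmetric quantization" concrete, here is a minimal plain-Python sketch of the numeric idea: each group of weights shares one scale and has no zero-point, and values are rounded to a signed 4-bit range. This is an assumption-laden illustration only. The function names, the group size, and the choice of scale (`max_abs / 7`) are hypothetical, and the code says nothing about Marlin's actual memory layout or CUDA implementation.

```python
def quantize_group(weights, bits=4):
    """Symmetric quantization of one group: a single scale, no zero-point."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4; the signed range is [-8, 7]
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the signed range.
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_per_group(weights, group_size):
    """Split a flat weight list into groups and quantize each independently."""
    groups = []
    for start in range(0, len(weights), group_size):
        groups.append(quantize_group(weights[start:start + group_size]))
    return groups


def dequantize(groups):
    """Map integer levels back to floats using each group's own scale."""
    return [q * scale for qs, scale in groups for q in qs]


if __name__ == "__main__":
    w = [14.0, -6.0, 2.0, 0.0, 0.7, -0.2, 0.1, 0.05]
    groups = quantize_per_group(w, group_size=4)
    print(groups)
    print(dequantize(groups))
```

Because each group carries its own scale, a group of small weights is not forced onto the coarse grid needed by a group containing a large outlier, which is the usual motivation for quantizing per group rather than per tensor.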

This kernel can be used in AutoGPTQ by loading a model with the use_marlin=True argument. Because the Marlin kernel expects a different weight layout, this flag repacks the quantized weights; the repacked weights are then saved locally so that they do not need to be repacked again. Example:

import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Load the tokenizer and the GPTQ checkpoint; use_marlin=True repacks the weights for the Marlin kernel.
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ")
model = AutoGPTQForCausalLM.from_quantized("TheBloke/Llama-2-13B-chat-GPTQ", torch_dtype=torch.float16, use_marlin=True, device="cuda:0")

# Tokenize the prompt on the same device, generate up to 200 new tokens, and print the decoded text.
prompt = "Is quantization a good compression technique?"
inp = tokenizer(prompt, return_tensors="pt").to("cuda:0")
res = model.generate(**inp, max_new_tokens=200)
print(tokenizer.decode(res[0]))

Repacking weights to be compatible with Marlin kernel...: 100%|████████████████████████████████████████████████████████████| 566/566 [00:29<00:00, 19.17it/s]

<s> Is quantization a good compression technique?

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [auto-gptq](https://github.com/PanQiWei/AutoGPTQ) from 0.5.0 to 0.7.1.
- [Release notes](https://github.com/PanQiWei/AutoGPTQ/releases)
- [Changelog](https://github.com/AutoGPTQ/AutoGPTQ/blob/main/docs/NEWS_OR_UPDATE.md)
- [Commits](AutoGPTQ/AutoGPTQ@v0.5.0...v0.7.1)

---
updated-dependencies:
- dependency-name: auto-gptq
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
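
The `update-type` value above can be reproduced from the two version strings. The sketch below is a hypothetical helper, not Dependabot's implementation: it assumes plain `MAJOR.MINOR.PATCH` versions without pre-release or build suffixes and reports the highest-order component that changed.

```python
def parse_version(text):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release handling)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_bump(old, new):
    """Return a label for the highest-order component that differs."""
    old_v, new_v = parse_version(old), parse_version(new)
    for label, a, b in zip(("major", "minor", "patch"), old_v, new_v):
        if a != b:
            return "version-update:semver-" + label
    return "no-change"


if __name__ == "__main__":
    # The bump in this PR: 0.5.0 to 0.7.1 changes the minor component first.
    print(classify_bump("0.5.0", "0.7.1"))
```

Note that under Semantic Versioning, major version zero is reserved for initial development, so a minor bump within 0.x may still contain breaking changes; the label alone is not a guarantee of compatibility.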

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the dependencies label (Pull requests that update a dependency file) May 23, 2024