Jzz24/LLM_Kernels

LLM_Kernels

A simple implementation and verification toolkit for LLM kernels.

Quantization

  • FP8 blockwise GEMM
  • INT8 GEMM
  • W4A8 GEMM (Triton)
  • INT4 weight pack/unpack
  • W4A16 GEMM (CUDA, simplified Marlin)
  • W4A8 GEMM (CUDA, simplified QServe)
  • FP4/FP6/FP8 fake-quantization functions
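
To make the INT4 weight pack/unpack item concrete, here is a minimal pure-Python sketch of one common layout: two signed 4-bit values per byte, low nibble first. The function names `pack_int4` and `unpack_int4` and the nibble order are assumptions made for illustration; the repository's kernels may use a different (for example, interleaved) layout.

```python
def pack_int4(values):
    """Pack signed int4 values (range -8..7) two per byte, low nibble first.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if len(values) % 2 != 0:
        raise ValueError("need an even number of int4 values")
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        if not (-8 <= lo <= 7 and -8 <= hi <= 7):
            raise ValueError("int4 values must lie in [-8, 7]")
        # Masking with 0xF yields the 4-bit two's-complement nibble.
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed int4 values from bytes."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values
```

A round trip such as `unpack_int4(pack_int4(w)) == w` is a cheap check to run before comparing a packed-weight kernel against a reference.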

MoE

  • Multiple communication strategies (All-to-All, AllGather)
  • Group GEMM acceleration
  • Quantized Group GEMM
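
As a rough picture of what a Group GEMM computes in an MoE layer, the sketch below routes each token row to one expert and multiplies it by that expert's weight matrix. It is a slow pure-Python reference of the kind one might use to verify a fused kernel, not the repository's implementation; `grouped_gemm_reference`, `matmul`, and the top-1 routing are assumptions.

```python
def matmul(a, b):
    """Plain (rows x k) @ (k x cols) matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def grouped_gemm_reference(tokens, expert_ids, expert_weights):
    """Multiply each token row by the weight of the expert it is routed to.

    tokens:         list of rows, each of length k
    expert_ids:     one expert index per token (top-1 routing assumed)
    expert_weights: one (k x n) matrix per expert
    """
    out = [None] * len(tokens)
    for e, weight in enumerate(expert_weights):
        # Gather the tokens assigned to expert e into one contiguous group.
        idx = [i for i, eid in enumerate(expert_ids) if eid == e]
        if not idx:
            continue  # this expert received no tokens
        group = [tokens[i] for i in idx]
        # One GEMM per group; a fused kernel batches these into one launch.
        for i, row in zip(idx, matmul(group, weight)):
            out[i] = row  # scatter results back to original token order
    return out
```

The gather, per-group multiply, and scatter steps are the parts a Group GEMM kernel fuses, so the per-expert loop is the natural place to compare intermediate results.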

Attention

  • SageAttention

About

MoE, Group GEMM, MHA, Quantization
