- core
- MoE on NPU
- MoE from scratch
-
pre-train
- GRPO: R1-Zero-like reproduction from scratch on Qwen
-
post-training
- SFT for R1
- Distill for R1 [black-box]
-
DeepSeek-V3 MoE from scratch
-
MTP
- MTP on Qwen
-
MLA
- [white-box]
- PPO
- GRPO
- DPO
- PPO
- reward model
- Llama source code reading
- LoRA & PEFT
- DL