Blog
MMMU-Pro Needs an Update
Text shortcuts, image quality, and label noise in the popular multimodal eval
Implementing UL2 for Decoder-Only Language Models
An in-depth look at modeling considerations
How does torch.compile speed up a transformer?
A case study of kernel fusion for a vision transformer
Transformer FLOPs
How to count FLOPs and why it's useful