Blog

MMMU-Pro Needs an Update

Text shortcuts, image quality, and label noise in the popular multimodal eval

Implementing UL2 for Decoder-Only Language Models

An in-depth look at modeling considerations

How does torch.compile speed up a transformer?

A case study of kernel fusion for a vision transformer

Transformer FLOPs

How to count FLOPs and why it's useful

© Adam Casson.