22 Aug 2025
Between verbalizers and per-token likelihoods, why are we using causal language models for classification tasks? It is a fair question, since downstream classification is technically not what these models were designed for.
03 Jul 2025
State space models are an interesting alternative to the ubiquitous transformer architecture, and Mamba is a clear example both of years of progress and of their potential for the future of language modeling.
15 Jun 2025
What do people mean when they talk about test-time compute? And how is this related to so-called reasoning LLMs? In this article, I give a high-level overview of these concepts, including a discussion of DeepSeek-R1.