03 Sep 2025
What do people mean when they talk about test-time compute? And how is this related to so-called reasoning LLMs? In this article, I give a high-level overview of these concepts, including a discussion of DeepSeek-R1.
22 Aug 2025
Between verbalizers and per-token likelihood, why are we using causal language models for classification tasks? This is a fair question, as classification is technically not what these models were designed for.
03 May 2025
State-space models are an interesting alternative to the ubiquitous transformer architecture, and the Mamba architecture is a clear example of both the years of progress behind them and their potential for the future of language modeling. [COMING SOON!]