03 Sep 2025
What do people mean when they talk about test-time compute? And how is this related to so-called reasoning LLMs? In this article, I give a high-level overview of these concepts, including a discussion of DeepSeek-R1.
            
22 Aug 2025
Whether through verbalizers or per-token likelihoods, why are we using causal language models for classification tasks? This is a fair question, since classification is technically not what these models were designed for.
            
03 May 2025
State-space models are an interesting alternative to the ubiquitous transformer architecture, and the Mamba architecture is a clear example of both years of progress and the potential they hold for the future of language modeling. [COMING SOON!]