Writings about LLMs

Suggestions for topics to write about are welcome!

What Are Reasoning LLMs?

What do people mean when they talk about test-time compute? And how is this related to so-called reasoning LLMs? In this article, I give a high-level overview of these concepts, including a discussion of DeepSeek-R1.

How to Use LLMs for Classification Tasks

Between verbalizers and per-token likelihoods, why are we using causal language models for classification tasks? This is a fair question, as classification is technically not what these models were designed for.
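
As a rough taste of what verbalizer-based, per-token likelihood classification looks like in practice, here is a minimal sketch. The model name, prompt template, and verbalizer words are illustrative assumptions, not taken from the article itself; the idea is simply to compare the model's next-token scores for label words.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice of model; any causal LM works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Verbalizers: the label words whose next-token scores we compare.
VERBALIZERS = {"positive": " positive", "negative": " negative"}

def classify(text: str) -> str:
    # Hypothetical prompt template for sentiment classification.
    prompt = f"Review: {text}\nSentiment:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]    # distribution over the next token
    scores = {}
    for label, word in VERBALIZERS.items():
        # Score each label by the logit of its verbalizer's first sub-token.
        token_id = tokenizer.encode(word)[0]
        scores[label] = next_token_logits[token_id].item()
    return max(scores, key=scores.get)

print(classify("The movie was a delight from start to finish."))
```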

State-Space Models and the Mamba Architecture

State-space models are an interesting alternative to the ubiquitous transformer architecture, and the Mamba architecture reflects both years of progress and real potential for the future of language modeling. [COMING SOON!]