Writings about LLMs

Suggestions for topics to write about are welcome!

How to Use LLMs for Classification Tasks

Between verbalizers and per-token likelihood, why are we using causal language models for classification tasks? This is a fair question, since downstream classification is not what these models were originally designed for.
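
To make the two terms in the teaser concrete, here is a minimal sketch (not taken from the article) of the verbalizer/per-token-likelihood approach: each class is mapped to a verbalizer word, and a causal LM scores the class by summing the log-likelihoods of the verbalizer's tokens given the prompt. Model name, prompt, and verbalizers below are illustrative placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works; a small model is used here only as an example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score_label(prompt: str, verbalizer: str) -> float:
    """Sum of log-probabilities of the verbalizer tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    label_ids = tokenizer(verbalizer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, label_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position t predict token t+1, so shift by one position.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    label_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(log_probs[pos, input_ids[0, pos + 1]].item() for pos in label_positions)

prompt = "Review: 'The movie was fantastic.' Sentiment:"
verbalizers = {"positive": " positive", "negative": " negative"}
scores = {label: score_label(prompt, v) for label, v in verbalizers.items()}
print(max(scores, key=scores.get))  # predicted class
```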

State Space Models and the Mamba Architecture

State space models are an interesting alternative to the ubiquitous transformer architecture, and the Mamba architecture illustrates both the progress made in recent years and the potential for the future of language modeling.

What Are Reasoning LLMs?

What do people mean when they talk about test-time compute? And how is this related to so-called reasoning LLMs? In this article, I give a high-level overview of these concepts, including a discussion of DeepSeek-R1.