LLM Paper Evolution Map (MOC)

A Map of Content page for reading core LLM papers in order, starting from the Transformer.

SeongHwa Lee · 2 min read

Purpose of This Page

This page is a hub (Map of Content, MOC) for reading LLM papers in order. Attention Is All You Need is the entry point, and each paper note is linked from here as it is written.

Reading Roadmap

Transformer (2017) → BERT (2018) → GPT-2 (2019) → GPT-3 (2020) → InstructGPT (2022) → LLaMA (2023) → Efficiency/Reasoning lines

Current Reading Status

  1. Attention Is All You Need (2017) — Paper Note
  2. BERT (2018) — Paper Note
  3. GPT-2 (2019) — Paper Note
  4. GPT-3 (2020) — planned
  5. InstructGPT (2022) — planned
  6. LLaMA (2023) — planned

Foundational Concept Notes

If you get stuck on terminology while reading a paper note, it helps to read the following in order first.

  1. LLM Math Basics 1: Vectors and Dot Products
  2. LLM Math Basics 2: Softmax and Probabilistic Interpretation
  3. LLM Learning Basics: Cross-Entropy and Perplexity
  4. LLM Basics: RNNs and Sequential Processing
  5. Transformer Basics: Encoder and Decoder
  6. Transformer Basics: Q, K, V Intuition
  7. Transformer Basics: Residual Connections, LayerNorm, and FFN
  8. LLM Architecture Basics: Encoder-only vs. Decoder-only
  9. LLM Learning Basics: Pre-training and Fine-tuning
  10. LLM Learning Basics: Masked Language Model

Minimum Path Before Reading GPT-2

If you want to start directly from GPT-2, you do not need to read all the foundational notes. Covering the four items below first will make it easy to follow the main thread of the GPT-2 paper note.

  1. Cross-Entropy and Perplexity
  2. Encoder-only vs. Decoder-only
  3. Pre-training and Fine-tuning
  4. The MLM and encoder-only comparison sections of the BERT paper note

Expansion Paths by Axis

  • Base Transformer line
  • Instruction tuning line (RLHF, preference alignment)
  • Efficiency line (FlashAttention, long context)
  • Reasoning line (chain-of-thought, inference scaling)

Note-Writing Template

Each paper note follows the structure below.

  • One-line summary
  • Problem definition
  • Core idea
  • Key experimental figures
  • Limitations and follow-up research
  • Next paper to read
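
As a rough illustration, the template above can be turned into an empty note skeleton. This is only a sketch: the `note_skeleton` helper, the plain-text layout, and the `TODO` placeholder are assumptions made for the example, not part of this page's actual workflow.

```python
# Section titles copied from the template list above.
SECTIONS = [
    "One-line summary",
    "Problem definition",
    "Core idea",
    "Key experimental figures",
    "Limitations and follow-up research",
    "Next paper to read",
]


def note_skeleton(title: str, year: int) -> str:
    """Build an empty plain-text note with one heading per template section.

    Hypothetical helper: the layout and the TODO marker are illustrative only.
    """
    lines = [f"{title} ({year})", ""]
    for section in SECTIONS:
        lines += [section, "", "TODO", ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(note_skeleton("GPT-3", 2020))
```

Keeping the section titles in a single list means every note starts with the same six sections in the same order.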

Operating Rules

  • When a new paper is read, add its link to this MOC first.
  • Before a paper note exists, mark the entry as "planned."
  • Once writing is complete, advance the stage: seedling → budding → evergreen.
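
The rules above amount to a small state machine per entry. Below is a minimal sketch, assuming a hypothetical `Stage` enum and `advance` helper (neither belongs to any real tooling for this page), and assuming that "planned" comes before "seedling" in the progression.

```python
from enum import Enum


class Stage(Enum):
    # Definition order is the progression order (an assumption of this sketch).
    PLANNED = "planned"      # no paper note exists yet
    SEEDLING = "seedling"
    BUDDING = "budding"
    EVERGREEN = "evergreen"  # final stage


def advance(stage: Stage) -> Stage:
    """Return the next stage, or the same stage if it is already evergreen."""
    order = list(Stage)
    index = order.index(stage)
    return order[min(index + 1, len(order) - 1)]


if __name__ == "__main__":
    stage = Stage.PLANNED
    while stage is not Stage.EVERGREEN:
        following = advance(stage)
        print(f"{stage.value} -> {following.value}")
        stage = following
```

Clamping at the last stage keeps `advance` safe to call on an entry that is already evergreen.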