Attention Is All You Need (2017) — Paper Notes
A reading note on the Transformer paper — the core ideas, why it mattered, and what to read next.
Explains cross-entropy and perplexity — the metrics used to measure how wrong a model is — with formulas, commentary, and examples.
A walkthrough of how softmax converts raw scores into probability-like values, with formulas, explanations, and examples.
A walkthrough of vectors and the dot product — with notation, explanations, and examples — covering what you need to know before reading LLM papers.
A Map of Content page for reading core LLM papers in order, starting from the Transformer.