BERT (2018) Paper Notes
Paper notes covering the core ideas of BERT: the bidirectional Transformer encoder, masked language model, next sentence prediction, and the fine-tuning paradigm.
A clear explanation of how Bloom filters work, how to use them correctly for user-ID duplicate checks during sign-up, and a concise implementation example.
A comparison of encoder-only and decoder-only architectures that distinguish the BERT and GPT families.
Explains how RNNs — the dominant architecture before Transformers — process sequences token by token, and the fundamental limitations that motivated moving beyond them.
Explains BERT's core training objective — the Masked Language Model — with formulas, commentary, and examples.