BERT (2018) Paper Notes
Paper notes covering the core ideas of BERT: the bidirectional Transformer encoder, masked language model, next sentence prediction, and the fine-tuning paradigm.
Explains BERT's core training objective — the Masked Language Model — with formulas, commentary, and examples.