Alignment

Llama 2 (2023) Paper Notes

Paper notes on Llama 2 covering its commercial license, a base model scaled to 2T tokens with 4k context and GQA, and the Llama 2-Chat alignment pipeline (SFT → two reward models → iterative RLHF with rejection sampling + PPO → GAtt). It fuses LLaMA's open base lineage with InstructGPT's RLHF at production quality.

llm paper-reading transformer llama-2 rlhf alignment open-source meta

July 13, 2026

InstructGPT (2022) Paper Notes

Paper notes on InstructGPT covering its core method, the three-step RLHF recipe (SFT → reward model → PPO); the headline alignment result, where outputs from a 1.3B model are preferred over those of 175B GPT-3; improved truthfulness and reduced toxicity; the alignment tax and PPO-ptx; and the limitations.

llm paper-reading transformer instructgpt rlhf alignment openai

June 29, 2026