Slide talk: Demystifying GPT-3

November 09, 2020 - 803 words - 5 mins
The transformer

For another meeting of our reinforcement/machine learning reading group, I gave a talk on the underlying model of GPT-2 and GPT-3, the ‘Transformer’. There are two main concepts I wanted to explain: positional encoding and attention. During the talk, I found that two things were most… read more

The Johnson-Lindenstrauss lemma for the brave

December 23, 2020 - 3484 words - 18 mins
If you are interested in dimensionality reduction, chances are that you have come across the Johnson-Lindenstrauss lemma. I learned about it while studying the Linformer paper, which contains a result on dimensionality reduction for the Transformer. Essentially, they prove that self-attention is lo… read more