A comparison of GPT-2 and BERT

In: Natural Language Generation, Research
Written from the perspective of a third-year PhD candidate in the Netherlands.

GPT-2 and BERT are two methods for creating language models, based on neural networks and deep learning. Both are fairly young, but they are 'state of the art': they outperform almost every other method on natural language processing benchmarks.

GPT-2 and BERT are especially practical because they come with a set of pre-trained language models, which anyone can download and use. The main advantage of pre-trained models is that users don't have to train a language model from scratch, which is computationally expensive and requires a huge dataset. Instead, you can take a smaller dataset and "fine-tune" the large, pre-trained model on it with a bit of additional training, which is much cheaper.
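The fine-tuning idea can be shown with a toy example. This is not an actual Transformer, just a tiny bigram count model invented for illustration: "pre-training" builds counts from a large generic corpus, and "fine-tuning" continues from those counts on a small domain corpus instead of starting from zero.

```python
from collections import Counter

def train_bigram_counts(corpus, counts=None):
    """Count word bigrams; if `counts` is given, continue from it (fine-tuning)."""
    counts = Counter() if counts is None else Counter(counts)
    for sentence in corpus:
        tokens = sentence.split()
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1
    return counts

def next_word(counts, word):
    """Most likely next word under the bigram counts (None if unseen)."""
    candidates = {b: c for (a, b), c in counts.items() if a == word}
    return max(candidates, key=candidates.get) if candidates else None

# "Pre-training" on a large, generic corpus (tiny here for illustration)
generic = ["the cat sat on the mat", "the dog sat on the rug"]
pretrained = train_bigram_counts(generic)

# "Fine-tuning": continue from the pre-trained counts on a small domain corpus,
# which shifts the model's predictions toward the domain without retraining
domain = ["the model sat on the gpu", "the model runs on the gpu"]
finetuned = train_bigram_counts(domain, counts=pretrained)
```

With real GPT-2 or BERT the same pattern applies, except the "counts" are millions of neural network weights, which is why reusing them saves so much compute.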

For my next research project, I want to use one of the "state of the art" deep learning approaches to text generation, so I read up on GPT-2 and BERT and created this overview. I hope it is helpful to other people on the internet!

Note that I only compared GPT-2 and BERT for the two languages that are relevant to my research (i.e. Dutch and English). BERT in particular has spawned a long list of alternative, spin-off models. Before you decide on a language model for your particular problem, read up on its strengths and weaknesses for your specific task!

Quick decision table

I would use the following approach for specific task/language combinations, based on what I read about GPT-2 and BERT so far:

Natural language processing (NLP)
  English: BERT
  Dutch: BERT (one of the BERT models for Dutch)

Natural language generation (NLG)
  English: GPT-2
  Dutch: It depends:
    - If input and output are not sensitive data: combine GPT-2 (English) with the Google Translate API (Dutch to English and vice versa).
    - If you only need short texts (15-token sentences) and you have bi-directional prompts: BERT (one of the BERT models for Dutch).
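The translate-generate-translate workaround for Dutch NLG can be sketched as a simple pipeline. The functions `translate` and `generate_english` below are hypothetical stand-ins: a real version would call a translation service and a GPT-2 implementation, and, as the table notes, you should not route sensitive data through an external API.

```python
def translate(text, source, target):
    # Hypothetical stand-in for a translation API call.
    # It just tags the text so the pipeline's flow stays visible.
    return f"[{source}->{target}] {text}"

def generate_english(prompt):
    # Hypothetical stand-in for English text generation with GPT-2.
    return prompt + " ... generated continuation"

def generate_dutch(prompt_nl):
    """Dutch NLG via an English model: translate in, generate, translate back."""
    prompt_en = translate(prompt_nl, "nl", "en")
    text_en = generate_english(prompt_en)
    return translate(text_en, "en", "nl")
```

The design trade-off is that translation errors compound in both directions, so quality depends as much on the translation step as on GPT-2 itself.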

A very quick introduction to GPT-2 and BERT


Both models are based on the Transformer architecture, but they are trained differently. GPT-2 is an autoregressive model: it reads text left to right and is trained to predict the next token, which makes it a natural fit for text generation. BERT is bidirectional: it is trained to fill in masked tokens using context on both sides, which makes it very strong for language-understanding tasks but awkward for ordinary left-to-right generation.

It seems that if you want normal left-to-right generation in English, GPT-2 is still the best way to go. BERT's main strengths are NLP tasks and the variety of languages for which pre-trained models are available.
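Left-to-right generation means predicting one token at a time and appending it to the context before predicting the next. Here is a minimal sketch of that loop; the lookup-table "model" is invented for illustration, whereas real GPT-2 would score the whole vocabulary and sample from that distribution at each step.

```python
def generate_left_to_right(next_token_fn, prompt_tokens, max_new_tokens=10, eos="<eos>"):
    """Autoregressive (left-to-right) decoding: each new token is predicted
    from everything generated so far, then appended to the context."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token_fn(tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

# Toy "model": a lookup table from the last token to the next one.
toy_table = {"the": "cat", "cat": "sat", "sat": "down", "down": "<eos>"}

def toy_model(tokens):
    return toy_table.get(tokens[-1], "<eos>")
```

Because each step only conditions on the tokens to its left, this loop matches GPT-2's training objective, which is exactly why BERT (trained bidirectionally) has no equally natural decoding procedure.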

If you've used this overview to help you choose a language model, let me know in the comments. I'd love to hear about your NLP/NLG projects.