Model extensibility for few-shot GPT-2
You have probably heard about the GPT (generative pre-trained transformer) model or read about it in news articles over the last year. It made a pretty big splash at launch. Let’s dig in and start with a question: “What is GPT-2, or even GPT-3?” That is a good question. Let’s start with a few basics about what few-shot learning actually entails and then work toward how it relates to GPT-2. You might be aware that few-shot learning is a method for taking a well built out, pre-trained model and adapting it with just a bit of data, getting going by taking only a few shots at achieving a favorable outcome.[1] Better explanations exist and have been freely shared online; you can probably find them with a little bit of digging, but I think that sets the stage for the second part of the topic at hand.
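As a rough illustration of the idea (my own sketch, not anything pulled from the notebook or papers referenced here), a few-shot prompt simply packs a handful of worked examples into the text handed to the model and lets it infer the pattern without any further training:

```python
# A minimal illustration of few-shot prompting: the "training data" is just
# a handful of examples placed directly in the prompt text. The task and
# examples are borrowed from the style of the GPT-3 paper's demonstrations.
few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "peppermint -> menthe poivree\n"
    "cheese -> "
)

# A large pre-trained language model completes the pattern (ideally "fromage")
# with no gradient updates, which is what makes the setup "few-shot".
print(few_shot_prompt)
```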
I’m very interested in both the OpenAI GPT-2 and GPT-3 model projects.[2] They are generative pre-trained transformer models built for predictive language modeling. It is a really interesting set of technology that grabbed my attention, and the attention of a lot of other people, based on the promise and potential of what it could achieve in practice.[3] Right now GPT-3 is being provided as an API, so you can functionally call the model without hosting it yourself, and the results can be very interesting. You just give it a prompt or a bit of text and it is ready to spit out content. People worry that GPT-3 could create a nearly endless stream of content that becomes so pervasive it crowds out all other related content. This could be the first ever denial of content attack: flooding the stream with generated text that is unique enough that the filters and blocking methods in use today would not be able to contain it easily.
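For anyone curious what that API-style interaction looks like in practice, here is a minimal sketch using the openai Python client; the engine name, parameters, and prompt are my own placeholder choices for illustration, not anything prescribed in this newsletter, and you would need your own API key for it to run:

```python
# Minimal sketch of prompting GPT-3 through the OpenAI API (legacy completion
# endpoint). Assumes the `openai` package is installed and an API key is set
# in the environment; engine name and parameters are placeholder choices.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="davinci",          # a base GPT-3 engine
    prompt="Write a short note about few-shot learning:",
    max_tokens=100,            # cap the length of the generated content
    temperature=0.7,           # higher values produce wilder output
)

print(response.choices[0].text)
```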
Machine learning deployments to a workflow that are wholesale based on calling either a GPT-2 or GPT-3 generative language model are probably not production ready at the moment. You can get some really strange and interesting things coming out of the model, and unless that is the use case you really wanted, it is probably not ready to spit out help desk documentation or anything else that would be instantly customer facing. That might be a very bad plan indeed, given the surprising results the model produces from time to time.
You can go out to the OpenAI website and start to really dig into what they are trying to do with language models. The implications of big language models are just now starting to move from research presentations to commercial implementations.[4] People are talking about them and the use cases are really starting to stack up in theory, but not 100% in practice. If that content was not enough for you to consume, then maybe go and learn a little bit about BERT as well.[5]


About 10 months ago I worked on GPT-2 with a great deal of intensity:
https://github.com/nelslindahlx/NLP/blob/master/Yet_another_working_GPT2_corpus_example.ipynb
You can still run that 10-month-old GPT-2 notebook on Google’s Colab from that link. I just checked on it and the code still works. It is set to run with 1,000 training steps, which will take a bit of time on your Colab instance. You could reduce the training steps if you wanted it to run faster, but that will probably make the output a little bit wilder. With just a little bit of training on my writing corpus it will spit out content that is similar to what I might very well produce. It certainly masters the general form and structure of my whining about the process of writing in only a few shots of learning. That is what makes it so interesting as a model. A rough sketch of that fine-tuning flow is included below.
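I have not reproduced the notebook here, but a minimal sketch of that kind of Colab fine-tuning flow, assuming the gpt-2-simple package and a plain-text corpus file named corpus.txt (both assumptions on my part; the actual notebook may be organized differently), looks roughly like this:

```python
# Rough sketch of fine-tuning GPT-2 on a small writing corpus with gpt-2-simple.
# Assumes `gpt_2_simple` is installed and a plain-text file "corpus.txt" exists.
import gpt_2_simple as gpt2

# Download the smallest released GPT-2 checkpoint (124M parameters).
gpt2.download_gpt2(model_name="124M")

# Start a TensorFlow session and fine-tune on the corpus for 1,000 steps,
# matching the step count mentioned above; fewer steps run faster but tend
# to produce wilder output.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="corpus.txt", model_name="124M", steps=1000)

# Generate text in the style of the fine-tuning corpus.
gpt2.generate(sess)
```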
Links and thoughts:
Publishing Friday into Saturday seems to be working for the WAN Show. This probably works for other people as well. Every week I seem to be watching a few minutes of Linus and Luke talking about technology.
For the last few weeks I have been watching Machine Learning Street Talk. This week the guest is Dr. Thomas Zahavy and they talked about meta-gradients in reinforcement learning.
You can listen to this “Investing in AI” podcast episode from Rob May right from the Tweet below (it runs about 45 minutes). It covers the topic of synthetic data, which is really something that you want to understand in more detail.


Top 5 Tweets of the week:


Footnotes:
[1] Here is a Medium post that I enjoyed on the subject https://medium.com/quick-code/understanding-few-shot-learning-in-machine-learning-bede251a0f67
[2] https://github.com/openai/gpt-2 or https://openai.com/blog/better-language-models/
[3] https://dailynous.com/2020/07/30/philosophers-gpt-3/#chalmers
[4] https://openai.com/blog/better-language-models/ or read the full technical paper here https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
[5] https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
What’s next for The Lindahl Letter?
Week 11: What is ML scale? The where and the when of ML usage
Week 12: Confounding within multiple ML model deployments
Week 13: Building out your ML Ops
Week 14: My Ai4 Healthcare NYC 2019 talk revisited
Week 15: What are people really doing with machine learning?
I’ll try to keep the what’s next list forward-looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.