Model extensibility for few-shot GPT-2
You have probably heard about the GPT (generative pre-trained transformer) model or read about it in news articles over the last year. It made a pretty big splash at launch. Let’s dig in and start with a question: “What is GPT-2, or even GPT-3?” That is a good question. Let’s start with a few basics about what few-shot learning actually entails and then work toward how it relates to GPT-2. You might be aware that few-shot learning is a method for taking a well built out, pre-trained model and adapting it with just a bit of data, getting going by taking only a few shots at achieving a favorable outcome.[1] Better explanations exist and have been freely shared online; you can probably find them with a little bit of digging, but I think that sets the stage for the second part of the topic at hand.
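As a rough illustration of the idea (my own sketch, not anything pulled from the notebook or papers referenced here), a few-shot prompt simply packs a handful of worked examples into the text handed to the model and lets it infer the pattern without any further training:

```python
# A minimal illustration of few-shot prompting: the "training data" is just
# a handful of examples placed directly in the prompt text. The task and
# examples are borrowed from the style of the GPT-3 paper's demonstrations.
few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "peppermint -> menthe poivree\n"
    "cheese -> "
)

# A large pre-trained language model completes the pattern (ideally "fromage")
# with no gradient updates, which is what makes the setup "few-shot".
print(few_shot_prompt)
```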
I’m very interested in both the OpenAI GPT-2 and GPT-3 model projects.[2] They are generative pre-trained transformer models built for predictive language modeling. It is a really interesting set of technology that grabbed my attention, and the attention of a lot of other people, based on the promise and potential of what it could achieve in practice.[3] Right now GPT-3 is being provided as an API, so you can functionally call the model without hosting it yourself, and the results can be very interesting. You just give it a prompt or a bit of text and it is ready to spit out content. People worry that GPT-3 could create a nearly endless stream of content that becomes so pervasive it crowds out all other related content. This could be the first ever denial of content attack: flooding the stream with generated text that is unique enough that the filters and blocking methods in use today would not be able to contain it easily.
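For anyone curious what that API-style interaction looks like in practice, here is a minimal sketch using the openai Python client; the engine name, parameters, and prompt are my own placeholder choices for illustration, not anything prescribed in this newsletter, and you would need your own API key for it to run:

```python
# Minimal sketch of prompting GPT-3 through the OpenAI API (legacy completion
# endpoint). Assumes the `openai` package is installed and an API key is set
# in the environment; engine name and parameters are placeholder choices.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="davinci",          # a base GPT-3 engine
    prompt="Write a short note about few-shot learning:",
    max_tokens=100,            # cap the length of the generated content
    temperature=0.7,           # higher values produce wilder output
)

print(response.choices[0].text)
```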
Machine learning deployments to a workflow that are wholesale based on calling either a GPT-2 or GPT-3 generative language model are probably not production ready at the moment. You can get some really strange and interesting things coming out of the model, and unless that is the use case you really wanted, it is probably not ready to spit out help desk documentation or anything else that would be instantly customer facing. That might be a very bad plan indeed, given the surprising results the model produces from time to time.
You can go out to the OpenAI website and start to really dig into what they are trying to do with language models. The implications of big language models are just now starting to move from research presentations to commercial implementations.[4] People are talking about them and the use cases are really starting to stack up in theory, but not 100% in practice. If that content was not enough for you to consume, then maybe go and learn a little bit about BERT as well.[5]


About 10 months ago I worked on GPT-2 with a great deal of intensity:
https://github.com/nelslindahlx/NLP/blob/master/Yet_another_working_GPT2_corpus_example.ipynb
You can still run that 10-month-old GPT-2 notebook on Google’s Colab from that link. I just checked on it and the code still works. It is set to run with 1,000 training steps, which will take a bit of time on your Colab instance. You could reduce the training steps if you wanted it to run faster, but that will probably make the output a little bit wilder. With just a little bit of training on my writing corpus it will spit out content that is similar to what I might very well produce. It certainly masters the general form and structure of my whining about the process of writing in only a few shots of learning. That is what makes it so interesting as a model. A rough sketch of that fine-tuning flow is included below.
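I have not reproduced the notebook here, but a minimal sketch of that kind of Colab fine-tuning flow, assuming the gpt-2-simple package and a plain-text corpus file named corpus.txt (both assumptions on my part; the actual notebook may be organized differently), looks roughly like this:

```python
# Rough sketch of fine-tuning GPT-2 on a small writing corpus with gpt-2-simple.
# Assumes `gpt_2_simple` is installed and a plain-text file "corpus.txt" exists.
import gpt_2_simple as gpt2

# Download the smallest released GPT-2 checkpoint (124M parameters).
gpt2.download_gpt2(model_name="124M")

# Start a TensorFlow session and fine-tune on the corpus for 1,000 steps,
# matching the step count mentioned above; fewer steps run faster but tend
# to produce wilder output.
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="corpus.txt", model_name="124M", steps=1000)

# Generate text in the style of the fine-tuning corpus.
gpt2.generate(sess)
```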
Links and thoughts:
Publishing Friday into Saturday seems to be working for the WAN Show. This probably works for other people as well. Every week I seem to be watching a few minutes of Linus and Luke talking about technology.
For the last few weeks I have been watching Machine Learning Street Talk. This week the guest is Dr. Thomas Zahavy and they talked about meta-gradients in reinforcement learning.
You can listen to this “Investing in AI” podcast episode from Rob May right from the Tweet below (it runs about 45 minutes). It covers the topic of synthetic data, which is really something that you want to understand in more detail.


Top 5 Tweets of the week:


Footnotes:
[1] Here is a Medium post that I enjoyed on the subject https://medium.com/quick-code/understanding-few-shot-learning-in-machine-learning-bede251a0f67
[2] https://github.com/openai/gpt-2 or https://openai.com/blog/better-language-models/
[3] https://dailynous.com/2020/07/30/philosophers-gpt-3/#chalmers
[4] https://openai.com/blog/better-language-models/ or read the full technical paper here https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
[5] https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
What’s next for The Lindahl Letter?
Week 11: What is ML scale? The where and the when of ML usage
Week 12: Confounding within multiple ML model deployments
Week 13: Building out your ML Ops
Week 14: My Ai4 Healthcare NYC 2019 talk revisited
Week 15: What are people really doing with machine learning?
I’ll try to keep the what’s next list forward-looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.