The Lindahl Letter
Profiling Hugging Face

Something is going to have to give for Hugging Face pretty soon, or they are going to get left behind by the extended-model, plugin-driven, and now internet-connected GPT models that some of the biggest platforms are working to take mainstream. We are watching companies like OpenAI and their partner Microsoft build extensibility into actions in real time, and that work is going to become the foundational groundwork for how connectivity works within these models. This ecosystem is going to be so highly proprietary, and so interconnected among a set of foundational companies, that there will be no practical way to build a direct open source competitor. Part of that will come from the payment models that get established as a foundational services layer is set up.

People have been really excited about Hugging Face for some time now. Since 2016, the private company Hugging Face has worked to share machine learning content with the world. You can easily get to some of the training courses they have set up on NLP and Deep RL [1], and they are pretty good training courses. People have really dug into the models and Spaces from Hugging Face, which have helped democratize NLP. At one point, everybody was checking in on what Hugging Face was up to, and the company carved out a very large presence in the AI and ML space. You can easily go out to Google Scholar and see a bunch of academic papers that reference Hugging Face, written either as two words or sometimes as one word [2]. I spent some time looking for papers with a decent number of citations. Here are five that I selected:

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., ... & Rush, A. M. (2019). Huggingface's transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771. https://arxiv.org/pdf/1910.03771

Jiang, W., Synovic, N., Hyatt, M., Schorlemmer, T. R., Sethi, R., Lu, Y. H., ... & Davis, J. C. (2023). An empirical study of pre-trained model reuse in the hugging face deep learning model registry. arXiv preprint arXiv:2303.02552. https://arxiv.org/pdf/2303.02552

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., ... & Rush, A. M. (2020, October). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations (pp. 38-45). https://aclanthology.org/2020.emnlp-demos.6.pdf 

Pfeiffer, J., Rücklé, A., Poth, C., Kamath, A., Vulić, I., Ruder, S., ... & Gurevych, I. (2020). Adapterhub: A framework for adapting transformers. arXiv preprint arXiv:2007.07779. https://arxiv.org/pdf/2007.07779 

Zhang, Y., Sun, S., Galley, M., Chen, Y. C., Brockett, C., Gao, X., ... & Dolan, B. (2019). Dialogpt: Large-scale generative pre-training for conversational response generation. arXiv preprint arXiv:1911.00536. https://arxiv.org/pdf/1911.00536.pdf

You can get a feel for what the Hugging Face community has been working on by taking a look at their GitHub organization [3]. They have transformers, datasets, diffusers, and tools for a variety of other things that people find useful. I would describe Hugging Face as a very hands-on community where you can dig in and be a part of what is going on within the space. That is the distinctive difference from some of the larger corporations that do open source work. Most of the corporate AI labs produce and distribute things when they are ready, whereas around Hugging Face iterative tool releases and a vibrant community full of contributions are popping up all over. We are also seeing a newer trend where companies like OpenAI release models and other technology via API without open sourcing them. This pay-to-play API model shields the intellectual property better, but in this space people are rapidly learning from each innovation and working to build alternatives.

Here are a couple of YouTube videos, including one from Hugging Face:

You can dig into the security features offered by Hugging Face [4]. The security page states clearly that they are SOC2 Type 1 certified and links to the AICPA page [5]. As with any source you download code from, you are going to want to understand that code, be ready to keep it current, and be prepared to take rapid action when a problem surfaces. Hugging Face is even trying to compete with ChatGPT by sharing more open models for that type of effort [6]. I spent some time looking for in-depth assessments of the security profile for Hugging Face and have not yet found what I was looking for.

Footnotes:

[1] https://huggingface.co/learn/nlp-course/chapter1/1 

[2] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C6&q=%22hugging+face%22&btnG= 

[3] https://github.com/huggingface 

[4] https://huggingface.co/docs/hub/security 

[5] https://us.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc1report.html 

[6] https://venturebeat.com/ai/hugging-face-launches-open-source-version-of-chatgpt-in-bid-to-battle-openai/ 

What’s next for The Lindahl Letter? 

  • Week 127: Profiling DeepMind Security

  • Week 128: Democratizing AI system security

  • Week 129: Snapchat Security

  • Week 130: Generative model security

  • Week 131: Profiling Microsoft Azure security

If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.
