We are holding our first-ever MLOps Community virtual conference. Register here.
- Data scientist X ML Platforms // Jean-Michel
On this podcast, we had the pleasure of talking to Jean-Michel Daignan, a Data Scientist at Ubisoft, about the ML platform he is building there.
With his data science background, he understands the needs and thought process of a data scientist executing a task.
When building an ML platform, the goal is to capture the needs and requirements of both the data scientists and the developers. The platform should accommodate both data science and business needs for it to be useful.
What makes a good ML platform? A flexible abstraction layer is an efficient design choice: if any of the tool integrations or use cases change for the data scientists, the platform can adapt without disrupting their workflow. Good documentation also makes the platform easy to use.
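To make the abstraction-layer idea concrete, here is a minimal sketch of what it might look like in Python. All names here are hypothetical illustrations, not part of Ubisoft's actual platform: the data scientists code against a stable interface, and the platform team can swap the backing tool behind it.

```python
from abc import ABC, abstractmethod

class ExperimentTracker(ABC):
    """Stable interface the data scientists code against."""

    @abstractmethod
    def log_metric(self, name: str, value: float) -> None:
        ...

class InMemoryTracker(ExperimentTracker):
    """Stand-in backend. Swapping in a real tracking tool later would
    only change this adapter, not the data scientists' code."""

    def __init__(self) -> None:
        self.metrics: dict[str, list[float]] = {}

    def log_metric(self, name: str, value: float) -> None:
        self.metrics.setdefault(name, []).append(value)

# User code only ever sees the interface.
tracker: ExperimentTracker = InMemoryTracker()
tracker.log_metric("accuracy", 0.91)
```

The point of the indirection is exactly what the episode describes: if a tool integration changes, only the adapter behind the interface is touched, and the data scientists' workflow is unaffected.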
- MLOps is 98% Data Engineering
This post was originally written by Kostas Pardalis.
MLOps emerged as a new category of tools for managing data infrastructure, specifically for ML use cases, with the main assumption being that ML has unique needs.
After a few years, and with the hype gone, it has become apparent that MLOps overlaps with Data Engineering more than most people believed. In this post, we see why, and what that means for the MLOps ecosystem.
Read Here
We had a great round table conversation about Large Language Models (LLMs) with our special guest speakers: Rebecca Qian, Research Engineer at Facebook AI Research; David Hershey, Vice President at the VC fund Unusual Ventures; Hannes Hapke, Machine Learning Engineer at Digits; and James Richards, CEO and co-founder of Bountiful. The conversation was moderated by Diego Oppenheimer, a partner at Factory HQ.
As you may know, an LLM is a pre-trained NLP model that can carry out a variety of NLP tasks. They are referred to as large mainly because of the huge amount of data they are trained on, not just their size. But what constitutes large? When does a model cross that threshold?
In production, there are two tiers of frameworks for using these language models. The first tier wraps the models behind APIs for tasks like text completion and recommendation, email generation, and so on.
The second tier handles more complex use cases that require incorporating external information from different sources. These frameworks are closer to engineering workloads: they usually require setting up infrastructure, databases, integrations, and more.
In the future, we look forward to advanced production frameworks for these models that allow for greater model complexity or enable full automation, taking human-in-the-loop use cases like Copilot a step further.
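The second tier described above can be sketched in a few lines. This is a toy retrieval-augmented pattern under stated assumptions: `call_llm` is a hypothetical stub standing in for a real hosted model API, and the "document store" is a plain dict rather than the databases and integrations a production setup would need.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stub: a real system would call a hosted model here.
    return f"Answer based on: {prompt.splitlines()[0]}"

# Toy external knowledge source; production systems would use a real
# database or search index instead.
DOCS = {
    "billing": "Invoices are sent on the 1st of each month.",
    "support": "Support is available 24/7 via chat.",
}

def retrieve(query: str) -> str:
    # Naive keyword lookup standing in for real retrieval infrastructure.
    for key, text in DOCS.items():
        if key in query.lower():
            return text
    return ""

def answer(query: str) -> str:
    # Incorporate external information into the prompt before calling
    # the model -- the defining move of the second tier.
    context = retrieve(query)
    prompt = f"Context: {context}\nQuestion: {query}"
    return call_llm(prompt)
```

Even in this toy form, the engineering flavor is visible: most of the code is retrieval and plumbing around the model call, not the model call itself.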
- Understanding Machine Learning Systems by Chip Huyen.
When building an ML system, the requirements vary from use case to use case. However, most systems should have these four characteristics: reliability, scalability, maintainability, and adaptability.
Developing an ML system is an iterative and, in most cases, never-ending process. Once a system is put into production, it is a continuous cycle of monitoring and updating across the different processes and steps.
From the lens of a data scientist or an ML engineer, the core step-by-step iterative processes involved in developing ML systems in production are:
- Project scoping
- Data engineering
- ML model development
- Deployment
- Monitoring and continual learning
- Business analysis
From now on, we will highlight one awesome job per week! Please reach out if you want your job featured.
Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.