The agenda for the LLM in Production conference is fully fleshed out. Looking forward to seeing you all there. Write back and tell us which talks you're most looking forward to, and I'll send you some swag from our generous sponsors.
 
Coffee Session
  • Clean Code for Data Scientists

Matt Sharp, data developer at Shopify, talked about writing clean code as a data scientist.

For data scientists, there's a lot of conversation about what it means to write clean code. One of the best ways to frame it is with the question, "How easily can another person understand this piece of code?" Writing clean code is a technical skill, but even more a soft skill: the whole point of clean code is to enable collaboration between people. When code is clean, it communicates.

Naming variables is probably the largest aspect of writing clean code. Reading and understanding code is seamless when the variables are well-named and well-defined. Project structure is another important aspect: data science projects often lack a default scaffolding that guides every project.
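To make the point concrete, here is a small illustrative example (not from the talk) of the same computation with vague names versus descriptive ones:

```python
# Hard to follow: what are d, x, and r supposed to be?
def f(d, x):
    return [r for r in d if r["total"] > x]

# Clear at a glance: the intent lives in the names.
def filter_orders_above_threshold(orders, min_total):
    """Return only the orders whose total exceeds min_total."""
    return [order for order in orders if order["total"] > min_total]

orders = [{"total": 5}, {"total": 50}]
print(filter_orders_above_threshold(orders, 10))  # [{'total': 50}]
```

Both functions do the same thing, but only the second one explains itself to the next reader.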

Data science projects are typically developed in Jupyter notebooks, which are hard to manage. However, they offer powerful benefits during the development phase of the machine learning life cycle, and with the right strategies they can be managed properly and used efficiently.

One such strategy is to encapsulate each cell so that it can run on its own. This makes transitioning from a Jupyter notebook to an actual project easy: moving the code into a more foundational Python script or library is largely a matter of copying each cell into a script file, with only small adjustments.
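As a sketch of what an encapsulated cell might look like (the function and data are hypothetical, not from the talk), the cell declares its own imports and takes its inputs as parameters, so it runs on its own and copies into a script almost unchanged:

```python
# A self-contained "cell": it imports what it needs and takes all of its
# inputs as parameters, rather than relying on state from earlier cells.
import statistics

def summarize_column(values):
    """Return basic summary stats for a numeric column."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),
    }

# In a notebook this line previews the result; in a script it becomes a
# call site or a test.
print(summarize_column([1.0, 2.0, 3.0]))
```

Because nothing depends on hidden notebook state, pasting this cell into a `.py` file gives you a working function immediately.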

Video || Spotify || Apple

 
Past Meetup
  • Fine Tuning LLMs

Mark Kim-Huang, Co-Founder and Head of AI at Preemo, discussed efficient ways of fine-tuning Large Language Models.

Working with LLMs typically means building on various open-source models. There has been a lot of momentum around understanding best practices and knowing when smaller, traditional models are more appropriate for a use case.

Preemo makes it easier to build and fine-tune models. It democratizes the accessibility of LLMs and enables optimal shipping of AI-powered applications.

Model ownership, domain expertise, security, and privacy are critical reasons to consider custom LLMs over closed-source LLMs like GPT-4 and PaLM 2.

Model fine-tuning can be broken down into four categories:
• Multitask fine-tuning, where the model learns to perform a range of tasks, using datasets that specifically cover that ensemble of tasks.
• Few-shot fine-tuning, which trains on only a small number of examples per task.
• Domain-specific fine-tuning, where the model is optimized to perform especially well on a particular subject matter.
• Prompt-based fine-tuning, which typically uses instructions to adapt foundation models to new downstream tasks without retraining the model or updating its weights.
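The last category can be sketched in a few lines. This is a hypothetical illustration (the function and task are ours, not Preemo's): instead of updating any weights, the input is wrapped in an instruction template, optionally with a few examples, and sent to a frozen foundation model.

```python
def build_prompt(instruction, examples, query):
    """Assemble an instruction-style prompt for a downstream task.

    The model's weights never change; the adaptation lives entirely
    in the text we send it.
    """
    lines = [instruction]
    for inp, out in examples:
        lines.append(f"Input: {inp}\nOutput: {out}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = build_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great product!", "positive"), ("Broke after a day.", "negative")],
    "Works exactly as advertised.",
)
print(prompt)
```

The resulting string is what gets sent to the (unchanged) foundation model, which is why this approach is so cheap compared to weight updates.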

Prompt-based fine-tuning is currently the leading approach for creating the right interface to expose state-of-the-art performance from LLMs on downstream tasks.

When defining prompt tasks, the strategies for improving fine-tuning performance can be broken down into task definition, prompt engineering, and parameter-efficient fine-tuning.
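The core idea behind parameter-efficient fine-tuning can be shown with a toy, dependency-free sketch (all names and sizes here are illustrative, not from the talk): freeze the base model's weights and train only a small set of extra parameters, such as adapter layers or low-rank matrices.

```python
# Toy model: 12 "layers" of base weights plus a small adapter per layer.
base_weights = {f"layer_{i}.weight": [0.0] * 1000 for i in range(12)}
adapter_weights = {f"layer_{i}.adapter": [0.0] * 16 for i in range(12)}

frozen = dict(base_weights)        # never updated during fine-tuning
trainable = dict(adapter_weights)  # the only parameters the optimizer touches

total = sum(len(w) for w in frozen.values()) + sum(len(w) for w in trainable.values())
tuned = sum(len(w) for w in trainable.values())
print(f"Training {tuned}/{total} parameters ({100 * tuned / total:.1f}%)")
```

Even in this toy, the trainable fraction is under 2% of the parameters, which is what makes fine-tuning large models tractable on modest hardware.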

 
IRL MeetUp
  • Optimising Routing and Caching ML Models

Paul Hetherington, CEO and Co-Founder of Mystic.ai, shared work on a research project being developed at Mystic.

The typical deployment system for a machine learning model with the modern tech stack wraps the Python code, model, and environment in a Docker container and deploys it with some abstraction over Kubernetes. Although many deployments run quickly across many different environments, the system can be quite brittle, and it is neither pleasant nor optimal for several use cases.

To improve these systems, they came up with a new set of design constraints typical of many companies. Ideally, companies expect data scientists to be able to manage and deploy all of their workloads, so they build MLOps platforms or orchestration software that someone who knows just Python can interact with and run reliably, with all of the infrastructure taken care of. The use cases vary; many require very low latency, which is eroded by the overheads introduced by scheduling, orchestration, and so on.
 
Blog
  • Query Multiple Documents with LLMs

This blog was written by Yujian Tang, Developer Advocate at Zilliz.

Large Language Models (LLMs) are popular for personal projects, but how can we use them in production? One of the immediate use cases that stands out is using an LLM like GPT to query your documents. It’s not enough to query one document; in practice, we need to query multiple documents. This article shows how to use LlamaIndex and Milvus in a Jupyter Notebook to query multiple documents.

 
We Have Jobs!!
From now on we will highlight one awesome job per week! Please reach out if you want your job featured.

  • ML Engineer / Open-Source Evangelist at SuperDuperDB: Check this out if you love preaching about ML and are looking to join a small team of absolute experts building an open-source system to effortlessly integrate AI and databases.

IRL Meetups

Denver - June 9, 2023
Seattle - June 9, 2023
Los Angeles - June 9, 2023
Helsinki - June 14, 2023
Melbourne - June 14, 2023

Amsterdam - June 15, 2023
Seattle - June 15, 2023
San Francisco - June 15, 2023

Berlin - June 16, 2023
Berlin - June 17, 2023
Atlanta - June 20, 2023
Berlin - June 29, 2023
Montreal - June 29, 2023

Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.



