I chatted with Mr. Yujian Tang about the hot topic of vector databases. We run through a few common use cases and what it actually looks like in practice to use them.
What is on everyone's mind right now when working with LLMs?
Retrieval Augmented Generation (RAG for short). And basically, vector stores are the key drivers of it.
What's on my mind right now?
Information retrieval vs fine-tuning vs larger context windows.
And we get into it on the pod!
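If RAG is new to you, here is a minimal sketch of the retrieval step a vector store handles: embed the documents, embed the query, and pull back the nearest chunks to stuff into the prompt. The embed() function and the in-memory index below are stand-ins for a real embedding model and vector database, not any particular product's API.

```python
import numpy as np

# Hypothetical embedding function: in practice this would call an embedding
# model (an API or a local sentence-transformer), not a seeded RNG.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(384)
    return vec / np.linalg.norm(vec)  # unit norm, so dot product = cosine similarity

# Tiny in-memory "vector store": documents plus their embeddings.
docs = [
    "Vector databases index embeddings for fast similarity search.",
    "Fine-tuning changes model weights; retrieval changes the prompt.",
    "Larger context windows let you stuff more documents into a prompt.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity against every stored embedding, return the top-k docs.
    scores = index @ embed(query)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks get pasted into the LLM prompt as grounding context.
context = "\n".join(retrieve("How does retrieval differ from fine-tuning?"))
prompt = f"Answer using this context:\n{context}\n\nQuestion: ..."
```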
The Edinburgh MLOps Community talked to Helena Orihuela, MLOps Engineer, and Andy McMahon, Head of MLOps at NatWest.
NatWest is one of the leading banks in the UK. It serves millions of customers and boasts a strong presence in business banking.
With a dedicated team focused on machine learning, they have developed around 3,000 models in various stages of development and production. To ensure the successful deployment of machine learning models, NatWest follows a comprehensive process that includes data collection, configuration, verification, feature extraction, hosting, risk management, and monitoring.
Recognizing the importance of efficient machine learning operations (MLOps), NatWest embarked on its MLOps journey two years ago.
They aimed to tackle challenges related to governance, data management, and the
need for faster solution delivery. To achieve these goals, NatWest has implemented standardized templates and adopted tools like AWS Service Catalog, SageMaker Studio, and Triera.
These technologies have facilitated automation and streamlined the entire machine learning lifecycle, encompassing environment setup, model development, tracking, monitoring, and even bias detection.
Distributed Training: Guide for Data Scientists: This article delves into the details of distributed training. It starts by explaining the building block of any distributed system: the worker.
When faced with a complex task, we tend to build systems that split this task and distribute it among workers - united we stand, divided we conquer. This saves time and makes such a complex task doable.
Distributed training shares the same principle of spreading the training workload among the system's workers. With that in mind, the article explains the two main approaches to distributed training: data parallelism and model parallelism.
🔷 Data parallelism: As the name suggests, this approach divides the workload by partitioning the data. The model is first copied to every available worker node, and each node trains on its own subset of the data. Each node computes gradients and then waits for the others to finish; once all nodes have completed their step, the gradients are aggregated and the model weights are updated accordingly (a minimal code sketch follows after this article summary).
🔷 Model parallelism: Occasionally, the model might not fit on a single worker, especially in the era of Large Language Models. To overcome this hurdle, the model itself is split horizontally or vertically across different workers.
Each part is then trained concurrently with the same data. The article even goes the extra mile by providing details on the benefits of the different architectures (centralised or decentralised) for distributed training. It's a great read if you plan on getting your hands dirty with parallel training.
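To make data parallelism concrete, here is a minimal single-process sketch that simulates the loop described above: replicate the model, give each replica a shard of the batch, average the gradients, and apply the update. The model, data, and learning rate are made up for illustration; a real setup would run one process per device and let torch.nn.parallel.DistributedDataParallel handle the synchronization.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model and data; a real setup would run one process per GPU with
# DistributedDataParallel keeping all replicas in sync.
model = nn.Linear(10, 1)
workers = [copy.deepcopy(model) for _ in range(4)]   # one replica per "worker"
data, targets = torch.randn(64, 10), torch.randn(64, 1)
shards = list(zip(data.chunk(4), targets.chunk(4)))  # each worker gets a data shard

# Each replica computes gradients on its own shard.
per_worker_grads = []
for replica, (x, y) in zip(workers, shards):
    loss = F.mse_loss(replica(x), y)
    per_worker_grads.append(torch.autograd.grad(loss, list(replica.parameters())))

# Gradients are aggregated (averaged here) and the shared weights are updated;
# in real DDP the averaged gradients update every replica so they stay identical.
lr = 0.01
with torch.no_grad():
    for i, p in enumerate(model.parameters()):
        avg_grad = torch.stack([g[i] for g in per_worker_grads]).mean(dim=0)
        p -= lr * avg_grad
```

In model parallelism, by contrast, you would split the layers (or tensors) of one model across devices instead of replicating the whole model on each one.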
Where did all the memory go? In this article, Jaideep Ray shares his experience fine-tuning LLMs. During the process, Jaideep found himself fighting an uphill battle with Out Of Memory errors. After reducing the batch size and the sequence length, he still had no luck resolving the issue, which left him with a lingering question: where did all the memory go? After going down the rabbit hole, Jaideep came to the conclusion that the three main sources of memory consumption are optimizer states, gradients, and parameters.
For example, in a Llama-7B model, these three components can consume more than 40 GB of memory. With such a huge model, approaches such as Distributed Data Parallelism (DDP) will not work, since every worker still has to hold a full copy of the parameters, gradients, and optimizer states. Jaideep managed to crack the problem with Fully Sharded Data Parallelism (FSDP).
FSDP is a type of data parallelism that shards model parameters, optimizer states, and gradients across the data-parallel ranks. If you ever find yourself grappling with similar challenges, PyTorch ships a native FSDP implementation and DeepSpeed offers the equivalent ZeRO optimizations.
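If you want to see what that looks like in practice, here is a minimal FSDP sketch, assuming a multi-GPU node and a launch via torchrun; the placeholder network, learning rate, and random data are stand-ins for the article's actual fine-tuning setup.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder network standing in for the LLM being fine-tuned.
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer states across ranks,
    # so no single GPU has to hold the full tens-of-GB training state.
    model = FSDP(model)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):  # toy training loop with random data
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=4 fsdp_sketch.py` (the filename is hypothetical), each process then holds only its shard of the parameters, gradients, and optimizer states.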
Weights & Biases Innovation 💫
Unlock LLMs with programmatic labeling
Navigating the complex landscape of Large Language Models (LLMs) comes with a unique set of challenges, and Weights & Biases is here to simplify that journey. New problems require new solutions, and our recently released W&B Prompts shines a light on the inner workings of LLM operations, offering interactive evaluation loops for prompt engineering experiments and comprehensive dataset and model versioning tools. Imagine the power of visual, interactive insights combined with detailed traceability, at your fingertips.
On top of Prompts, W&B seamlessly integrates with LangChain, LlamaIndex, the OpenAI API, and more, enhancing prompt engineering, resource management, and fine-tuning workflows.
We're also thrilled to announce our new course, "Building LLM-Powered Apps". Dive deep into LLM application development, learning how to interact with APIs, utilize LangChain, and master W&B Prompts, all under the guidance of industry leaders.
The course is led by Darek Kleczek, MLE at Weights & Biases, with guest appearances from Shahram Anver (Co-Creator of Rebuff), Shreya Rajpal (Creator of Guardrails AI) and Anton Troynikov (Co-Founder of Chroma).
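For a flavor of what tracking prompt experiments looks like in code, here is a minimal sketch using the core wandb logging API; the project name, config values, and prompt/response rows are invented for illustration, and the Prompts-specific tracing integrations mentioned above layer richer trace views on top of this kind of logging.

```python
import wandb

# Hypothetical prompt-engineering experiment: log each prompt/response pair
# so different prompt variants can be compared side by side in the W&B UI.
run = wandb.init(
    project="llm-prompt-experiments",          # made-up project name
    config={"model": "gpt-3.5-turbo", "temperature": 0.2},
)

table = wandb.Table(columns=["prompt", "response", "tokens"])
for prompt, response, tokens in [
    ("Summarize the ticket in one sentence.", "Customer cannot reset password.", 12),
    ("Summarize the ticket for an engineer.", "Password reset email never arrives.", 14),
]:
    table.add_data(prompt, response, tokens)

wandb.log({"prompt_results": table})
run.finish()
```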
Unleash LLMagic Today!
*This post is sponsored
Please reach out if you want your job featured.
- Staff Engineer // Etsy - Build and scale a centralized Feature Store, with a strong emphasis on building a distributed, low-latency serving layer that stays operational under high load. It will support various high-visibility Machine Learning teams at Etsy, including Search Ranking, Personalized Recommendations, and more!
Thanks for reading. This issue was written by Demetrios and edited by Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.