Faster AI Inference
That survey we asked you to fill out last week? Here are the results in a spreadsheet! Tell your friends to fill out the survey if they haven't already.
 
Coffee Session
  • Edge MLOps // Jason McCampbell

In this wonderful session, Jason McCampbell, chief architect at Wallaroo, schooled us on the challenges of doing MLOps at the Edge.

Semiconductors are the fundamental building blocks of Edge devices. High-performance computing that uses that hardware optimally and efficiently is one place where semiconductor optimization algorithms and ML models overlap.

Depending on whom you ask, an Edge device can mean anything from a microcontroller to a server in a data center. What makes a deployment "Edge" is how it is executed, not the device itself.

Edge ML Challenges
Data for EdgeML comes with tight restrictions and strict security requirements: it may be sensitive (IP-protected data, personally identifiable information (PII), etc.), and it often lives across different regions or servers around the world.

This makes continuous training and monitoring of ML models on the edge a major frustration. Federated learning is one technique for learning from this kind of distributed data without centralizing it (a minimal sketch follows below).
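To make the idea concrete, here is a minimal federated-averaging (FedAvg) sketch with made-up linear models and data shards: each edge site trains on its own private data, and only weight updates ever travel to the aggregator, never the raw records.

    import numpy as np

    def local_update(weights, shard, lr=0.1):
        # One local step: an edge site refines the shared weights
        # using only its own private data.
        X, y = shard
        grad = X.T @ (X @ weights - y) / len(y)  # least-squares gradient
        return weights - lr * grad

    def federated_average(client_weights):
        # FedAvg aggregation: the server simply averages client models;
        # raw data never leaves the edge devices.
        return np.mean(client_weights, axis=0)

    # Toy setup: three edge sites, each holding a private data shard.
    rng = np.random.default_rng(0)
    shards = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
    global_w = np.zeros(3)

    for _ in range(10):  # communication rounds
        updates = [local_update(global_w, s) for s in shards]
        global_w = federated_average(updates)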

MLOps at the Edge is a bit more complicated than traditional MLOps, particularly for the Ops folks. There are a lot of constraints to think about depending on the implementation and application, like limited network connectivity, efficiently updating the models on the device, data transmission from the device, etc.


 
Weekly Book Highlights
  • Designing Machine Learning Systems by Chip Huyen.

Machine Learning in Research vs Production

Coming to ML from research or from traditional software engineering gives you two different sides of the same coin, and understanding both is critical to designing and developing ML systems. That's because the challenges of ML in production differ in major ways from those in research.

How the two settings differ:
  • Requirements: state-of-the-art model performance on benchmark datasets (research) vs. different requirements from different stakeholders (production)
  • Computational priority: fast training / high throughput vs. fast inference / low latency
  • Data: static vs. constantly shifting
  • Fairness and interpretability: often not a focus vs. must be considered

Machine Learning Systems vs Traditional Software

For more than half a century, software engineering (SWE) has made running traditional software in production a success. If ML practitioners adopted those software engineering skills, ML in production would be in much better shape.

However, ML systems pose challenges that traditional software does not. With traditional software, there's an underlying assumption that code and data are separated; in fact, things are kept as modular and separate as possible. ML systems, on the contrary, are part code, part data, and part artifacts created from the two.

Also, with traditional software, you only need to focus on testing and versioning the code. With ML, we have to test and version our data too, and that's the hard part (a toy illustration follows below).
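One common way to "version" data is to fingerprint a dataset with a content hash, so every training run can be tied to the exact data it saw. Here is a minimal sketch (the directory name is hypothetical):

    import hashlib
    from pathlib import Path

    def dataset_fingerprint(root: str) -> str:
        # Hash every file under the dataset directory so any change
        # to the data yields a new version identifier.
        digest = hashlib.sha256()
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                digest.update(str(path.relative_to(root)).encode())
                digest.update(path.read_bytes())
        return digest.hexdigest()[:12]

    # Log this alongside the code's git commit for each training run:
    # print(dataset_fingerprint("data/train"))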
 
Past Meetup
  • OpenVINO toolkit // Adrian Boguszewski

    Adrian Boguszewski, AI Evangelist at Intel, was our guest host at the meetup. He showed us how to make ML deployment easier with the Open Visual Inference and Neural Network Optimization (OpenVINO) toolkit.

    OpenVINO is an open-source toolkit for optimizing and deploying AI inference. It makes efficient use of whatever hardware it runs on, enables seamless deployment of models to production without having to build API servers or worry about hosting, and is a great fit for running models locally and efficiently on a CPU without a GPU.

  • Model Optimizer
    OpenVINO uses its Model Optimizer to convert already-trained models from most ML frameworks (TensorFlow, PyTorch, ONNX, Caffe, etc.) to an Intermediate Representation (IR). The IR consists of two files: an XML file with the model architecture and a binary file with the weights and biases. The Model Optimizer also performs optimizations, like graph pruning and operation fusion, to increase the performance of models running on Intel hardware.
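    As a rough sketch of that flow (the model file name and input shape here are hypothetical, and the API follows recent OpenVINO releases), converting an ONNX model to IR and running it on the local CPU looks something like this:

        import numpy as np
        from openvino.runtime import Core, serialize
        from openvino.tools.mo import convert_model

        # Model Optimizer step: convert a trained ONNX model to OpenVINO IR.
        ov_model = convert_model("model.onnx")
        serialize(ov_model, "model.xml")  # writes model.xml (graph) and model.bin (weights)

        # Runtime step: compile the IR for the local CPU and run inference.
        core = Core()
        compiled = core.compile_model(core.read_model("model.xml"), "CPU")
        dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # hypothetical input shape
        result = compiled([dummy])[compiled.output(0)]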

 
Blog Post
  • Improve MLflow Experiment // Stefano Bosisio

Stefano Bosisio is a Machine Learning Engineer at Trustpilot, based in Edinburgh. Stefano helps data science teams have a smooth journey from model prototyping to model deployment.

Robust experiment tracking makes you more efficient as a data scientist. In this article, Stefano shows how to improve MLflow experiments by tracking historical metrics.
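The heart of that workflow is MLflow's tracking API; as a rough sketch (the metric name and values are illustrative), you log a metric at each step and later pull back its full history:

    import mlflow
    from mlflow.tracking import MlflowClient

    # Log a metric at every step so the whole training curve is stored,
    # not just the latest value.
    with mlflow.start_run() as run:
        for step, loss in enumerate([0.9, 0.6, 0.4, 0.3]):
            mlflow.log_metric("loss", loss, step=step)

    # Later, retrieve the historical values of that metric for the run.
    client = MlflowClient()
    for m in client.get_metric_history(run.info.run_id, "loss"):
        print(m.step, m.value)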


Read Here
 
Sponsored Post
  • MLOps for Data Scientists // Wallaroo

A data scientist's job is not to eke every last bit of "accuracy" out of a model. Their job is to achieve business goals while meeting operational constraints.

One reason why data scientists might struggle with the model deployment process is that production considerations run counter to many data scientists' training, habits, and interests.

Some best practices to overcome this are: strive for well-structured code, be mindful of the production environment, "simpler is better than better," and "faster is better than better."

The Wallaroo platform helps data scientists be more self-sufficient at working with their models in production. Data scientists can easily upload their models and specify modeling pipelines via the Wallaroo SDK, with just a few lines of Python, using the notebook environment they are most comfortable with. Once uploaded, ML Engineers can also monitor and manage the models and model pipelines via the Wallaroo API.

Using Wallaroo, data scientists continue to have visibility into model behavior via pipeline inference logs and advanced observability features like Wallaroo's drift detection. By enabling more data scientist self-sufficiency and providing an intuitive space for data scientist/ML Engineer collaboration, Wallaroo supports an efficient and effective low-ops ML environment.
 
We Have Jobs!!
There is an official MLOps community jobs board. Post a job and get featured in this newsletter!

IRL Meetups
Toronto, ON - March 14, 2023
Madrid - March 28, 2023
Bristol - March 29, 2023
Switzerland - March 30, 2023

Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.



