IMPORTANT!!! Daylight saving time hits Europe before the US, so the meetup will be an hour later for the American folks this week, i.e. at 10am PT.

Reading group is back again this Friday!! More details below.

Coffee Session
Tests
You may have caught this one already in the Mega-Ops newsletter that went out on Sunday. In case you missed it: tests get a bad rap.

I hear you. They are boring. They are a waste of time.

What if I just throw out the project? I spent all this time building so many damn tests!


As Svet put it: "One of the biggest challenges we are facing is not having enough meaningful and grounded discussions about testing. Ultimately the ML folks (of course myself included) have the mindset of 'Let's deploy our model and just accept the fact that it will fail'."
Meetup
KFServing -> KServe Tutorial

Special Guest Host

As much as I like to bag on Alexey for copying the MLOps community all the time in his DTC community, I gotta hand it to him, he is a great co-host.

Theo, aka the world's biggest Kubeflow enthusiast, went deep on KServe last week. In case you missed it, KServe is the new name of the rebranded KFServing project.

Not only did we talk about the rebrand, but Theo also did some live coding while Alexey grilled him on all the details.
Guest Wisdom
How to Learn
Eugene Yan, who needs no introduction, takes us through the Why, What, and How of Online System Design for Recommendations and Search.

After breaking down his understanding, Eugene walks us through various system designs from some of the top companies, like Alibaba, Facebook, DoorDash, and YouTube, and standardizes them into a digestible presentation.

If that wasn't enough, he also gives a quick guide on how to create an MVP!!


If you have not checked out the video yet, jump on it!
Current Meetup
MLOps at Volvo
Vruuuum!

After spending most of his career as a full-stack Data Scientist/ML Engineer, Leonard Aukea has shifted focus towards MLOps and is currently driving Machine Learning Engineering and Operations at Volvo Cars, focusing in particular on how to effectively reap its benefits at enterprise scale.

Leonard will introduce the Volvo Cars ML stack and related work on stitching these services together to reduce friction in the ML value stream. He will also get into some of the learnings along the way and share general thoughts on what it takes to lay a solid ML foundation in a company like Volvo Cars.


Sub to our public calendar or click the button below to jump into the meetup on Wednesday at 10am PST/5pm BST

Reading Group
CI for ML
More Test Talk

The paper for this week's reading group is Building Continuous Integration Services for Machine Learning (Karlaš et al. 2020).

This paper takes Continuous Integration (CI), which has become "a de facto standard for building industrial-strength software" (Karlaš et al. 2020), a step further by applying it to Machine Learning.

The paper is motivated by a core difficulty of ML testing: every time a model iteration is evaluated on the holdout test set, information about that set leaks back into development, which leads to overfitting the test set and to measured scores that diverge from the model's real performance.

An obvious solution is to sample a fresh, independent test set each time a new model version is evaluated. However, labelled data is expensive to collect, so this quickly becomes a cost problem.
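
To build intuition for why reusing one holdout set is risky, here is a toy simulation (our own illustration, not an experiment from the paper): 200 "model iterations" that are pure noise get scored on the same fixed test set, and we keep the best score. The winner looks better than chance even though every candidate's true accuracy is exactly 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed holdout test set with balanced binary labels.
n_test = 1_000
y_test = rng.integers(0, 2, size=n_test)

# Simulate 200 "model iterations" that are pure noise, so the true
# accuracy of every candidate is exactly 0.5, whatever the test set says.
best_test_acc = 0.0
for _ in range(200):
    predictions = rng.integers(0, 2, size=n_test)
    test_acc = (predictions == y_test).mean()
    best_test_acc = max(best_test_acc, test_acc)  # select on the reused test set

print(f"best measured test accuracy: {best_test_acc:.3f}")  # typically ~0.53-0.55
print("true accuracy of every candidate: 0.500")
```

The gap between the selected model's measured score and its true performance is exactly the kind of silent overfitting the paper's CI service is designed to guard against.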

To mitigate this issue, Karlaš et al. (2020) propose an ML development lifecycle with three roles:

  1. Data curator - in charge of providing new test data to this lifecycle
  2. Developer - in charge of iterating on ML models by implementing new models, training, and tuning them
  3. Manager - in charge of defining the test applied to the ML model (and its minimum acceptable quality), monitoring the data currently available for the test set, and setting the maximum number of runs a test set can have before being replaced.

In addition, Karlaš et al. (2020) provide the mathematical reasoning behind:
  1. the maximum number of evaluation runs for a test set
  2. the test set size required to guarantee, with a given probability, that our evaluation metric estimate (e.g. accuracy, F1-score) is within a chosen error tolerance of the true performance (a back-of-the-envelope sketch follows below).
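
As a rough illustration of the second point (our own back-of-the-envelope sketch, not necessarily the paper's exact derivation), a textbook Hoeffding inequality combined with a union bound over the planned number of evaluations already gives a usable estimate of how much labelled data needs to be budgeted:

```python
import math

def required_test_set_size(epsilon: float, delta: float, num_evaluations: int) -> int:
    """Hoeffding bound + union bound over all planned evaluations.

    Returns the number of labelled test examples needed so that, with
    probability at least 1 - delta, each of `num_evaluations` measured
    accuracies stays within +/- epsilon of its true value.
    """
    # Hoeffding: P(|estimate - truth| > epsilon) <= 2 * exp(-2 * n * epsilon**2)
    # Union bound: multiply the failure probability by the number of evaluations.
    return math.ceil(math.log(2 * num_evaluations / delta) / (2 * epsilon ** 2))

# e.g. a 1% error tolerance with 95% confidence over 32 model evaluations
print(required_test_set_size(epsilon=0.01, delta=0.05, num_evaluations=32))
# -> roughly 36,000 labelled examples
```

Numbers like this make concrete why the paper cares about both the maximum number of reuses and the test set size: tens of thousands of fresh labels per test-set rotation is exactly the cost problem described above.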

The authors put this into practice with an experiment showing that, with their framework, the measured test accuracy always stays within the error tolerance of the true test accuracy, whereas with a baseline that keeps the same test set unchanged across model iterations, the measured score diverges from the true score.
Best of Slack
Jobs
See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


