SRE Masterclass for ML

But First, Inefficiencies

Reading code has not been optimized for. Think about how much time you spend trying to figure out what is going on even on stuff you yourself have written.

Our guest Tudor talked about how our eyes should not be the data mining tools; we have software for that. His team has been researching and building the open source tool he showed us, called Glamorous Toolkit.

At first glance, it may seem like a fancy notebook. Just wait till you see it in action. I hope you never read code the same way again.

SRE Masterclass

This week, we had a master of SRE join us. You might think we use that term too liberally, but I'm not exaggerating this time. Niall Murphy, an MLOps Community member, ***literally*** wrote the book about site reliability engineering based on his lessons learned from pioneering the field at Google.

Niall, who most recently was the Global Head of SRE for Azure, shared with us the fundamental principles of SRE, how they fit with MLOps, and how teams and organizations can best apply these ideas to make ML products highly reliable in production.

I highly recommend that anyone building ML teams or systems listen to this podcast. Few people in the industry are as knowledgeable and as approachable as Niall. Thank you, Niall, for taking the time to share your knowledge!

Till next time,
Vishnu

What is Dud?

I heard about Dud (pronounced "duhd", not "dood") a few months ago when its creator, Kevin Hanselman, dropped a few lines in the community about it. Curious and willing to learn more, I caught up with him to hear about the new 0.2.0 release and what exactly the tool aims to do.

Dud is a lightweight tool for versioning data alongside source code and building data pipelines. In practice, Dud extends many of the benefits of source control to large binary data. In short, it is a more focused and lighter-weight data version control tool.

It strives to be 3 things. Simple. Fast. Transparent.

Simple
Dud should never get in your way (unless you're about to do something stupid). Dud should be less magical, not more. Dud should do one thing well and be a good UNIX citizen.

Fast
Dud should prioritize speed while maintaining sensible assurances of data integrity. Dud should isolate time-intensive operations to keep the majority of the UX as fast as possible. Dud should scale to datasets in the hundreds of gigabytes and/or hundreds of thousands of files.

Transparent
Dud should explain itself early and often. Dud should maintain its state in a human-readable (and ideally human-editable) form.

To summarize with an analogy: Dud is to DVC what Flask is to Django.
Both Dud and DVC have their strengths. If you want a "batteries included" suite of tools for managing machine learning projects, DVC can be a good fit for you. If data management is your main area of need and you want something lightweight and fast, Dud may be what you are looking for.

Mr. Eugene Yan, Everyone!

You have seen him around the community. You know who he is. Maybe you read some of his incredible blog posts? Maybe you saw some of his informative LinkedIn posts? Or maybe you have chatted with him at the biweekly MLOps reading group sessions.

Well, this week we finally get to have him on the live meetup talking about System Design for Recommendations and Search. 🤩

What does system design for industrial recommendations and search look like? Eugene will share how it's often split into:

- Latency-constrained online vs. less-demanding offline environments
- Fast but coarse candidate retrieval vs. slower but more precise ranking

We'll also see examples of system design from companies such as Alibaba, Facebook, JD, DoorDash, and LinkedIn. Maybe if we pester him enough we will get a quick walk-through on how to implement a candidate retrieval MVP.

If you haven't heard already, we have a public calendar you can subscribe to. Otherwise, see you tomorrow at 9am PST/5pm BST by clicking on the link below.