Bring Your Own Data

We are back on normal timing. Also, we are organizing our first in-person 'meetup' (aka happy hour) in San Francisco on Thursday. Reply to this email if you want to get added to the invite. We hope this is the first of many local gatherings!

Meetup
Volvo ML Stack
Lots of conversations were sparked by this talk from Leonard Aukea. Check out the slides from the talk here. The platform that Volvo Cars has built is made up entirely of open source tools.

Pipeline versioning, continuous monitoring, triggering pipelines with GitHub comments, and data versioning all got covered in depth.

My favorite slide of the presentation came towards the end, when Leonard spoke about the "data scientist+" profile they have at Volvo. What should a data scientist+ know? At least the basics of:
  • Git
  • CI/CD
  • Container basics
  • SW/ML testing
  • SW/ML design patterns
Coffee Sessions
I Don't Like Notebooks
By now, I'm sure all of you have seen that notebooks are a popular topic in the community. On one hand, they are an excellent experimentation tool that lets anyone get started building and prototyping models quickly. On the other hand, they are a nightmare to deal with as their complexity, number, and use cases scale. Well, we heard what you've been thinking and got the OG notebook critic, Joel Grus, on to talk about his opinion of notebooks and much more!

Joel gave the talk that inspired many a Slack thread: his (hilarious) I don't like notebooks talk at JupyterCon 2018. We talked to Joel about how he came to give it, how he sees it today, and what his most interesting tooling hot takes are these days, given that his Jupyter take is now considered blasé. We also got to dive deep into the work he does at Capital Group, the lessons he learned about writing Python code at AI2 and Google, and so much more.

I certainly recommend listening to this podcast for some practical tips on the past/present/future of MLOps, good insights into the practical challenges of ML infra, and great humor courtesy of Joel.

Till next time,
Vishnu
Guest Wisdom
Reading Code
Think about how much time you spend trying to figure out what is going on even on stuff you yourself have written.

Our guest Tudor talked about how our eyes should not be the data mining tools; we have software for that. His team has been researching and building an open-source tool called Glamorous Toolkit, which he showed us.

At first glance, it may seem like a fancy notebook. Just wait till you see it in action. I hope you never read code the same way again.

Current Meetup
Data-Centric AI Means Centralizing Training Data
The competitive advantage of computer vision companies is no longer in model architectures but in their training data and the MLOps behind it. How do AI-first companies compete?

We'll explore a number of emerging computer vision cases where models become fixed elements, re-trained on continuously evolving datasets as a company's deployments grow. This calls for a CRM-like experience for training data, where ML-Ops tools can apply changes from multiple sources and enable complex labeling or inference workflows.

Key takeaways:
  • Gathering training and test datasets in one place is the true asset behind ML teams, more so than trained models.
  • Computer vision is growing up and branching out to traditional industries with unique training data challenges.
  • There's no shortage of SaaS companies in the ML-Ops space, but there's no easy way for them to work together yet.

Sub to our public calendar or click the button below to jump into the meetup on Wednesday at 10am PST/5pm BST
Reading Group
Building CI Services for Machine Learning
We read Building Continuous Integration Services for Machine Learning in the last reading group session. The paper looks at applying CI to Machine Learning development. CI has long been a standard in software engineering, where it provides a variety of tools for building, testing, and deploying in an automated and iterative manner.

Before the paper (paper subject):
In the ML development life cycle, ML practitioners iterate repeatedly to improve accuracy on a holdout test set. This can result in overfitting the test set (i.e. even though the prediction error on the test set is small, the generalization error in production can be large). One obvious solution is to draw a fresh, independent test set, but this can become expensive.
The authors propose a new framework to mitigate this issue that is able to 1) compute the maximum number of test runs a test set can support before it is invalidated; 2) give probabilistic guarantees on the estimated error despite the overfitting described above.
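To get a feel for the kind of trade-off involved, here is a minimal sketch of a textbook bound. It is not the paper's method: it assumes the simpler non-adaptive case (the models to evaluate do not depend on earlier test results) and uses Hoeffding's inequality with a union bound. The function names and the parameters `epsilon` (error tolerance) and `delta` (allowed failure probability) are our own, picked for illustration.

```python
import math


def min_test_set_size(num_evaluations: int, epsilon: float, delta: float) -> int:
    """Smallest test set size n such that, with probability at least 1 - delta,
    every one of `num_evaluations` accuracy estimates is within `epsilon` of the
    true accuracy.

    Assumption: evaluations are non-adaptive, so Hoeffding's inequality plus a
    union bound applies: 2 * H * exp(-2 * n * epsilon^2) <= delta.
    """
    if num_evaluations < 1:
        raise ValueError("num_evaluations must be at least 1")
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    return math.ceil(math.log(2 * num_evaluations / delta) / (2 * epsilon ** 2))


def max_evaluations(test_set_size: int, epsilon: float, delta: float) -> int:
    """Inverse of the bound above: how many evaluations a fixed test set of
    `test_set_size` samples can support at the same epsilon and delta."""
    if test_set_size < 1:
        raise ValueError("test_set_size must be at least 1")
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must be in (0, 1)")
    return math.floor(delta / 2 * math.exp(2 * test_set_size * epsilon ** 2))


if __name__ == "__main__":
    # How many labeled samples for 32 test runs at +/- 5% with 95% confidence?
    print(min_test_set_size(num_evaluations=32, epsilon=0.05, delta=0.05))
```

Under these assumptions the required test set grows only logarithmically with the number of runs, but quadratically as the tolerance shrinks. When each new model is chosen based on earlier test results (the adaptive case), this simple bound no longer holds and the budget gets tighter.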

Reading group discussion:
During the reading group we discussed that, in production, the model releases the paper is concerned with may not be fully automated on the strength of a holdout test evaluation alone (even with a fresh test set). Moreover, A/B testing is required to understand whether a new challenger model is better than the previous champion, for example when releasing a new model algorithm or a model that leverages more features. One way to put this into practice is to use canary deployments.
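As a rough illustration of the canary idea, here is a minimal sketch of how a small, stable share of traffic could be sent to the challenger. Everything in it is an assumption for illustration (the `route_to_challenger` name, hashing a request or user ID, the 10,000-bucket granularity); real setups usually do this in the serving layer or load balancer.

```python
import hashlib

NUM_BUCKETS = 10_000  # hypothetical granularity: 0.01% of traffic per bucket


def route_to_challenger(request_id: str, canary_fraction: float) -> bool:
    """Return True if this request should be served by the challenger model.

    The request (or user) ID is hashed into one of NUM_BUCKETS buckets, so the
    same ID always lands on the same model and roughly `canary_fraction` of
    all IDs go to the challenger.
    """
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS
    return bucket < canary_fraction * NUM_BUCKETS


if __name__ == "__main__":
    ids = [f"user-{i}" for i in range(1000)]
    share = sum(route_to_challenger(i, 0.10) for i in ids) / len(ids)
    print(f"share of traffic on challenger: {share:.2%}")
```

Hashing instead of random sampling keeps each user on one model for the whole experiment, which makes the champion vs. challenger comparison cleaner. The fraction can then be raised step by step as confidence in the challenger grows.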

What's next?
For the next reading group session, we will have a panel with several guests to better understand whether it's possible to bring good software engineering practices around design patterns to Machine Learning projects. We will also draw on ideas from the book Object Design Style Guide to help in this discussion.
Extra Event
Data Centric AI
We've all heard about Andrew Ng and his campaign for Data-Centric AI, where he wants to shift the focus of AI practitioners from model/algorithm development to the quality of the data used in the models.

Well, we have news for you!! 

Stanford and ETH Zurich are organizing the Data-Centric AI workshop to bring together researchers, practitioners, organizations, and individuals working on Data-Centric AI.

If you don't want to miss it: go ahead and register!

P.S. Garbage in, garbage out! Can we make that a cliché yet?
Best of Slack
Jobs
See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


