|
|
|
|
If you weren't able to make it, the conference videos are being posted here.
|
|
|
|
|
|
|
|
|
- Spark's Translational Gap // Matei Zaharia
Matei Zaharia, co-founder and Chief Technologist at Databricks, let us pick his brain on this one. He shared how Spark went from an idea to a full-fledged product used by a huge number of people.
Spark started out as a result of a keen interest in the kind of datacenter-scale computing that was happening mostly at web companies like Google, Yahoo, Microsoft, and so on. At the time, those companies were mostly indexing the whole web and then building things on top of it. Around the same time, open-source projects like Hadoop had started up to do MapReduce-style processing.
Since collecting data isn't expensive and the cost of storage is pretty low, companies and scientific labs were interested in working with large-scale data at an efficient and optimal trade-off. However, they were limited in the kinds of applications they could run: while these systems were good for building a web index, they weren't good for running more interesting algorithms like machine learning.
Spark was developed to address the use cases that people couldn't handle well with Hadoop. Its engine was primarily built for workloads like machine learning algorithms, and over time it expanded to other things, like large-scale on-disk batch processing.
Another reason Spark became so closely tied to machine learning is that scientists and researchers at Berkeley used it for large-scale machine learning.
|
|
|
|
|
|
|
|
|
Rahul Parundekar, the founder of AI Hero, talked about streamlining model serving on Kubernetes by leveraging the concept of declarative MLOps.
The declarative paradigm means defining what needs to be accomplished without defining how it gets done. This is the way Kubernetes works: Kubernetes orchestrates workloads such as jobs, servers, databases, etc.
When developing an ML solution with Kubernetes, a target layout is defined in a YAML file, which is then applied to the cluster through the Kubernetes API. Kubernetes automatically takes care of all the orchestration and serving of the models, so end users get access to the backends, model servers, and any other services.
It starts by storing the layout in etcd; the control plane then schedules servers inside pods on virtual machines (nodes) via the controller manager, or the cloud provider's controller manager. Each node runs a container runtime as well as a networking interface.
In a nutshell, "you define what you want and the system takes care of it". This makes it a useful paradigm for MLOps.
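As a rough illustration of the "define what you want" idea, a declarative spec for serving a model might look like the Deployment below. The image name, labels, and port are hypothetical placeholders, not from the talk:

```yaml
# Hypothetical Deployment: declares WHAT we want (2 replicas of a model
# server), not HOW to place or restart them. Kubernetes continuously
# reconciles the cluster toward this desired state.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
        - name: server
          image: registry.example.com/my-model:1.0.0  # placeholder image
          ports:
            - containerPort: 8080
```

Applying this with `kubectl apply -f model-server.yaml` stores the desired state in etcd; the control plane then schedules the pods and keeps them running.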
YouTube
|
|
|
|
|
|
|
|
|
- Traceability & Reproducibility
This blog was written by Vechtomova Maria
In the context of MLOps, traceability is the ability to trace the history of the data, the code used for training and prediction, the model artifacts, and the environment used in development and deployment. Reproducibility is the ability to reproduce the same results by tracking the history of data and code versions.
Machine learning models running in production can fail in different ways: they can return wrong predictions or produce biased results. Those unexpected behaviors are often difficult to detect, especially when the pipeline appears to be running successfully.
This blog explains how traceability allows us to identify the root cause of a problem and take quick action, making it easier to find the code versions responsible for training and prediction, as well as the data used.
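The core idea can be sketched in a few lines of Python: record a content hash of the training data alongside the code version, so any model can be traced back to exactly what produced it. The function names and JSON layout here are illustrative, not from the blog:

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: str) -> str:
    """Content hash of a data file, so the exact dataset can be traced later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def log_lineage(model_name: str, data_path: str, code_version: str) -> dict:
    """Write a lineage record with everything needed to reproduce a run."""
    record = {
        "model": model_name,
        "data_sha256": fingerprint(data_path),
        "code_version": code_version,  # e.g. a git commit hash
    }
    Path(f"{model_name}_lineage.json").write_text(json.dumps(record, indent=2))
    return record
```

Real setups typically delegate this to an experiment tracker, but the principle is the same: every artifact carries pointers to its data and code versions.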
|
|
|
|
|
|
|
|
|
This blog was written by Médéric Hurier (Fmind)
If you work on MLOps, you must navigate an ever-growing landscape of tools and solutions. This is both an intense source of stimulation
and fatigue for MLOps practitioners.
Vendors and users face the same problem: How can we combine all these tools without the combinatorial complexity of creating custom integrations?
In this article, he proposes a solution analogous to POSIX to address this challenge. First, he motivates the creation of common protocols and schemas for combining MLOps tools. Second, he presents a high-level architecture to support the implementation. Third, he concludes with the benefits and limitations of standardizing MLOps.
|
|
|
|
|
|
|
|
|
|
|
- The Buyer’s Guide to Evaluating ML Feature Stores & Feature Platforms
If you’re looking to adopt a feature store or platform, but don’t know where or how to start your research, then this guide is for you.
Download this free guide to:
- Access a comprehensive framework for understanding the capabilities of different feature stores and feature platforms
- Get examples and tips on how to use a data-driven approach to evaluate vendors so you can find the right solution for your
organization’s needs
- Learn how the right solution can improve ML model accuracy and unlock new real-time ML use cases using streaming or real-time data.
|
|
|
|
|
|
|
From now on we will highlight one awesome job per week! Please reach out if you want your job featured.
- MLOps Engineer at VMO2: If you're looking for a team that puts data at the heart of what they do to deliver a best-in-class digital experience for customers, with machine learning (ML) foundational to this mission, this is the role for you.
|
|
|
|
|
|
|
|
|
Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.
|
|
|
|
|