Intractable data to interactable
Please drop feedback for Antoni Ivanov's meetup about Versatile Data Kit here.
Coffee Session
MLOps for Ads Platform
On this pod, we had Andrew Yates, CEO of Promoted.ai, share the dos and don'ts of Ad Tech.

Promoted.ai Origins X Ad Space
In a nutshell, most MAMAA companies are essentially ad businesses. They have some commercial media that touches every other part of their business.

Because these MAMAA companies have been dominant in this space, there weren't a lot of opportunities for promising startups.

Promoted.ai saw an opportunity to create an outsourced platform for "ads as a service." Its system design aims to be as fair as possible and focuses on generating revenue from ads the right way.

Evolution of Ad Tech
Over the years, ad tech has moved from simply using collected data to serve ads to using ML.

For instance, Facebook built a closed ecosystem around the data it collected for ads. The hypothesis in the book Chaos Monkeys predicted that Facebook ads would eventually fail because of the way the ad systems were built. The prediction turned out to be wrong.

Around 2010, Facebook and Google came up with techniques to monetize ads with machine learning applications like click and conversion models, recommender systems, etc.

These machine learning applications made it possible to get huge profits from ads.

Technical Debt with Building Ad Tech
Managing technical debt in ad systems is complicated, and it can't be solved purely at the technical and infrastructure level with tools like feature stores, signal management, versioning, etc.

It takes strong technical leadership to handle horse trading, A/B experiments, launch reviews, etc., while keeping a unified vision of processes.

Signals keep changing. Instead of measuring every single A/B impact in the system, focus on treating the system's components as services you consume as black boxes.

This tracks back to the fundamental questions of who owns the signals and who can apply them.
 
Coffee Session
Distributed Data Science
Blaise Thomson is the founder and CEO of bitfount, a federated learning and analytics platform. He broke down details on some cool layers of a robust federated learning system.

Distributed Data Science
Due to concerns about the ethical use and privacy of data, a lot of legislation has been introduced to regulate and protect data.

Distributed data science, also known as federated learning, approaches these concerns from the other direction. Instead of sending the data to the algorithms, the algorithms are sent as tasks to where the data lives and run there.
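To make the "send the algorithm to the data" idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging, on a toy one-parameter model. It is not Bitfount's implementation; the function names, the toy data, and the learning rate are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one site


def local_update(weight: float, data: List[Sample], lr: float = 0.1) -> float:
    """Run one gradient step for y = w * x on data that stays at the site."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_round(weight: float, sites: List[List[Sample]]) -> float:
    """Send the current weight to every site and average what comes back.

    Only model weights cross the network; raw samples never leave a site.
    The average is weighted by each site's sample count.
    """
    total = sum(len(data) for data in sites)
    return sum(local_update(weight, data) * len(data) for data in sites) / total


# Hypothetical toy data: two sites whose samples all follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, sites)

print(round(w, 4))
```

In a real system the coordinator and the sites would be separate services, and the update step would train an actual model, but the data flow is the same: weights go out, updated weights come back.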

Tasks are coordinated at a high level by orchestration platforms, which allows federated governance. With usage-based access control, tasks can be assigned to different computation locations where privacy protection models are enforced, such as an access control model that checks which permissions the data scientist has over the data.

Some of these permissions include evaluating an ML model, training the models, or running arbitrary queries with differential privacy.
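A usage-based permission check of this kind might look roughly like the sketch below. The permission names mirror the three examples above; the grant table, user names, and dataset name are hypothetical, and a real platform would back this with a policy service rather than an in-memory dictionary.

```python
from enum import Enum, auto
from typing import Dict, Set, Tuple


class Permission(Enum):
    EVALUATE_MODEL = auto()
    TRAIN_MODEL = auto()
    PRIVATE_QUERY = auto()  # arbitrary queries answered with differential privacy


# Hypothetical grants: (user, dataset) -> set of allowed usages.
GRANTS: Dict[Tuple[str, str], Set[Permission]] = {
    ("alice", "hospital_a"): {Permission.EVALUATE_MODEL, Permission.TRAIN_MODEL},
    ("bob", "hospital_a"): {Permission.EVALUATE_MODEL},
}


def is_allowed(user: str, dataset: str, task: Permission) -> bool:
    """Return True only if this user was granted this usage on this dataset."""
    return task in GRANTS.get((user, dataset), set())


def run_task(user: str, dataset: str, task: Permission) -> str:
    """Gate a task at the data owner's site before any computation starts."""
    if not is_allowed(user, dataset, task):
        raise PermissionError(f"{user} may not run {task.name} on {dataset}")
    return f"{task.name} accepted for {user} on {dataset}"


print(run_task("alice", "hospital_a", Permission.TRAIN_MODEL))
```

The point of the design is that the check happens where the data lives, so the data owner decides how the data may be used, not only who may see it.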

Privacy Protection
When data needs to travel to another endpoint for insight extraction, it passes through a couple of phases that protect it.

These phases fall into two categories: the data modification phase and the disclosure control phase.

The data modification phase includes techniques like tokenization, synthetic data aggregation, and noise addition. The disclosure control phase includes human checks, k-anonymity, and differential privacy.
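As a rough illustration of three of these techniques, the sketch below tokenizes an identifier with a keyed hash, checks k-anonymity over a set of quasi-identifier columns, and adds Laplace noise to a count, which is the standard mechanism for differentially private counting. The key, column names, and sample rows are assumptions for the example, not details from the talk.

```python
import hashlib
import hmac
import random
from collections import Counter
from typing import Dict, List


def tokenize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash so the raw value is not shared."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def is_k_anonymous(rows: List[Dict[str, str]], quasi_ids: List[str], k: int) -> bool:
    """True if every combination of quasi-identifier values appears at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Add Laplace noise with scale 1/epsilon (a count has sensitivity 1).

    The difference of two exponential draws with rate epsilon is Laplace
    distributed with scale 1/epsilon.
    """
    return true_count + rng.expovariate(epsilon) - rng.expovariate(epsilon)


# Hypothetical records; "age_band" and "zip3" act as quasi-identifiers.
rows = [
    {"name": "Ada", "age_band": "30-39", "zip3": "941"},
    {"name": "Bo", "age_band": "30-39", "zip3": "941"},
    {"name": "Cy", "age_band": "40-49", "zip3": "100"},
]

key = b"demo-key-not-for-production"
tokens = [tokenize(row["name"], key) for row in rows]
print(is_k_anonymous(rows, ["age_band", "zip3"], k=2))  # False: one group has a single row
print(noisy_count(len(rows), epsilon=1.0, rng=random.Random()))
```

A smaller epsilon means more noise and stronger privacy, at the cost of less accurate answers.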

The main idea is to protect how the data is used and who is allowed to use the data.
 
Blog post
Production Ready AI
This sponsored blog was written by Ariel Navon.

Run:ai’s 2021 State of AI Infrastructure Survey revealed that 38% of respondents are spending over $1M a year on AI infrastructure (hardware, software, and cloud fees), with 74% of respondents saying that they will increase that spending next year.

The increased AI infrastructure spending is giving rise to “Shadow AI,” a term that describes building AI infrastructure and tooling without IT’s input. In the short term, AI and IT teams may succeed with a decentralized approach, especially as they get started with building new models, and have fewer data scientists and AI teams. In the long term, Shadow AI is likely to lead to the failure of building the infrastructure required to scale.

 
Blog post
Federated Queries
Offisong Emmanuel, a software engineer, took a technical dive into the use of federated queries.

Federated queries can be used to connect transactional databases to data warehouses for analytics. This blog shows how to run federated queries from a Postgres database to Amazon Redshift.

 
Sponsored Post
Feature Stores for Real-Time ML
Building a feature store or a feature platform for machine learning is exciting work. However, most teams tasked with building a machine learning platform aren’t aware of all the challenges they will face.  

If your team is wondering whether building or buying would be the best and fastest path to success, this whitepaper breaks down the building vs. buying pros and cons. It also details the considerations you need to take into account when building a solution, including:

  • Managing the data pipelines needed to get machine learning models into production
  • Automating batch, streaming, and real-time data pipelines
  • Reducing training/serving skew
  • Planning for engineering support, tech lock-in, and scaling to support more models and features

Download the whitepaper to learn when it makes more sense to build or buy, based on your organization’s needs.


 
We Have Jobs!!
There is an official MLOps community jobs board now. Post a job and get featured in this newsletter!

IRL Meetups
San Francisco — November 9
Oslo — November 10
Chicago — November 10
Luxembourg — November 15
Copenhagen — November 22

Toronto — November 22
San Francisco — November 23

Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


