There is a new MLOps YouTube channel dedicated to all the IRL meetups we are doing! Subscribe Here.
P.S. our emails keep clipping on Gmail so make sure to click 'see entire message' for all the jobs at the end.
Coffee Session: Multi-Modal Recsys
When you work at a company called EezyLife, as Marc Lindner and Amr Mashlah do, the vision is pretty clear: Make. Life. Easy.
EezyLife is a personal life assistant that suggests interesting things its users will hopefully enjoy.
It's basically a "personalized universal recommendation system" that can integrate with multiple products/providers and make suggestions from the available options. If you aren't sure what to do tonight, it will suggest a few concerts near you, a couple of series you might like, or a three-hour guided meditation. Ok, I made up the guided meditation.
Data Challenges
As we all know by now, data problems are part of life. There are two major challenges in this specific use case: knowing enough about a product to recommend it, and learning enough about the user to know what they want.
Let's say a user is interested in watching an intellectually stimulating documentary. The hard part is knowing enough about the documentaries to recommend, with confidence, that any of them are "intellectually stimulating."
Due to EezyLife's multi-modal recsys architecture, things get complex fast. Problems like varying data formats and deduplication of items from different providers arise.
EezyLife addresses this with a feature-based, collaborative filtering approach.
The idea is to identify the relevant identifying features across multiple sources, then extract and link them together. This episode is highly recommended.
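As a toy illustration of that linking idea (the data, fields, and `link_key` helper here are hypothetical, not EezyLife's actual schema), listings of the same event from two providers can be deduplicated by joining on a normalized key built from shared features:

```python
# A minimal sketch of cross-provider item linking: normalize the features
# that identify an item, then use them as a dedup key.
def link_key(item):
    # Normalize the identifying features so formatting differences
    # between providers (case, whitespace) don't split the same item.
    return (item["title"].strip().lower(), item["city"].strip().lower(), item["date"])

provider_a = {"title": "Jazz Night ", "city": "London", "date": "2022-08-20"}
provider_b = {"title": "jazz night", "city": "london", "date": "2022-08-20"}

print(link_key(provider_a) == link_key(provider_b))  # True: same event, keep one
```

Real pipelines would of course use fuzzier matching (embeddings, edit distance) for noisier fields, but the extract-normalize-link shape is the same.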
At the most recent IRL in London, Mohammed ElNabawy shared the story of Brix, an in-house tool used within QuantumBlack Labs.
Brix is a single source of truth for reusable technical analytics assets within QuantumBlack. I know that sounds like a handful of buzzwords, but bear with me.
Brix Origins
QB set out to solve the complexities data professionals face when managing projects. The four key issues Brix covers are:
Difficulty finding existing solutions to a current problem
Difficulty reusing monolithic packages or extracting relevant code from them
No purpose-built place to share code and research
Starting every project from scratch due to a lack of reusable components.
The Journey
Once upon a time, two common GitHub repositories emerged at QB. Each repository had assets for both data engineering and data science.
Components were hosted across the guilds for "reusability". However, these components were hard to discover. They existed in the repositories... just lost in the wind (like me on a Thursday afternoon).
The QB Alchemy unit was assembled to build Python libraries that handle horizontal problem spaces across data science and machine learning. Soon after, scaling and maintenance became an issue.
Airbnb's Knowledge Repo was deployed as an MVP, to help make these components discoverable. With 26 components and about 150 interactions, it was an improved solution. The MVP demonstrated enough interest to spend time building out a proper solution.
Brix Upsides
Accelerated project delivery. Reduced risk. R&D capabilities on steroids. This thing's got legs.
It helps build code and publish it to an artifactory in one go. It also standardizes contribution, makes components discoverable across multiple code repositories, and offers standard consumption: a simple "pip install" to download.
Search capability is ingrained into our daily life. Arguments are commonly ended with the conclusion, "just google it".
Users have come to expect nearly every application and website to provide some type of search functionality. With effective search becoming ever-increasingly relevant (pun intended), finding new methods and architectures to improve search results is critical for architects and developers.
Starting from the basics, this blog describes the AI-powered search capabilities within Redis that utilize vector embeddings created by deep learning models.
Vector embeddings are lists of numbers that can represent many types of data, which makes them quite flexible: audio, video, text, and images can all be represented as vector embeddings. Sparsity can present challenges for ML models, because as the encoded representation grows in size, the dataset can become computationally expensive to use.
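To make that concrete, here is a minimal sketch of similarity search over vector embeddings. The tiny 3-dimensional vectors below are stand-ins for real model-generated embeddings, and cosine similarity is one common distance metric for ranking (Redis supports others too):

```python
import numpy as np

# Toy "embeddings": in practice these come from a deep learning model
# and have hundreds or thousands of dimensions.
docs = {
    "concert":    np.array([0.9, 0.1, 0.0]),
    "series":     np.array([0.1, 0.8, 0.1]),
    "meditation": np.array([0.0, 0.2, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.85, 0.15, 0.05])  # embedding of the user's query
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # → ['concert', 'series', 'meditation']
```

A vector database like Redis does essentially this, but with approximate-nearest-neighbor indexes so the ranking stays fast over millions of vectors.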
Machine learning models rely on the critical assumption that input data distribution does not drift too much over time. Yet once in production, model drift is inevitable.
Learn why model drift matters, how to make sense of it, and how to measure it.
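As one concrete way to quantify drift (a common heuristic, not necessarily the exact method covered in the session), the Population Stability Index compares a feature's production distribution against its training-time distribution:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin both samples using the expected (training) distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range production values
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) when a bin is empty.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 10_000)     # feature distribution at training time
same = rng.normal(0, 1, 10_000)      # production data, no drift
shifted = rng.normal(0.5, 1, 10_000) # production data with a shifted mean

# Rule of thumb: PSI < 0.1 is stable, > 0.25 signals significant drift.
print(population_stability_index(train, same))
print(population_stability_index(train, shifted))
```

Running a check like this per feature on a schedule is a cheap first line of defense before reaching for heavier drift-detection tooling.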
Join us on Thursday, August 18th at 10 am PST (aka 19:00-20:00 Europe (CEST)) for another Ask Me Anything Session!
We’ll be chatting with none other than Ketan Umare, CEO and co-founder at Union.ai.
He’s held multiple senior roles at Lyft, Oracle, and Amazon. His unique experience spans cloud, distributed storage, mapping (map making), and machine-learning systems. Ketan is passionate about building software that makes engineers’ lives easier and provides simplified access to large-scale systems.
This is your opportunity to ask questions about:
Convergence of data and ML
How to take your ideas from concept to production
Workflow automation
Best practices for creating concurrent, scalable, and maintainable workflows
Why Airflow is not enough.
Feel free to submit your questions ahead of time if you can’t attend.
Also, share this event with your friends. Let’s get this thing poppin’!
On Wednesday, 17th August at 11:00 am CST, we will be having a virtual meetup with Fabiana Clemente, Co-Founder & Chief Data Officer at YData.
It's going to be centered around synthetic data, a hot topic in the current Data/ML space that is being proposed as an essential part of any data science toolkit.
In a very hands-on approach, she will showcase how to generate synthetic data, deal with the challenges of leveraging deep learning networks, and overcome them in a workflow that includes data profiling and defining expectations for the generated data.
Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.