The Lost Art of Data Modeling

I understand the desire to keep good things to yourself, but why not share the love and forward this edition to a friend?

Past Meetup
Scaleout's Open-core Platform
Marco Capuccini, Lead ML Engineer at Scaleout, joined the meetup last week to talk about Scaleout's decentralized approach to ML using federated learning, and about FEDn, the open-source federated learning platform developed by Scaleout.

Introduction to Fed ML and FEDn: Current "best practices" in machine learning run into challenges such as private or proprietary data, regulated data, and fog/edge environments, to name a few. Federated learning addresses these issues for certain use cases by training models where the data lives, so the raw data never has to be moved to a central location.
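
To make the idea concrete, here is a minimal sketch of federated averaging, one common way to combine locally trained models. It is a generic illustration and not FEDn's API; the function name, the (weights, sample count) tuple format, and the toy numbers are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical format: each client reports (local model weights, number of local samples).
ClientUpdate = Tuple[List[float], int]


def federated_average(updates: List[ClientUpdate]) -> List[float]:
    """Combine client weights into global weights, weighted by local sample count."""
    total_samples = sum(n for _, n in updates)
    if total_samples == 0:
        raise ValueError("no samples reported by any client")
    dim = len(updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in updates:
        for i, w in enumerate(weights):
            # Clients with more data contribute proportionally more.
            global_weights[i] += w * n / total_samples
    return global_weights


# Two clients trained locally; only their weights leave the device, never the raw data.
client_updates = [
    ([1.0, 2.0], 10),
    ([3.0, 4.0], 30),
]
new_global = federated_average(client_updates)
```

In a real deployment this step would run once per training round, after which the new global weights are sent back to the clients for further local training.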

Marco walked through the FEDn platform, which has a friendly UI. Its architecture is divided into three tiers that manage and support the different stages of the process, which makes it flexible, and its API is language-agnostic.

FLOps Studio: This is a platform for managed federated learning operations. The idea is to provide "Advanced Alliance Management" for federated learning (FEDn) projects, which covers management, monitoring, and authentication. The lifecycle is designed much like that of centralized machine learning, so it supports established MLOps practices and tools, hence "FLOps".

Coffee Session
Entity Centric Data Modeling
Last week we had the honor of hosting Maxime Beauchemin, the original creator of Apache Airflow and Apache Superset, on the podcast. He has had quite an exciting career, working for companies like Facebook, Airbnb, and Lyft. We had a really insightful conversation about the entity-centric approach to data modeling and how it helps handle core data issues and enables efficient retrieval and consumption of data.

What is entity-centric data modeling and how does it apply to ML? It is a data modeling methodology for the modern context, in which data is associated with entities (such as users) so that it can be reused and scaled across various use cases. In machine learning this is effective because the data (features) being used can change so dynamically.
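
As a rough illustration of what "associating data with entities" can look like, the sketch below rolls raw events up into one row per user, with metrics computed over trailing time windows. The event schema, the column names such as spend_7d, and the window sizes are assumptions made for this example, not something prescribed in the conversation.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical raw events: one record per purchase.
events = [
    {"user_id": "u1", "day": date(2023, 6, 1), "amount": 20.0},
    {"user_id": "u1", "day": date(2023, 6, 20), "amount": 5.0},
    {"user_id": "u1", "day": date(2023, 6, 27), "amount": 10.0},
    {"user_id": "u2", "day": date(2023, 6, 25), "amount": 7.5},
]


def build_user_features(
    events: List[dict], as_of: date, windows: Tuple[int, ...] = (7, 28)
) -> Dict[str, Dict[str, float]]:
    """Return one row per user with spend over each trailing window (in days)."""
    features = defaultdict(lambda: {f"spend_{w}d": 0.0 for w in windows})
    for event in events:
        age_days = (as_of - event["day"]).days
        if age_days < 0:
            continue  # ignore events dated after the snapshot
        row = features[event["user_id"]]  # make sure the user gets a row
        for w in windows:
            if age_days < w:
                row[f"spend_{w}d"] += event["amount"]
    return dict(features)


user_features = build_user_features(events, as_of=date(2023, 6, 28))
```

Each row can then serve analytics and ML alike: the same per-user columns can be read as a dashboard metric or as a model feature.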

Problems of data organization for rapidly scaling organizations: Because organizations take different approaches, the data problems they face tend to be more research-inclined, for example questions about new markets, intra-organizational matters, customers, etc. Generally, though, the common approach is to set up some sort of short-term feedback based on day-to-day operations, or long-term feedback based on annual or quarterly events, to draw better insights into their specific data problems.

He wrapped it up by saying, "The ideal thing would be to do both the long-term and short-term solution in parallel at the same time. But in reality, do you always have the resources to do both? Often not."


Crossroads between data stacks and data science/machine learning.
To begin with, reproducibility is very important in machine learning. There is no clear-cut line defining which parts of the data stack belong to machine learning, which isn't necessarily a negative thing, because flexibility across them is useful. Applying some functional programming paradigms to data engineering practices, and drawing parallels between the two where they make sense, can help with reproducibility.
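
One way to read the functional programming parallel is sketched below: a task written as a pure function of its inputs, whose result overwrites a single partition, so that rerunning it gives the same outcome. The function names, the row format, and the dict standing in for a warehouse are all hypothetical and chosen only for this illustration.

```python
from typing import Dict, List


def daily_revenue(rows: List[dict], day: str) -> dict:
    """Pure task: the output depends only on the input rows and the target day."""
    total = sum(r["amount"] for r in rows if r["day"] == day)
    return {"day": day, "revenue": total}


def run_partition(store: Dict[str, dict], rows: List[dict], day: str) -> None:
    """Idempotent write: overwrite the whole partition instead of appending to it."""
    store[day] = daily_revenue(rows, day)


rows = [
    {"day": "2023-06-01", "amount": 10.0},
    {"day": "2023-06-01", "amount": 5.0},
    {"day": "2023-06-02", "amount": 7.0},
]

warehouse: Dict[str, dict] = {}  # stand-in for a partitioned table
run_partition(warehouse, rows, "2023-06-01")
run_partition(warehouse, rows, "2023-06-01")  # a rerun or backfill changes nothing
```

Because the write replaces the partition rather than appending, a retry or backfill cannot double-count, which is what makes downstream training data reproducible.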
Resource
Full-Stack Deep Learning
Going from zero to hero in the ML space can be quite overwhelming at every step of the journey, because of the broad scope of the different areas, implementations, and applications.

Over the years there have been a few classic learning resources that we tend to recommend to everyone getting into MLOps. Aside from Andrew Ng's famous course, Full-Stack Deep Learning runs a deep-dive course every year that aims to help more people understand the art and design behind developing a "full-stack" ML product. Well, it's "go time" in a bit for this annual cohort-based course.


Who is this for? ML Researchers and Engineers, MS students, software engineers looking to get into ML, data scientists looking to up their software engineering game, and PMs on ML teams will all benefit from materials in our course.

What will you learn? This course covers building machine learning solutions and products for production. It isn't just a general application of MLOps workflows: it is about combining them with other techniques while staying mindful of the business implications of each use case as you drive solutions with these ML-powered products. The topics to be covered include:
  • Formulating the problem and estimating project cost
  • Sourcing, cleaning, processing, labeling, synthesizing, and augmenting data
  • Picking the right framework and compute infrastructure
  • Troubleshooting training and ensuring reproducibility
  • Deploying the model at scale
  • Monitoring and continually improving the deployed model
  • How ML teams work and how to manage ML projects
  • Building on Large Language Models and other Foundation Models

Project-based Learning: It has a project-based phase, where participants get to create a working ML-powered application of their choice and share it with their fellow learners while getting feedback from the course staff. Selected projects will get the opportunity to share their work with the broader FSDL community.
Sponsored
Arize Observe
Arize: Observe Unstructured is a free, half-day virtual event taking place tomorrow that is designed to help take your ML initiatives using unstructured data to the next level.

Sessions include:
  • Powering the Next Generation of Products with AI featuring Peter Welinder, VP of Product & Partnerships at OpenAI
  • Accelerating Machine Learning from Research to Production with Hugging Face, featuring the Hugging Face Product Director Jeff Boudier
  • A special presentation on embeddings visualization with UMAP creator and Tutte Institute scholar Leland McInnes
  • A Workshop on Monitoring & Troubleshooting Embeddings featuring the Arize team

Ready for some pre-reading? Learn more about monitoring unstructured data and embedding drift monitoring and see why getting started with embeddings is easier than you think.

We Have Jobs!!
There is an official MLOps community jobs board now. Post a job and get featured in this newsletter!



IRL Meetups
Berlin - June 30th
Lisbon - July 21
Seattle - ??
Denver - ??
Best of Slack
Best of Slack is its own newsletter now. Sign up for it here.
Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


