Jeremy Howard came on the pod!

Someone told me that if you want something, ask for it. So: will you share this newsletter with a friend? And tell them to subscribe here.

Also, TheSequence newsletter is giving away 3 free memberships to their premium newsletter for the best Failure Friday story. Share a time you failed with us in the Slack thread to be entered for a chance to win.

Past Meetup
Kubeflow + Spark
Orchestration

Is Kubeflow still a headache to work with? You tell me. All I know is what the community talks about, and according to them the tight coupling to Argo CD makes things a bit rough around the edges.

That being said, Apache Spark and Kubernetes have been established as de facto standards for data processing and container orchestration respectively. This talk covers how these technologies can be integrated under the orchestration of Kubeflow. Plus there is a demo at the end.
New Tool Tuesday
KubeSurvival
On-Prem Cost Cutting

Right before going on vacation, I saw Alon Gubkin created a nifty tool for saving money with K8S. Naturally, I had to talk with him more about the reasons for creating it and what exactly it does.

You all may remember another New Tool Tuesday I did on BudgetML. I am hoping there can be a community collaboration and these two money savers can have a superhero offspring that pays me for running Kubernetes.

*Enter Alon*

So the story is really simple, the idea for this tool came from two real needs:

  1. At Aporia we provide an ML monitoring solution that is on-prem, meaning you can install it in your own Kubernetes. We obviously needed to estimate how much our installation costs the customer, but when I tried to estimate that accurately, I found that it can be really hard.
  2. Some of our customers run a LOT of training jobs and model servers on Kubernetes, and if you don't configure your cluster correctly this can cost you A LOT (especially if you use GPUs). So after some conversations with them I thought of this idea.

There are other similar tools like kubecost (which is great!), but it works on existing clusters with everything already installed.

I wanted a tool that lets you quickly plan your cluster and estimate costs for it, without a real cluster behind the scenes and without installing anything. So I built a really small "programming language" that lets you easily define your workloads:

# My training jobs
pod(cpu: 1, memory: "2Gi", gpu: 2) * 3 +
# Some model servers
pod(cpu: 2, memory: "4Gi") * 10


And it automatically gives you the cheapest node configuration, with its price per month (e.g., 2 p3.2xlarge instances for USD 875.16).
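For the curious, here is a rough Python sketch of what "find the cheapest node configuration" can look like. To be clear, this is not KubeSurvival's actual algorithm. It is a simplified illustration that assumes a cluster built from a single node type, packs pods with a first-fit-decreasing heuristic, and ignores system overhead. The node names, sizes, and monthly prices are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    cpu: float
    memory_gib: float
    gpu: int = 0


@dataclass(frozen=True)
class NodeType:
    name: str
    cpu: float
    memory_gib: float
    gpu: int
    monthly_usd: float  # hypothetical price, not a real cloud quote


def nodes_needed(pods, node):
    """Count nodes of one type needed to hold all pods (first-fit decreasing).

    Returns None if some pod is too big for this node type.
    """
    free = []  # remaining [cpu, memory_gib, gpu] on each node opened so far
    ordered = sorted(pods, key=lambda p: (p.gpu, p.cpu, p.memory_gib), reverse=True)
    for pod in ordered:
        if pod.cpu > node.cpu or pod.memory_gib > node.memory_gib or pod.gpu > node.gpu:
            return None
        for slot in free:
            if slot[0] >= pod.cpu and slot[1] >= pod.memory_gib and slot[2] >= pod.gpu:
                slot[0] -= pod.cpu
                slot[1] -= pod.memory_gib
                slot[2] -= pod.gpu
                break
        else:
            # No existing node has room, so open a new one.
            free.append([node.cpu - pod.cpu,
                         node.memory_gib - pod.memory_gib,
                         node.gpu - pod.gpu])
    return len(free)


def cheapest_plan(pods, node_types):
    """Return (node name, node count, monthly cost) of the cheapest single-type plan."""
    best = None
    for node in node_types:
        count = nodes_needed(pods, node)
        if count is None:
            continue
        cost = count * node.monthly_usd
        if best is None or cost < best[2]:
            best = (node.name, count, cost)
    return best


if __name__ == "__main__":
    # Same workload as the snippet above: 3 training jobs + 10 model servers.
    workload = [Pod(1, 2, gpu=2)] * 3 + [Pod(2, 4)] * 10
    # Hypothetical node catalogue, for illustration only.
    catalogue = [
        NodeType("cpu-only", cpu=16, memory_gib=64, gpu=0, monthly_usd=300.0),
        NodeType("gpu-small", cpu=8, memory_gib=61, gpu=1, monthly_usd=1000.0),
        NodeType("gpu-large", cpu=32, memory_gib=244, gpu=4, monthly_usd=3500.0),
    ]
    print(cheapest_plan(workload, catalogue))
```

A real planner would also consider mixing node types and would use real pricing data, so treat this purely as a way to build intuition for the problem.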
Coffee Session
Jeremy Freakin' Howard.
An hour with the legend himself.

Many of you may have gotten your start in machine learning with Kaggle. Others may have taken the fast.ai course. For you, Jeremy Howard needs no introduction. For others, he's just as familiar as the opinionated, prolific blogger on Twitter and other sites about how to build machine learning models. Point is, the man is a walking, talking machine learning icon and we got to chat with him! By the way, if the name isn't familiar to you... Google him. Right now. Yes, right now.

Jeremy, Demetrios and I had a really good conversation, especially about how to write software for machine learning. We talk a lot in the community about platforms, APIs, and abstractions. Jeremy shared his knowledge from building APIs and software used by thousands (if not millions) of people around the world. Intriguingly, we talked about some of the core elements of a successful software writer's mindset. Jeremy never lets himself get too far from the thoughts of the day to day ML developer. He also takes lessons from some of the most influential API designs ever, like the rather timeless .NET API. For anyone writing reliable, maintainable, scalable software for machine learning, I highly recommend listening to this convo from end to end!

Till next time,
Vishnu
Current Meetup
ML Lego Blocks
Kubeflow + Kafka + Spark + Redis + Prom & Grafana

We've got another special meetup with live coding tutorials for all you out there looking to get your hands dirty.

Aniruddha, a Senior Data Scientist at Publicis Sapient, will walk us through Kubeflow on steroids. We will dive into using it with a feature store, build end-to-end training pipelines, and set up the model serving portion.

In this meetup we will build the Lego blocks for online and offline data ingestion with Spark and Kafka alongside Redis, monitor the metrics in Grafana and Prometheus, and build the whole architecture in GCP.

See you Wednesday, aka tomorrow, at 9 am PT / 5 pm BST. In case you want to keep up to date, subscribe to our public Google Calendar.

Best of Slack

  • FAISS Tutorial: Thanks to the Pinecone team, we have an awesome tutorial on the most popular library for similarity search, FAISS.
  • Help Burkay help you!: Community member Burkay Gur is doing free 1-hour analyses of company ML production infra, as long as you're okay with sharing the results with other listeners. Super cool experiment, take him up on it!
  • This community amazes me: Community member Mayez asked a question about working with large-scale data, and a number of people jumped in to help him think it through. Shout out to community member Thomas Gaddy for a super thorough, helpful answer!
  • Heat Wave Jamz
Jobs

See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


