Voice of reason
Shout out to all those cities doing a local meet-up in the next week. As a reminder, we post all the local meet-up talks on our in-real-life (IRL) YouTube channel.
Coffee Session
ML Project Management
Simon Thompson, head of data science at GFT Technologies, came on the podcast and shared his experience and thoughts from his many years in the ML industry.

Unexpected changes in ML

Over the years, the adoption and implementation of ML have scaled far beyond expectations in terms of memory and computation, infrastructure, open source, etc.

On the computation side, a 200-billion-parameter model was once engineering fiction, unimaginable to practitioners from the early era.

On the infrastructure side, the seemingly impossible problem used to be connecting a couple of workstations over an Ethernet network and making them exchange messages. Today, the hard problem is connecting 500 GPUs within a data center so that they work together on trillion-parameter-scale models.
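To put that jump in scale into numbers, here is a rough back-of-the-envelope sketch. It only counts the memory needed to hold the model weights, and it assumes 2 bytes per parameter (16-bit precision) and 80 GB of memory per GPU. Both figures are illustrative assumptions, not numbers from the podcast, and real training needs far more memory for gradients, optimizer state, and activations.

```python
import math


def weights_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to store the weights, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights (an illustrative assumption).
    """
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(
    num_params: float,
    gpu_memory_gb: float = 80.0,  # assumed per-GPU memory, for illustration only
    bytes_per_param: int = 2,
) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights."""
    return math.ceil(weights_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


for label, params in [("200 billion", 200e9), ("1 trillion", 1e12)]:
    print(
        f"{label} parameters: "
        f"{weights_memory_gb(params):.0f} GB of weights, "
        f"at least {min_gpus_for_weights(params)} GPUs just to store them"
    )
```

Under these assumptions, the weights alone no longer fit on a single device, which is why the networking problem moved from a handful of workstations to hundreds of GPUs.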

ML Projects
Based on Google's paper "Machine Learning: The High-Interest Credit Card of Technical Debt", it has become clearer that choosing and running machine learning algorithms is only a small piece of the puzzle.

How the project is set up and organized is another critical piece, though it isn't mentioned in the paper. Much of it comes down to getting the right team set up.

In a commercial context, having one person fully manage most of the critical parts of a project is quite dangerous. People go on holiday, get ill, or take a new job, any of which can hinder product delivery. Almost nothing gets done by one person anymore.

When working on ML projects, it is crucial to have a shared understanding and create a positive culture. That is hard to achieve: to set your project up for success, you need an agreed set of behaviors, expectations, and outcomes. These differ from project to project.

Scaling and Priority
Is scalability the most important priority when building an ML product?

The concern around scaling always comes up in discussion, whether as a valid question or as a cop-out. Scalability is nice to have, but you need to be open to the idea that you may never need it.

However, accountability, manageability, and governability are must-haves with any ML software going into production.
 
Coffee Session
Voice & Language ML
Catherine Breslin is the founder of the Kingfisher Labs consultancy. We had a juicy conversation about her experience as a machine learning researcher in voice and language technology.

Tooling Evolution and Challenges
The tooling for handling voice and language problems in ML has evolved over the past decade in the same way machine learning has become more accessible.

In the early years of ML, many tools were custom-built for specific problems because the fields of application (e.g., audio, speech, language, vision) were so disparate.

Over the past five years, there has been more standardization across machine learning fields, and many of the tools and techniques are now converging. Integration between ML fields, and the knowledge shared across them, has made them more aligned.

Reflective trend
Is there a skill barrier for a generic data scientist to work in the areas of voice and audio?

Well, data science is data science, and having domain knowledge of any application gives you an edge.

It is really important to have a good grounding in signal processing, language, engineering maths, or something that gives more depth of technical understanding or experience in the areas of voice and speech.

Academia to Industry
There are some key differences between working within the research world of academia vs. the practical world of industry.

When researching in academia, your goal is to devise a better machine learning model for a given problem using pre-prepared, static data.

In industry, you do not know or have the right data for the problem at the outset. You have to figure out the best source to collect the data from, how to clean that data, and so on. You also have to worry about monitoring and managing the model in production to better serve your users.
 
Upcoming Meetup
MLOps in Practice
On Wednesday, 26th October, at 11:00 am CST, we will have a virtual meetup with Marouen Hizaoui and Mo Basirati, Senior Consultant and Senior Consultant/MLOps Lead, respectively, at Machine Learning Reply, on the common challenges and lessons learned in practicing MLOps.

They will cover the practices and approaches that help ensure the success of your data products, distilled from doing MLOps for clients across different industries and levels of maturity:

- It's much more than technical stuff
- The "best" tool is not always the best solution
- Integrating MLOps into systems and infrastructure with different levels of maturity

 
Past Meetup
DevOps to Data & ML
Our guest for the meetup, Ivanov Antoni, is a software engineer currently building VMware's data analytics platform and lead maintainer of the Versatile Data Kit (VDK). He shared how VDK, the new tool he is working on, can help implement DevOps for data and ML.

DevOps challenges

Quite a few inefficiencies occur when infra/ops teams and data teams work together. They are fundamentally a result of the teams' conflicting priorities and goals.

The wall of conflict that introduces these inefficiencies includes communication failures, blurred lines of responsibility, insufficient operations, and stalled development.

These conflicts are similar to the ones that brought about DevOps 20 years ago (i.e., in the early days of scalable, production-level software development).

Versatile Data Kit (VDK)
VDK is a data engineering framework that can create and manage data jobs for the entire data workflow.

The typical journey of data workflow starts from a data source ingestion pipeline, then flows through a data transformation logic before insights are finally extracted.

VDK enables you to create jobs for each stage of the data journey: ingestion jobs for ingesting data, transformation jobs for the transformation logic, and report jobs for insights. It also adapts a lot of DevOps practices and abstracts workflows to help speed up development between the data teams and the IT/operations teams.
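As a rough illustration of that journey, the sketch below chains an ingestion step, a transformation step, and a report step in plain Python. It is deliberately framework-free: the function names (`ingest_job`, `transform_job`, `report_job`) and the sample data are hypothetical and are not part of the VDK API, which has its own job structure.

```python
# Hypothetical, framework-free sketch of the three job types in a data workflow.
# None of these names come from the VDK API.
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def ingest_job(source: Iterable[str]) -> List[Row]:
    """Ingestion: parse raw 'user,amount' lines into structured rows."""
    rows: List[Row] = []
    for line in source:
        user, amount = line.strip().split(",")
        rows.append({"user": user, "amount": float(amount)})
    return rows


def transform_job(rows: List[Row]) -> Dict[str, float]:
    """Transformation: aggregate the total amount per user."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[str(row["user"])] += float(row["amount"])
    return dict(totals)


def report_job(totals: Dict[str, float]) -> List[str]:
    """Report: one readable line per user, highest total first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [f"{user}: {total:.2f}" for user, total in ranked]


# Made-up sample data standing in for a real data source.
raw_lines = ["ana,10.5", "bo,3", "ana,4.5"]
report = report_job(transform_job(ingest_job(raw_lines)))
print(report)
```

Keeping each stage as its own job is what lets a framework schedule, version, and monitor the stages independently.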

 
Sponsored Post
Flyte vs. Kubeflow
As open-source orchestration platforms built on Kubernetes, Flyte and Kubeflow may seem similar at first glance. Looking more closely, however, they offer different developer experiences, especially regarding scale and deployment. As a data, ML, and infrastructure orchestrator, Flyte is fundamentally different from Kubeflow in three ways:

  1. Flyte lets ML practitioners create without navigating infrastructure jargon and Kubernetes details. Kubeflow requires Kubernetes and DevOps expertise, which may slow down the development of ML pipelines.
  2. Flyte’s Python SDK lets ML practitioners write Python code, while Kubeflow’s Python SDK behaves more like an infrastructure DSL.   
  3. Flyte supports a range of data types and transformations. By contrast, Kubeflow enforces a type system and doesn’t support data types beyond fundamental Python types and artifacts/files. Kubeflow must be told what must be done when it encounters another type, such as an S3 URI. Flyte, however, automates the interaction with S3 (and GCS), supports intra- and intercommunication among cloud services and the local file system, and reduces the need to write boilerplate code.

Meet the Flyte team in person at Kubecon 2022, or visit the virtual booth to learn more about how Flyte lets you orchestrate your data/ML pipelines with ease without tinkering with your infrastructure.

Resources:
Subscribe to Flyte Monthly for the latest news, updates and events

 
We Have Jobs!!
There is an official MLOps community jobs board now. Post a job and get featured in this newsletter!

IRL Meetups
Lisbon — October 27
Oslo — November 10
Luxembourg — November 15
Copenhagen — November 22

Toronto — November 22

Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


