Hello vs Business Model

The very first Bristol in-person meetup was held a few weeks ago. Newly minted LinkedIn influencer Laszlo Sragner talked about good coding practices around Data Science/Machine Learning.

Clean architecture for building machine learning products: Properly structuring ML projects can reduce technical debt, which makes building the solution easier. Two factors pose some of the biggest challenges when building machine learning products: how the business problem is framed as a machine learning problem, and friction between teams across the machine learning lifecycle. These factors shape the workflow structure, which in turn determines what comes out at the other end of the tunnel (i.e., the product).

Technical Debt vs Technical Mess: The difference between these concepts gives a clearer picture of how to develop sustainable products and, more importantly, a good understanding of technical debt, a concept borrowed from corporate debt. Technical debt is taken on in an attempt to gain knowledge, with a plan to correct it later, but in terms of programming "paying back tech debt depends on writing clean code enough to be refactored". For more on tech debt in ML, see the classic Google paper.

Refactoring, Decoupling, and Design Patterns: These are relevant software engineering concepts that should be incorporated into our workflows when developing Machine Learning products to give a clear-cut structure to the design and processes. They help to properly define what to work on and how to approach each task at every stage of development, and they ensure a seamless workflow within the product and across the different stages. In general, these concepts have several benefits:

Abstraction between analysis, products, and infrastructure
High level of flexibility to enable experimentation without destroying the previous versions
Managed complexities when building complex solutions for complex problems
High-speed iterations when building solutions.
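To make the first benefit a little more concrete, here is a minimal sketch of what decoupling analysis, product, and infrastructure could look like in Python. It is not from the talk: `DataSource`, `InMemorySource`, `MeanModel`, and `train` are hypothetical names made up for illustration, and the "model" is deliberately trivial.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


class DataSource(Protocol):
    """Infrastructure boundary: anything that can return a sequence of numbers."""

    def load(self) -> Sequence[float]: ...


@dataclass
class InMemorySource:
    """Hypothetical stand-in for a database or file reader."""

    values: Sequence[float]

    def load(self) -> Sequence[float]:
        return self.values


class MeanModel:
    """Analysis layer: a trivial 'model' that predicts the training mean."""

    def fit(self, values: Sequence[float]) -> "MeanModel":
        self.mean_ = sum(values) / len(values)
        return self

    def predict(self) -> float:
        return self.mean_


def train(source: DataSource) -> MeanModel:
    """Product layer: wires a source to a model without knowing their internals."""
    return MeanModel().fit(source.load())


model = train(InMemorySource([1.0, 2.0, 3.0]))
print(model.predict())
```

Because `train` only depends on the `load` method, swapping the in-memory source for a warehouse reader, or the mean model for a real one, should not require touching the other layers. That is the kind of flexibility the list above is pointing at.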

TL;DR - ML is inherently unspecifiable. Technical debt is inevitable. These practices can help mitigate the complexity that comes along with building solutions with ML.

Delina Ivanova is an associate director of data and insights at HelloFresh. She has an impressive track record as an individual contributor and technical manager. The conversation centered around driving value with data and how good data team management can help speed up that process.

Career Trajectory in the Data World - For starters, the data world is different. It is actually quite accessible as a field. This allows people with different backgrounds to break into the field and learn the skills they need. In fact, having experience in other areas is actually a plus because of the cross-pollination.

Telling Stories with Data - This can be quite challenging. For the most part, you might be talking to people who are not interested in the technical jargon. How well you structure your thinking and communicate it makes the difference. You have to understand who the audience is. Construct an objective that aligns them with the process by explaining the approach and mechanism of arriving at a particular conclusion.

Managing a Technical Team - As a manager, you become spread thin. You are opining on many things, which usually include translating ideas and problems into solutions, identifying requirements, determining what your team is doing, and helping them think through the flow as well as scalability in terms of business strategy.

In the engineering space, we are currently reflecting on what good leadership for technical work looks like. I am not sure this problem has been completely solved.

Managing vs Individual Contributor - The irony of being in a managerial position is that more often than not your team will have more technical expertise in solving a given problem. This is okay.

Managing involves a few key parts:

Building trust in the data amongst stakeholders
Identifying the right opportunities for business solutions
Buying your team the required time that they need to focus on building that solution

This leaves little or no time to get hands-on with the building process or even keep up with the technology itself. It's a trade-off that has to be made.

As practitioners in the data space, we are bound to interact with a database sooner or later... Hence SQL, right? And when it comes to SQL, we can all agree that it's easy to jumble up queries, right? Well, I guess we are in luck! Rasgo just announced the release of the SQL Generator.

Rasgo has been investing heavily in giving back to the data community. Lately, they found people were searching on Google and Stack Overflow for required SQL syntax - wasting a lot of time that could be used for data analysis.

SQL Generator is a low-code web app that lets anyone produce the SQL syntax needed for specific data transformations, even complex ones, without writing a line of code.

It is built on Rasgo's open-source SQL transforms library, so you can now put that same library to use through the SQL Generator. You input your column headers and the transform that you are trying to replicate, and the generator automatically kicks out the SQL you need to run that transform in your cloud data warehouse.
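To give a feel for the "column headers plus a transform in, SQL out" idea, here is a rough sketch in Python. This is not Rasgo's code or API: `generate_aggregate_sql` is a hypothetical helper written for this newsletter, it only covers a simple aggregation, and it does no validation or escaping of identifiers.

```python
from typing import Dict, List


def generate_aggregate_sql(table: str, group_by: List[str], aggregations: Dict[str, str]) -> str:
    """Build a GROUP BY query from column names and aggregate functions.

    Hypothetical helper for illustration only; identifiers are not escaped.
    """
    select_parts = list(group_by)
    for column, func in aggregations.items():
        # e.g. column="amount", func="sum" -> "SUM(amount) AS amount_sum"
        select_parts.append(f"{func.upper()}({column}) AS {column}_{func.lower()}")
    select_clause = ", ".join(select_parts)
    group_clause = ", ".join(group_by)
    return f"SELECT {select_clause} FROM {table} GROUP BY {group_clause}"


sql = generate_aggregate_sql("orders", ["customer_id"], {"amount": "sum"})
print(sql)
# SELECT customer_id, SUM(amount) AS amount_sum FROM orders GROUP BY customer_id
```

A real generator would also have to handle dialect differences between warehouses, which is presumably where a maintained transforms library earns its keep.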

On Thursday, July 21st, we're hosting a special #ask-me-anything session with Netflix veterans, Romain Cledat and Kedar Sadekar, who work as senior engineers in the Machine Learning Infrastructure group at Netflix! Among other responsibilities, the team maintains the famous Open-Source ML framework Metaflow.

Come to the Slack channel #ask-me-anything and ask them anything about how Netflix builds ML, how it manages and scales ML infra, what Data Scientists think of Metaflow, and anything else! If you can't attend, you can leave your question in a scheduled message. Don't miss a rare opportunity to talk directly to the Netflix ML engineering team!

The "What's Next?" wrt MLflow has been ringing in the MLOps community's ears. We got a sneak peek a few weeks ago when Corey came on the pod to talk to us about the MLflow fundamentals and building/maintaining the beloved tool.

MLflow 2.0 is coming and will include MLflow Pipelines. Does this mean no more Kubeflow? Well, maybe. I haven't gotten my hands on it yet, but according to the release blog:

MLflow Pipelines provides a standardized framework for creating ML pipelines. It introduces the following core components in MLflow:

Pipeline: Each pipeline consists of steps and a blueprint for how those steps are connected to perform end-to-end machine learning operations, such as training a model or applying batch inference. A pipeline breaks down the complex MLOps process into multiple steps that each team can work on independently.
Steps: Steps are manageable components that perform a single task, such as data ingestion or feature transformation. These tasks are often performed at different cadences during model development. Steps are connected through a well-defined interface to create a pipeline and can be reused across multiple pipelines. Steps can be customized through YAML configuration or through Python code.
Pipeline templates: Pipeline templates provide an opinionated approach to solving distinct ML problems or operations, such as regression, classification, or batch inference. Each template includes a pre-defined pipeline with standard steps. MLflow provides built-in templates for common ML problems, and teams can create new pipeline templates to fit needs.
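Since I haven't tried MLflow Pipelines yet, here is a generic sketch of the pipeline-and-steps idea described above, in plain Python. To be clear, this is not the MLflow Pipelines API: `Step`, `Pipeline`, `execute`, and the three toy steps are hypothetical names used only to show single-task steps connected through a shared interface.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class Step:
    """A single-task unit; it reads the shared context and returns new entries."""

    name: str
    run: Callable[[Context], Context]


@dataclass
class Pipeline:
    """An ordered blueprint of steps, executed one after another."""

    steps: List[Step] = field(default_factory=list)

    def execute(self, context: Context) -> Context:
        for step in self.steps:
            # Merge each step's outputs into the context for later steps.
            context = {**context, **step.run(context)}
        return context


def ingest(context: Context) -> Context:
    return {"rows": [1.0, 2.0, 3.0, 4.0]}


def transform(context: Context) -> Context:
    return {"features": [x * 2 for x in context["rows"]]}


def train(context: Context) -> Context:
    features = context["features"]
    return {"model": sum(features) / len(features)}


training_pipeline = Pipeline(
    [Step("ingest", ingest), Step("transform", transform), Step("train", train)]
)
result = training_pipeline.execute({})
print(result["model"])
```

Each step only agrees on the keys it reads and writes, so a step like `ingest` could be reused in another pipeline, which mirrors the reuse the release blog describes.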

Once we have had a play with it, I'll get you all more details! If anyone has kicked the tires on it already, I would love to hear your feedback!