
It's come to my attention that there was a bad link in two sections of the last newsletter. Here are the updated links in case you got rickrolled.

Hiring story: ZenML GitHub
New Tool Tuesday: TeachableHub Website

Past Meetup
MLOps Do/Did/Done!
"It's okay to do a little bit and then scale it up". Andy came on the meetup last week to give us a practical view of doing MLOps. His main message? Don't bite off more than you can chew, make steady progress, there is no need to go from 0 to 100 all at once. Its a process.

I appreciated how he talked through the different baby steps you can take, making sure you evolve with each project. Andy showed us just what that can look like, starting with version control and maturing all the way to CI/CD, unit testing, infrastructure as code, experiment tracking, and performance monitoring.

My favorite quote was when Andy eloquently described ML engineering as putting one model into production, and MLOps as putting the n+1th model into production.
Coffee Sessions
Scaling Biotech
With the advent of COVID vaccines, we’ve all been exposed to the amazing advances created by modern biotech. Data science and machine learning played a role in developing these technologies and many other biotech advances. This week, we were joined by Jesse Johnson, a guest steeped in the role of tech, machine learning, and data science in advancing the pace of innovation in biotech!

Jesse is the VP of Data Science and Data Engineering at Dewpoint Therapeutics, a company targeting a class of molecules called condensates. For a company like Dewpoint (and any other biotech), there is a huge amount of complexity to translate across fields; scientists need to work closely with data scientists and software engineers to translate biotech context into ML experiments and systems. This is no small feat, and Jesse shared with us the heavy work he puts into creating systems of communication and "shared mental models" to merge domain expertise and machine learning. We also touched on Jesse’s unique career background, in which he shifted from being a tenure-track math professor to a software engineer at Google and now a data systems leader at a cutting-edge company.

This is a great podcast for anyone interested in hearing real details about how to implement production machine learning systems into a complex, non-software industry context. It’s the story of many industries trying to take advantage of ML nowadays, and I highly recommend listening to Jesse’s nuanced and thoughtful perspective on how to do it right! - Vishnu
Events
Data Engineering Course
DataTalks.Club is running a free data engineering course and it starts this week!

The course will cover:

  • GCP, Terraform, Docker, SQL
  • Data pipeline orchestration (Airflow)
  • Data warehousing (BigQuery)
  • Analytics engineering (dbt)
  • Batch processing (Spark)
  • Streaming (Kafka)

Get involved! It's free!

Side note: there's a really cool conference happening in the MLOps universe this Thursday! The MLDI Ops Conference is also free, so register and get involved!
Current Meetup
Reading Rainbow
In the last reading group session, Varun Khare presented a talk on Federated Learning (FL). The main takeaways from the session, which will be uploaded to YouTube, were:
  • In FL, each user device keeps a local training dataset that is never uploaded to the server. Instead, the device computes an update to the current global model maintained by the server, and only that update (the model weights or gradient-descent steps) is communicated; how many training steps each client performs is a training parameter (see the toy sketch after this list).
  • FL allows hyper-personalization by training one model per user, mitigates data privacy issues since the data never leaves the user's device (e.g. a smartphone), and is remarkably cheap on the server side since the computation happens on the client, leading to low cloud costs.
  • FL has several challenges, such as model bias: clients with better hardware (i.e. in better financial circumstances) can make more server uploads and so have more impact on the global model. There also aren't many open-source solutions for tuning and testing FL algorithms at scale before production (check out http://github.com/NimbleEdge/Recoedge, which Varun works on).
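To make the first bullet concrete, here's a minimal toy sketch of federated averaging for a linear model. It's purely illustrative and not taken from Varun's talk or the RecoEdge codebase; all function names and numbers are made up.

    # Toy federated averaging sketch -- illustrative only, not from the talk or RecoEdge.
    import numpy as np

    def local_update(global_weights, X, y, lr=0.1, steps=5):
        """Run a few local gradient steps on one client's private data (linear regression).
        Only the resulting weight delta leaves the device, never X or y."""
        w = global_weights.copy()
        for _ in range(steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
            w -= lr * grad
        return w - global_weights                    # the update sent to the server

    def federated_round(global_weights, clients):
        """Server aggregates client updates, weighted by local dataset size."""
        total = sum(len(y) for _, y in clients)
        agg = sum(len(y) * local_update(global_weights, X, y) for X, y in clients) / total
        return global_weights + agg

    # Simulate three clients whose data never leaves their "device".
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])
    clients = []
    for n in (20, 50, 30):                            # unequal data sizes -> unequal influence
        X = rng.normal(size=(n, 2))
        y = X @ true_w + 0.1 * rng.normal(size=n)
        clients.append((X, y))

    w = np.zeros(2)
    for _ in range(50):
        w = federated_round(w, clients)
    print("learned weights:", w)                      # should approach [2, -1]

Note how the size-weighted aggregation also hints at the bias issue from the last bullet: clients that contribute more (or more often) pull the global model toward themselves.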
Check out the paper we went over by clicking here. Watch the session and insights by clicking the button below.
Best of Slack
Jobs
See you in Slack, YouTube, and podcast land. Oh yeah, we're also on Twitter if you like chirping birds.
