And a special new tool
Huge thanks to Mohamed Sadek for helping write the New Tool Tuesday section.

Let me know if you have something to say and want to write some blurbs.
Coffee Session
Python Power! // Sammy Sidhu

The open-source project Daft came on my radar a few months ago after I chatted with one of its creators about some community initiatives. On that call it became apparent that he knew what he was talking about. So I did what anyone in my position would do: I invited him on the podcast.

Today that podcast gets to see the world!

Sammy Sidhu is the founder of Eventual and the creator of Daft - a tool for distributed Python dataframes. He comes from the world of autonomous driving and machine learning optimization. Obviously, with that background, he has dealt with his fair share of OOM (out-of-memory) errors.

We dive into the common issues faced when dealing with complex data. Traditional methods involve using databases to store metadata and remote storage for assets like images or videos. This approach requires coordination between multiple teams and leads to optimization problems and the famous memory usage issues, especially with Spark and large datasets.



Job of the week

Engineering Manager - Duolingo is hiring someone to lead their new AI Platform (ML Ops) team. They will own the platform powering Duolingo’s AI-powered features, such as learning personalization, notifications, dialogue, and grammar feedback. The role is based in Pittsburgh or NYC.
New Tool Tuesday
LanceDB

Vector databases have stood the test of time. In the 1990s, vector databases were used heavily by genetic researchers. As we transitioned into the 2000s, this technology made its mark on the landscape of search and recommendation systems.

And yet again, 2023 seems to be another groundbreaking year for vector databases. This is mainly driven by the rise of generative AI and the need to build ML systems that hinge on the efficient use of vectors and embeddings.

On the flip side, deploying vector databases comes with its own set of challenges. The setup can be complex and costly, and for ML practitioners, experimenting with an idea before committing to any of the tools can be difficult. To tackle these obstacles, a new tool called LanceDB was built.

We recently had the opportunity to speak with Chang She, the co-founder of LanceDB. For more than a decade, he has been working on data science and ML tooling, including the pandas library. Here is why Chang She decided to build LanceDB:

"We started LanceDB to provide generative AI developers with a better foundation for building production applications, especially if they have multi-modal data. Serious practitioners would prefer not to have to manage vectors separately from the metadata, documents, images, and other raw data. Depending on the data and use case, the optimal method of retrieval could be vector search, keyword search, or just plain SQL.

The last thing they want is to have a brilliant idea and have to switch out critical data infra before they can experiment on it. So we created LanceDB to fit how developers want to think, instead of forcing them to rewire their brains to fit a vector database API."

LanceDB offers:

1. Zero ops - runs in-process without complicated setup, like SQLite or DuckDB for vector search.

2. Cost-effective scalability - LanceDB doesn’t keep vectors in memory, and it’s much cheaper to scale up disk than RAM.

3. Combined storage and flexible queries - store all relevant metadata and raw data next to the vectors, and query by vectors, keywords, or just plain SQL.

4. Unstructured data storage - a high-performance data format for AI data that reduces costs and improves productivity across the whole AI life-cycle, from the data lake to real-time serving.

If you're searching for a vector database that meets your requirements, I recommend trying out LanceDB on your local machine. You can get started with a few simple steps.
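To make the "combined storage and flexible queries" idea concrete, here is a minimal sketch of the general pattern: vectors and metadata live in one in-process store, and you can query by vector similarity, by keyword, or with plain SQL. Loud caveat: this is not LanceDB's API. It uses only Python's built-in sqlite3 with a brute-force cosine similarity scan, and every name in it (build_store, vector_search, SAMPLE_ROWS, the items table) is made up for illustration.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_store(rows):
    """Create an in-process table holding metadata and vectors side by side.

    Hypothetical helper: vectors are stored as JSON text for simplicity.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, caption TEXT, kind TEXT, vector TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (caption, kind, vector) VALUES (?, ?, ?)",
        [(caption, kind, json.dumps(vector)) for caption, kind, vector in rows],
    )
    return conn


def vector_search(conn, query, k=2, kind=None):
    """Return the k (caption, kind) pairs most similar to the query vector.

    An optional metadata filter runs in SQL before the brute-force scan.
    """
    if kind is None:
        cursor = conn.execute("SELECT caption, kind, vector FROM items")
    else:
        cursor = conn.execute(
            "SELECT caption, kind, vector FROM items WHERE kind = ?", (kind,)
        )
    scored = [
        (cosine_similarity(query, json.loads(vector)), caption, item_kind)
        for caption, item_kind, vector in cursor.fetchall()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(caption, item_kind) for _, caption, item_kind in scored[:k]]


# Toy data: three-dimensional "embeddings" invented for this example.
SAMPLE_ROWS = [
    ("a cat on a sofa", "image", [0.9, 0.1, 0.0]),
    ("a dog in a park", "image", [0.1, 0.9, 0.0]),
    ("notes about cats", "document", [0.8, 0.2, 0.1]),
]

conn = build_store(SAMPLE_ROWS)

# Query by vector, by vector plus a metadata filter, and by keyword in SQL.
nearest = vector_search(conn, [1.0, 0.0, 0.0], k=2)
nearest_images = vector_search(conn, [1.0, 0.0, 0.0], k=2, kind="image")
keyword_hits = conn.execute(
    "SELECT caption FROM items WHERE caption LIKE ?", ("%dog%",)
).fetchall()
```

A real vector database swaps the brute-force scan for an index and a purpose-built storage format, but the shape of the workflow is the same: one table, several ways to ask questions of it.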
    Resources
    LLM in Prod 2 Recap

    We released a whole slew of videos from the conference last week. Check 'em out below.

    Blog
    Looking for a job?
    Add your profile to our jobs board here
    IRL Meetups
    Austin, TX - July 14, 2023
    Mexico City, MX - July 19, 2023
    Atlanta, GA - July 21, 2023
    Chicago, IL - July 25, 2023
    Toronto, ON - July 26, 2023
    Seattle, WA - July 28, 2023
    Amsterdam, NL - August 3, 2023


    Thanks for reading. This issue was written by Demetrios and Mohamed Sadek and edited by Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


