Preliminary evaluation survey results and a QA bot with LLMs
Our first online course is now available!

Level up on QA bots and get rocking and rollin’ in no time! Sign up here.

P.S. If you want your boss to pay for it, I wrote three different emails that you can use as templates to get them to fork over the cash.

Find them in the blog here.

P.P.S. I am not responsible if you use the template and get fired.
Evaluation
As you may know, we are currently running a survey on how people evaluate LLMs. There are some clear trends already. We still need more responses, though.

All responses will be open-sourced.

Yes. That means everyone will see the raw data. (I will remove PII first.)

Here are some sample questions:

  • What aspects of LLM performance do you consider important when evaluating?
  • What data are you using to evaluate your LLMs?
  • Do you have ground-truth labels for your data? If so, how did you generate the labels?
  • Are you using human evaluators in your LLM evaluation process?

TAKE THE SURVEY NOW
Coffee Session
Harnessing MLOps in Finance

Riddle me this Batman…

How do you move personal financial data of over 30 million customers to the cloud?

Oh, and 70,000 staff too.

Thankfully, that’s not my job. It is the job of Michelle Conway, a lead data scientist at Lloyds Banking Group.

We covered Michelle’s GCP migration, including:

  • Working with small releases in restricted locked-down environments
  • Security protocols and checking packages for malware
  • Doing parallel runs and managing two different systems
  • Negotiating security to access data tables
  • Setting up guardrails for when things go wrong

Last thing I will add here.

I loved hearing about her experience of being a woman in tech. From being the only woman studying maths at her uni to how we can support more representation now, it was incredibly insightful.

Job of the week

Senior Analytics Engineer // Honeysuckle Health (Australia based, Remote or Hybrid)

As a Senior Analytics Engineer you will develop and maintain data pipelines to help improve health outcomes and reduce healthcare costs. You will manage infrastructure, utilize workflow orchestration tools, integrate data from various sources and implement data modelling.

Candidates should have 5+ years of data or software engineering experience, strong expertise in Python programming and data manipulation libraries, and proven experience with workflow orchestration tools like Dagster, Prefect, or Airflow.

Sponsored
LLMOps in NYC

Join the Arize team this week for insights on how to effectively take LLM-powered systems from experimental stages to real-world production environments.

This event brings together developers, researchers, and technology leaders to explore groundbreaking topics essential to building the next generation of robust AI-centric solutions.

Speakers include:

  • Ilan Bigio, OpenAI
  • Aparna Dhinakaran, Arize AI
  • Jerry Liu, LlamaIndex
  • Jonathan Pedoeem, PromptLayer
  • Chaoyu Yang, BentoML
  • Elan Dekel, Pinecone
  • Morgan Gerlak, TCV
  • …and more!


RESERVE YOUR SPOT
Blogpost
Success, what’s not to love?!

That feeling when a model rollout "just works"...

It’s not always like that though. And it’s easy to forget if we just focus on the wins.

That’s why I like this blog by Biswaroop Bhattacharjee and Casper da Costa-Luis.

They share their frustrations trying to improve inference latency in some of the current LLMs.

Frankly, it's relatable.

You can almost hear their cries of pain as they tinker with Llama 2 and ONNX. They go on to share their troubles with optimization and deployment too.

This blog is a good reminder that while it’s great to share success stories, it’s just as important to share the frustration and hard work getting there.

READ NOW
Looking for a job?
Add your profile to our jobs board here
IRL Meetups
Helsinki - September 7
Stockholm - September 14
London - September 14
Austin, TX - September 14
Berlin - September 14
Thanks for reading. This issue was written by Demetrios and edited by Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.


