Domain-specific language models (DSLM)
AI Twitter is gaslighting me.
 
Coffee Session
  • Start-Up Vs Enterprise

Aaron Maurer & Katrina Ni, Senior Engineering Manager and Machine Learning Engineer at Slack, talked about recommenders and search at Slack.

The technical stack around Slack's recommender system is built around a recommender API. At Slack, many different situations require recommending users or channels in some context. Hundreds of products are set up with their framework by default. Slack must make this process generic, both in serving it on the back end and in presenting it to other teams within Slack.

Search and recommendations are fundamentally different problems and typically require fairly different stacks. However, they often share a lot of components. Both have a retrieval phase, a re-ranking phase, and in some cases, a rule-based phase at the end. Historically, Slack's tent pole has been search. Unlike other companies, they didn't have a recsys use case to build around in the beginning, but they use a few things from search to power the recommender API. It started off as an API to solve generic problems using queries with the context of users and channels. Over time, the API made it easier to partner with other teams and deliver features.
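
To make the phased shape concrete (fetch candidates, re-rank them, then apply final rules), here is a minimal sketch in Python. It is not Slack's implementation: the channel names, the activity counts used as a ranking signal, and every function name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    channel: str
    score: float = 0.0


def retrieve(query, index, limit=100):
    """Retrieval phase: cheaply pull channels whose name contains the query."""
    q = query.lower()
    return [Candidate(name) for name in index if q in name.lower()][:limit]


def rerank(candidates, activity):
    """Re-ranking phase: score candidates (here, by a made-up activity count)."""
    scored = [Candidate(c.channel, float(activity.get(c.channel, 0))) for c in candidates]
    return sorted(scored, key=lambda c: c.score, reverse=True)


def apply_rules(candidates, already_joined):
    """Final rules phase: drop channels the user has already joined."""
    return [c for c in candidates if c.channel not in already_joined]


def recommend(query, index, activity, already_joined, k=3):
    """Chain the three phases and keep the top k results."""
    return apply_rules(rerank(retrieve(query, index), activity), already_joined)[:k]


# Hypothetical data, for illustration only.
index = ["ml-platform", "ml-reading-group", "design", "ml-search"]
activity = {"ml-platform": 40, "ml-reading-group": 5, "ml-search": 25}

results = recommend("ml", index, activity, already_joined={"ml-platform"})
print([c.channel for c in results])  # ['ml-search', 'ml-reading-group']
```

Keeping each phase as its own function is one way a search stack and a recommender could share components: the retrieval step can be reused while the ranking signal and the final rules differ per use case.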


 
LLMs In Production
  • SLMfast

Andrew Seagraves, VP of Research at Deepgram, talks about the importance and use of small, fast, domain-specific language models (DSLM) as opposed to large foundational language models.

Deepgram is a speech-to-text startup that has processed over one trillion minutes of audio data. They believe that language is the universal interface to AI and that businesses need to adapt AI models to make them useful.

Deepgram predicts that many businesses will start to derive tremendous value from language AI products over the next two years. In the short term, the most impactful products will combine technologies in a multi-modal pipeline. In order for businesses to derive maximum benefit from language models, the models must be cost-effective, reliable, and accurate.

Their language AI pipeline provides this solution for businesses. It contains three stages. First, an Automatic Speech Recognition (ASR) system, complemented by a diarization system, predicts the speech in the audio and formats the transcript. Next, in the understanding layer, a language model turns the transcript into a distilled output such as a summary, detected topics, or sentiment. Finally, the interaction layer takes the output of the language model and turns it into audio using text-to-speech.
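
The three stages can be pictured as a chain of pluggable functions. The sketch below is a rough illustration only, not Deepgram's API: `run_pipeline`, the `Segment` type, and the three `fake_*` stand-ins are all hypothetical, and a real system would call actual ASR, language, and text-to-speech models in their place.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    speaker: str  # label assigned by diarization
    text: str     # words recognized by ASR


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], List[Segment]],
    understand: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages: ASR + diarization, understanding, interaction."""
    segments = transcribe(audio)
    # Format the transcript as one "speaker: text" line per segment.
    transcript = "\n".join(f"{s.speaker}: {s.text}" for s in segments)
    distilled = understand(transcript)
    return synthesize(distilled)


# Stand-in stages so the sketch runs without any model (all hypothetical).
def fake_transcribe(audio: bytes) -> List[Segment]:
    return [
        Segment("Speaker 0", "Hi, I need help with my order."),
        Segment("Speaker 1", "Sure, what is the order number?"),
    ]


def fake_understand(transcript: str) -> str:
    # A real understanding layer would summarize; this just keeps the first line.
    return "Summary: " + transcript.splitlines()[0]


def fake_synthesize(text: str) -> bytes:
    # A real interaction layer would return audio; this returns encoded text.
    return text.encode("utf-8")


output = run_pipeline(b"raw-audio", fake_transcribe, fake_understand, fake_synthesize)
print(output)
```

Passing each stage in as a function means a small domain-specific model can replace a large general one in the understanding layer without touching the rest of the pipeline.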

YouTube
 
Book Review
  • Designing Machine Learning Systems by Chip Huyen.

Data Sources  
An ML system can work with data from many different sources. These sources have different characteristics, can be used for different purposes, and require different processing methods, so understanding them enables more efficient use of data. Common data sources include user input data and system-generated data. User input data is data explicitly entered by users. System-generated data is data generated by different components of a system, including various types of logs and system outputs such as model predictions.


 
We Have Jobs!!
From now on we will highlight one awesome job per week! Please reach out if you want your job featured.

  • Senior Data Scientist at Stack Overflow - If you enjoy developing and deploying NLP solutions, this is a place to do something interesting with NLP. We can all imagine what comes next. Did someone say StackGPT?

IRL Meetups

Bristol - May 15, 2023
Madrid - May 23, 2023
Toronto - May 23, 2023
San Francisco - May 24, 2023
Stockholm - May 25, 2023
Munich - May 25, 2023
Amsterdam - June 15, 2023

Thanks for reading. This issue was written by Nwoke Tochukwu and edited by Demetrios Brinkmann and Jessica Rudd. See you in Slack, YouTube, and podcast land. Oh yeah, and we are also on Twitter if you like chirping birds.



