Why LinkedIn says prompting was a non-starter — and small models were the breakthrough
LinkedIn is a leader in AI recommender systems, having developed them over the last 15-plus years. But getting to a next-gen recommendation stack for the job-seekers of tomorrow required a whole new technique. The company had to look beyond off-the-shelf models to hit its targets for accuracy, latency, and efficiency.
“There was just no way we were gonna be able to do that through prompting,” Erran Berger, VP of product engineering at LinkedIn, says in a new Beyond the Pilot podcast. “We didn't even try that for next-gen recommender systems because we realized it was a non-starter.”
Instead, his team set out to develop a highly detailed product policy document and used it to fine-tune an initial 7-billion-parameter model, which was then distilled into additional teacher and student models optimized down to hundreds of millions of parameters.
The approach has become a repeatable cookbook now reused across LinkedIn’s AI products.
“Adopting this eval process end to end will drive substantial quality improvement of the likes we probably haven't seen in years here at LinkedIn,” Berger says.
Why multi-teacher distillation was a ‘breakthrough’ for LinkedIn
Berger and his team set out to build an LLM that could interpret individual job queries, candidate profiles and job descriptions in real time, and in a way that mirrored LinkedIn’s product policy as accurately as possible.
Working with the company's product management team, engineers eventually built out a 20-to-30-page document for scoring job description and profile pairs “across many dimensions.”
“We did many, many iterations on this,” Berger says. That product policy document was then paired with a “golden dataset” comprising thousands of query and profile pairs; the team fed both into ChatGPT during data generation and experimentation, prompting the model over successive rounds to score pairs against the policy and eventually to generate a much larger synthetic dataset used to train a 7-billion-parameter teacher model.
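The article doesn't publish LinkedIn's pipeline, but the labeling step it describes can be sketched roughly as follows, assuming an OpenAI-style chat completions API; the file names, score dimensions, and JSON schema here are illustrative assumptions, not LinkedIn's actual setup.

```python
# A rough sketch of the policy-grounded labeling step described above, assuming an
# OpenAI-style chat completions API. File names, the dimension list and the JSON
# schema are illustrative assumptions -- LinkedIn has not published its pipeline.
import json
from openai import OpenAI

client = OpenAI()

POLICY_TEXT = open("product_policy.md").read()  # the 20-to-30-page policy document
DIMENSIONS = ["skills_match", "seniority_fit", "location_fit"]  # hypothetical rubric

def score_pair(query: str, profile: str, job_description: str) -> dict:
    """Ask the LLM to grade one query / profile / job triple against the policy."""
    prompt = (
        f"Product policy:\n{POLICY_TEXT}\n\n"
        f"Job query: {query}\nCandidate profile: {profile}\n"
        f"Job description: {job_description}\n\n"
        f"Score this pair on each of {DIMENSIONS} from 1 to 5, strictly following "
        "the policy, and reply as a JSON object."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in; the article only says the team used ChatGPT
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# Label a much larger pool of unlabeled pairs to build the synthetic training set
# for the 7-billion-parameter teacher model.
with open("unlabeled_pairs.jsonl") as src, open("synthetic_labels.jsonl", "w") as out:
    for line in src:
        example = json.loads(line)
        example["scores"] = score_pair(
            example["query"], example["profile"], example["job_description"]
        )
        out.write(json.dumps(example) + "\n")
```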
However, Berger says, it's not enough to have an LLM running in production just on product policy. “At the end of the day, it's a recommender system, and we need to do some amount of click prediction and personalization.”
So, his team used that initial product-policy-focused teacher model to develop a second teacher model oriented toward click prediction. Using the two, they further distilled a 1.7-billion-parameter model for training purposes. That eventual student model was run through “many, many training runs,” and was optimized “at every point” to minimize quality loss, Berger says.
This multi-teacher distillation technique allowed the team to “achieve a lot of affinity” to the original product policy and “land” click prediction, he says. They were also able to “modularize and componentize” the training process for the student.
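Berger doesn't walk through the training code, but the core idea of blending two frozen teachers into one student objective can be sketched in a few lines of PyTorch; the temperature, loss weights, and shared label space below are assumptions for illustration, not LinkedIn's recipe.

```python
# Minimal PyTorch sketch of a two-teacher distillation objective, assuming the
# policy teacher, the click teacher and the student all emit logits over the same
# relevance labels. Temperature and weights are illustrative, not LinkedIn's values.
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(
    student_logits: torch.Tensor,         # (batch, num_labels) from the small student
    policy_teacher_logits: torch.Tensor,  # frozen teacher aligned to the product policy
    click_teacher_logits: torch.Tensor,   # frozen teacher trained for click prediction
    temperature: float = 2.0,
    policy_weight: float = 0.5,           # relative importance of the policy objective
) -> torch.Tensor:
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)

    def kl_to(teacher_logits: torch.Tensor) -> torch.Tensor:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # KL(teacher || student), scaled by T^2 as in standard distillation
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    policy_loss = kl_to(policy_teacher_logits)
    click_loss = kl_to(click_teacher_logits)
    # Each objective stays in its own teacher until this point, so either teacher
    # can be retrained or swapped without touching the other.
    return policy_weight * policy_loss + (1.0 - policy_weight) * click_loss
```

Keeping the two loss terms separate until the final weighted sum is what lets the team iterate on the policy teacher and the click teacher independently, which is the "modularize and componentize" point Berger makes above.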
Consider it in the context of a chat agent with two different teacher models: one trains the agent on accuracy in its responses, the other on tone and how it should communicate. Those are two very different, yet equally critical, objectives, Berger notes.
“By not mixing them, you get better outcomes, but also iterate on them independently,” he says. “That was a breakthrough for us.”
Changing how teams work together
Berger says he can’t overstate the importance of anchoring on a product policy and an iterative eval process.
Getting a “really, really good product policy” requires translating product managers’ domain expertise into a unified document. Historically, Berger notes, the product management team was laser-focused on strategy and user experience, leaving model-iteration approaches to ML engineers. Now, though, the two teams work together to “dial in” and create an aligned teacher model.
“How product managers work with machine learning engineers now is very different from anything we've done previously,” he says. “It’s now a blueprint for basically any AI products we do at LinkedIn.”
Watch the full podcast to hear more about:
- How LinkedIn optimized every step of the R&D process to support velocity, leading to real results within days or hours rather than weeks;
- Why teams should develop pipelines for pluggability and experimentation, and try out different models to support flexibility;
- The continued importance of traditional engineering debugging.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.