Jeff Dean On Combining Google Search With LLM In-Context Learning

Dwarkesh Patel interviewed Jeff Dean and Noam Shazeer of Google, and one subject he asked about was what it would be like to merge or combine Google Search with in-context learning. It resulted in a fascinating answer from Jeff Dean.

Before you watch, here are a couple of definitions you might need:

In-context learning, also called few-shot learning or prompt engineering, is a technique where an LLM is given examples or instructions within the input prompt to guide its response. This method leverages the model's ability to understand and adapt to patterns presented in the immediate context of the query.
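As a small illustration of that definition (the task and prompt wording here are my own, not from the interview or any particular API), a few-shot prompt simply packs labeled examples directly into the model's input:

```python
# A minimal sketch of in-context (few-shot) learning: the "training"
# examples live entirely in the prompt, not in the model's weights.
# The sentiment task below is illustrative only.

few_shot_prompt = """Classify the sentiment of each review.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: positive

Review: It stopped working after two weeks.
Sentiment: negative

Review: Setup was quick and the sound quality is excellent.
Sentiment:"""

# A model given this prompt is expected to continue the pattern,
# answering "positive" for the final, unlabeled review.
print(few_shot_prompt.count("Review:"))  # prints: 3
```

No weights change during this process; the pattern is picked up entirely from the prompt text.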

The context window (or "context length") of a large language model (LLM) is the amount of text, in tokens, that the model can consider or "remember" at any one time. A larger context window enables an AI model to process longer inputs and incorporate a greater amount of information into each output.
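A rough sketch of what a fixed context window means in practice (whitespace splitting stands in for real tokenization, which varies by model):

```python
# Toy illustration: a model with an n-token context limit can only
# attend to the most recent n tokens; anything earlier is dropped.

def truncate_to_context(tokens, window):
    """Keep only the last `window` tokens, as a hard context limit would."""
    return tokens[-window:]

history = "the quick brown fox jumps over the lazy dog".split()
print(truncate_to_context(history, 4))  # prints: ['over', 'the', 'lazy', 'dog']
```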

This question and answer begins at the 32-minute mark in this video:

Here is the transcript if you would rather read it:

Question:

I know one thing you're working on right now is longer context. If you think of Google Search, it has the entire index of the internet in its context, but it's a very shallow search. And then obviously language models have limited context right now, but they can really think. It's like dark magic, in-context learning. It can really think about what it's seeing. How do you think about what it would be like to merge something like Google Search and something like in-context learning?

Yeah, I'll take a first stab at it because I've thought about this for a bit. One of the things you see with these models is that they're quite good, but they do hallucinate and have factuality issues sometimes. Part of that is you've trained on, say, tens of trillions of tokens, and you've stirred all that together in your tens or hundreds of billions of parameters. But it's all a bit squishy because you've churned all these tokens together. The model has a reasonably clear view of that data, but it sometimes gets confused and will give the wrong date for something. Whereas information in the context window, in the input of the model, is really sharp and clear, because we have this very nice attention mechanism in transformers. The model can pay attention to things, and it knows the exact text or the exact frames of the video or audio or whatever it's processing. Right now, we have models that can deal with millions of tokens of context, which is quite a lot. It's hundreds of pages of PDF, or 50 research papers, or hours of video, or tens of hours of audio, or some combination of those things, which is pretty cool. But it would be really nice if the model could attend to trillions of tokens.

Could it attend to the entire internet and find the right stuff for you? Could it attend to all your personal information for you? I would love a model that has access to all my emails, all my documents, and all my photos. When I ask it to do something, it can sort of make use of that, with my permission, to help solve what it is I'm wanting it to do.

But that's going to be a big computational challenge, because the naive attention algorithm is quadratic. You can barely make it work on a fair bit of hardware for millions of tokens, but there's no hope of making that just naively go to trillions of tokens. So we need a whole bunch of interesting algorithmic approximations to what you would really want: a way for the model to attend conceptually to lots and lots more tokens, trillions of tokens. Maybe we could put all of the Google codebase in context for every Google developer, all the world's source code in context for any open-source developer. That would be amazing. It would be incredible.
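To see why Dean calls naive attention quadratic, here is a minimal sketch, not Google's implementation, of single-head dot-product attention over toy vectors. Every query position scores against every key position, so n tokens require n × n score computations:

```python
import math

# Toy sketch of naive dot-product attention. With n input positions,
# the inner loop below computes n scores for each of n queries:
# n * n scores total, which is why cost grows quadratically with
# context length.

def naive_attention(queries, keys, values):
    """Single-head dot-product attention over plain lists of vectors."""
    outputs = []
    for q in queries:  # n queries ...
        # ... each scored against all n keys
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # softmax over key positions
        dim = len(values[0])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
out = naive_attention(tokens, tokens, tokens)
print(len(out), len(out[0]))  # prints: 4 2
```

At 4 positions that is only 16 scores; at a trillion tokens it would be 10²⁴ scores per layer, which is the scaling wall the approximations Dean mentions are meant to get around.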

Here is where I found this:

Forum discussion at X.


