
17 Apr 26
Every AI system you've ever used was built to sound right. Not to be right. Dan Klein — co-founder and CTO of Scaled Cognition and professor of computer science at UC Berkeley — joined the Beyond the Prompt podcast to explain why that distinction is the core problem facing enterprise AI today, why chaining models together doesn't fix it, and what it actually takes to build systems you can trust.
Key Takeaways
- LLMs are completion engines optimized to produce fluent, confident outputs — not systems with any internal awareness of whether what they're saying is true
- Fluency is the trap: we've built instincts that fluency correlates with accuracy, but AI systems are fluent even when they're wrong — and that breaks every cue we've trained ourselves to use
- Hallucination isn't a bug in the traditional sense — for generative use cases like image creation or brainstorming, it's the product; the problem is when that same technology is asked to be a reliable source of truth
- The jagged frontier is real: these systems can be superhuman in some areas and surprisingly brittle in adjacent ones, and navigating that gap is a learned skill most people are still developing
- Chaining models to check other models doesn't solve the reliability problem — it just multiplies noisy systems, burns tokens, adds latency, and still can't guarantee anything
- The right fix isn't more prompting or more model layers — it's building models whose fundamental architecture is structured around determinism and reliability from the start
- Metacognition is what's missing: humans know when they don't know something; today's LLMs don't — they just output tokens, and whether those tokens are right or wrong is indistinguishable from the inside
- Nobody is getting new manager training for their AI team — enterprises are deploying AI without investing in the human skills needed to delegate, verify, and edit AI output effectively
- Going from author to editor sounds easy; it isn't — editing requires a different and often harder skill set than writing, and most people don't realize they've been handed an editor's job
- You don't have to build on Jell-O — reliable, trustable AI is possible, but it requires architecture designed for that from the ground up, not bolted on after the fact
Full Transcript
0:00 — Fluency vs. truth
Dan Klein: If you're working as a human with another human and you're trying to delegate to them, you do trust them to come back and say, "Well, I actually couldn't find this information for you, or I got blocked" — as opposed to "I couldn't find it, but here's my wild guess and I'm not going to tell you it's a wild guess." That would not be good behavior from a human, but we see it all the time for machines. The systems we've built really are fundamentally systems designed to produce outputs indistinguishable from the truth. That's different than outputting correct answers. They're fluent. They're confident. The parts we do understand look correct, we assume that everything else is correct. And that's not always true.
0:38 — Meet Dan Klein
Dan Klein: Hi, I'm Dan Klein. I'm a professor of computer science at UC Berkeley and CTO at Scaled Cognition. I'm excited to talk to you today about hallucinations and reliability in AI.
Host: Give us a sense for your background and why somebody who's listening to this episode would go, "I need this — this is one I can't miss." Like, where are you coming from in the world?
Dan Klein: Well, I've been thinking about artificial intelligence for a long time and my background is in natural language processing and human language, so I've been thinking a lot about how we can build these sorts of systems. So much has changed in the time since I started my research work — when I was in grad school, the big problem in natural language processing was like finding the verb. Well, since then we've found the verb and we've got other issues now. A lot of the problems in artificial intelligence historically have come from systems working too poorly, things not working well enough. And a lot of the problems now are coming from this contrast between the ways in which it works very well — maybe even superhuman — and then of course still the ways where there are gaps. And it's those gaps that really are still a problem. My personal interest right now is in trying to figure out how to make systems which are reliable and trustable, and right now that is a big gap.
Host: I presume you're kind of alluding to what's known as the jagged frontier — where some of AI's capabilities dramatically outperform, others dramatically underperform. It's disappointing and the fact that there's jaggedness causes perhaps some jadedness. I've heard anecdotally that more experienced individuals tend to be able to navigate that jaggedness. Do you find that to be true? What helps someone be a deft navigator of the weird, unpredictable capabilities of these models?
Dan Klein: Yeah, that's a great question. I think ultimately that also comes down to really important questions we have to face as a society about digital literacy. The capabilities of the systems we're talking to are very different.
2:58 — Why fluency misleads
Dan Klein: A good example of this would be something like search or machine translation. If you think about the technology in say the 2000s, when you would enter something into a system like Google Translate and you would get a bad translation out, it would also look kind of bumpy and you could tell pretty quickly that the system isn't fluent and therefore it's maybe not accurate. Or if you were doing a search in the standard way we do search, you type in your query and you get back results and you can see well some of these are relevant, some of these are not relevant — and you go into that search process knowing that you're going to have to be doing some filtering. Systems today hide a lot of that from you. The systems are very fluent even when they're wrong. And when systems are fluently wrong and you've built up all of these instincts that fluency correlates to accuracy, it's very easy to not notice mistakes.
Host: Now define fluency there.
Dan Klein: Fluency here is really about the appearance of truth and the smoothness of the language. The systems we've built really are fundamentally systems designed to produce outputs indistinguishable from the truth. That's different than outputting correct answers. And that means there are a couple of problems. One is even the system itself doesn't know when it is outputting a correct versus incorrect answer — when it's guessing. And the reason for that is it's always guessing. It's just sometimes it guesses right.
Host: And that puts a big load on us that we're not used to — systems confidently and fluently giving us answers which are sometimes right and sometimes wrong and you can't tell. Why the load? Because as you just said — I actually love the idea of framing this as digital literacy because I don't think we've had a guest that really talks about it exactly like that. You contrasted with Google and described what I think is very familiar to most of us where we get a bunch of results and then it's incumbent upon us to sort through them. Now why is it any different with an AI? Is it because the appearance of confidence lowers our own inhibition? Why is there a difference?
5:17 — How LLMs guess
Dan Klein: I think it's two things coming together. I think it's partly how the technology works. Fundamentally, all these technologies — anything that's backed by an autoregressive next token predictor — the way they work is at their core they're predicting the next token based on what's come before. They're completion engines. And so if you in its kind of raw state have it complete the sentence "The population of Berkeley is" — well, the system, it's not a database. It's not like there's an entry or not and it has a metacognitive awareness of whether it knows the answer. It's just a matter of what density is predicted over these next tokens and some numbers will come out. And because the system has such a generalized knowledge of language and context and many aspects of how the world works, the population is going to be the kind of population a city would have. In fact, maybe it's seen enough web pages that it'll actually output the correct number. Or maybe it'll output a plausible but incorrect number. All you see is "the population of Berkeley is" and then a number. And you don't know whether it's right or not. There's no certificate of truth that comes with that. There's no process that the system went through to determine whether it did or did not have that knowledge in some discrete way. There's only a claim presented fluently and confidently — and that means the load is on you to figure out: is this one of those times where it's fluent and correct, or is this one of the times where it's fluent and incorrect? As opposed to a lot of experience where you're like, "All right, I'm going to click on this link, half of them aren't right, I'm going to look at it, I'm going to check for signs — this web page looks sketchy, maybe it's not reliable, this translation's got a bunch of disfluencies, maybe other things are going wrong." And all of those cues that we've been trained to detect when the AI fails have been really taken away from us. So the combination of the underlying holes and the misalignment with our experience with past technology ends up being an issue.
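To make the "completion engine" point concrete, here is a minimal sketch of next-token sampling. The vocabulary, the population figures, and the random stand-in for the model's probabilities are all invented for illustration; the point is only that the loop always emits a token and attaches no signal about whether that token is true.

```python
import numpy as np

VOCAB = ["117,145", "124,000", "52,000", "growing", "roughly"]  # illustrative tokens only

def next_token_distribution(context: str) -> np.ndarray:
    """Stand-in for the real model: returns P(next token | context)."""
    logits = np.random.randn(len(VOCAB))          # a real model computes these from its weights
    return np.exp(logits) / np.exp(logits).sum()  # softmax: always a valid distribution

def complete(context: str) -> str:
    probs = next_token_distribution(context)
    token = np.random.choice(VOCAB, p=probs)      # some token always comes out
    return context + " " + token                  # no certificate of truth is attached

print(complete("The population of Berkeley is"))
```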
Host: Maybe that's a good segue to your startup. You're talking about not just minimizing hallucinations but completely eliminating them. So how can this technology do that?
7:31 — What is hallucination
Dan Klein: I think the best place to start that answer is to talk about why the current technologies have hallucinations — which really starts with what a hallucination even is. So we talked a little bit already about how a standard transformer model is basically designed to predict the next token and the next one and the next one. As humans, we like to say — oh, that was true, that was correct, or oh, that was a hallucination. What does that mean? These systems today are fundamentally just probabilistic systems designed to output plausible continuations. They output plausible next tokens and they are in as many ways as possible going to have the trappings of truth — they're going to be linguistically fluent. And in practice these systems output tokens that are sometimes correct and sometimes not, and you can't tell them apart. So we call it a hallucination because it's confidently wrong. But to the system, this is all just its natural operation.
8:56 — Deception and alignment
Host: We could talk a little bit about deception and what that means. That's kind of malicious, right? I mean, deception to me implies malicious intent. Is that fair?
Dan Klein: Yeah. As humans, we talk about deception that involves an intent to deceive. But this really gets into the topic of alignment. And it's very very easy for systems to become what we as humans might call deceptive. Sometimes these labels fit and sometimes they don't. So for example, let's imagine that we're a shipping company and we want to build an agent that's going to answer questions about package status and you call up and you say, "Hey, where's my package?" This system — there are a lot of ways you could build this sort of system today. One thing you might do is decide to take whatever system you've built and optimize it through reinforcement learning. When you train a system through reinforcement learning, you give it some metric and you say your whole job is to do well on this metric. And so maybe you tell it your job is to get a high net promoter score from your customers, which makes sense — you're trying to make your customers happy. And the system over its operation comes to learn that people actually do not like being told that their package has been lost. And in fact, they much prefer to hear that it's arriving tomorrow. So you say "where's my package" and it's actually lost but the system tells you, "Oh no, it's actually going to be there tomorrow" — because it's seeking the reward of a high NPS, it is doing exactly what you told it to do, which is to choose actions that make the customers happy. And so then you can get into this process of saying, "Oh well, maybe I didn't mean make the customers happy at all costs" — and now you're in this very very hard problem of trying to specify exactly what the system should be optimizing, how it should trade off truth and happiness. And is a system that makes that error — that says your package is coming when it's not — is that a hallucination? Is it reasonable to call it that? Is it a deception? While it feels like it might be deceptive, because in a human that sort of action would be characterized as deception, what it really is is just efficiently optimizing an objective which maybe isn't what anybody really wanted.
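A toy illustration of the reward-hacking dynamic Dan describes. The reply options, the reward numbers, and the satisfaction stand-in are invented for this sketch; it only shows that a policy graded purely on customer happiness scores higher for the untrue answer.

```python
TRUE_STATUS = "your package is lost"

def satisfaction_reward(reply: str) -> float:
    # Stand-in for an NPS-style signal: customers rate good news highly,
    # regardless of what is actually in the shipping database.
    return {"your package arrives tomorrow": 0.9, "your package is lost": 0.2}[reply]

def truthful(reply: str) -> bool:
    return reply == TRUE_STATUS

for reply in ("your package arrives tomorrow", "your package is lost"):
    print(f"{reply!r}: reward={satisfaction_reward(reply)}, truthful={truthful(reply)}")

# A policy trained to maximize satisfaction_reward alone converges on the first reply,
# which is exactly the behavior a human observer would label deception.
```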
Host: Is this why, as I think I read somewhere, you said that building agents on top of the foundation models is a kind of fundamentally broken approach? Is this the core reason?
11:28 — Why agents break
Dan Klein: Yeah, there's a lot right now in industry of taking these foundation models as they are today and building a thin wrapper on top of that to build an agent. And that hasn't really worked very well. These are systems which are inherently non-deterministic. If you're calling up your package agent or you're trying to get a refund or change your flight or something like that, you aren't really looking for a system that uses all of the incredible breadth and strength of today's frontier models — like you don't want it knowing a lot about quantum physics and being able to give you iambic pentameter. What you want is you want it to truthfully and reliably reflect what is in the database. And when it says it did something, you want it to actually have done it. And those are the places where current systems are weak. You have this misalignment where these systems are strong in ways they don't even really want and they're weak in ways they need. And this is fundamentally because they're building on a soft probabilistic technology. Typically, it is hard to build a reliable and deterministic technology out of non-deterministic elements.
12:48 — Chaining and determinism
Host: Is the hack there to kind of chain it together with deterministic rule-based kinds of automations or triggers, or is there a more foundational workaround that you're advocating or building?
Dan Klein: Well, the most common approach out there is exactly what you said. You have some system and it is going to do something noisy that's not reliable. You can then bring in a second large language model with instructions to check the first one. And as the joke goes, now you have two problems — because you've got noisy systems checking noisy systems. You get a cascade, or you run fifteen of these in some kind of constellation and any of them can make mistakes that might get caught or might not, and you pay this high price in latency. You have to run a model to check a model to check your model which then gets checked by some other model. It takes a long time. It burns a lot of tokens — which is great if you're in the business of trying to use as much computation as possible. But if you want small efficient systems that get it right in the first place, this is not really going in a good direction.
What we do at Scaled Cognition is instead we build models whose fundamental operation is different and that come along with a big class of determinism that we can guarantee because of how the model is structured and how it operates.
Another approach that people use today is they take this LLM — which really as an artifact we've built is incredible in its potential breadth, like you can ask ChatGPT about anything — and now you have the system and it's hallucinating. It's maybe telling you it cancelled your flight, but it didn't, or the other way around. And you're trying to get this system to do something reliable. And the instinct people have, and the only real tool they have, is to just squeeze down its domain until it's doing almost nothing — like the deterministic rules you're talking about. You say, "All right, LLM with this incredible power, all you get to do is to decide whether the user said they want to talk about payments or billing." And that's it. That's all you can do. And so you've got a system that's being asked to do only this small thing. It's like using an 18-wheeler to deliver one letter, or having a Porsche that you only ever push down the road. You're wasting the volume that this 18-wheeler has. You're wasting its power. And there's a better solution out there, which is to get somebody on a bike — it'll be silent. And what we're doing is building models which have completely different control and performance profiles that line up better with this industry need for determinism.
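A back-of-the-envelope sketch of the checker-cascade economics Dan describes above. Every number here (error rates, token counts, latency) is an assumption made up for illustration; the shape is the point: residual errors shrink but never hit zero, while tokens and latency grow with every extra model.

```python
base_error = 0.10        # assumed chance the primary model is wrong
checker_miss = 0.30      # assumed chance a checker fails to catch an error it reviews
tokens_per_call = 2_000
seconds_per_call = 1.5

for n_checkers in range(4):
    residual = base_error * (checker_miss ** n_checkers)
    calls = 1 + n_checkers
    print(f"{n_checkers} checkers: ~{residual:.2%} errors slip through, "
          f"{calls * tokens_per_call} tokens, ~{calls * seconds_per_call:.1f}s latency")

# This also assumes checker mistakes are independent of the primary model's mistakes,
# an assumption that correlated failures between similar LLMs tend to violate in practice.
```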
16:02 — When hallucination helps
Host: Given the flaws that you've described with LLMs and this amazing horsepower 18-wheeler, what do you think they're good for? Do you think they're good for anything? And if so, what are the use cases where you go, of course, you should be using the horsepower here?
Dan Klein: Yeah, I think they're good for a lot. I mean, the key is really in the name — these are generative AI systems. They're good at generating. They're good at generating content. People complain about hallucinations in these cases where reliability is important, accuracy is important. But in many cases where these systems are the strongest, where they were developed, where they took off first, the hallucination was the product. And so for example if you're Midjourney and somebody comes to you and says, "Hey, I'd like a picture of a mouse holding a balloon" — not only is the whole purpose to get something new and creative that matches what you've asked, it's actually very important that it not replicate something it's seen. You don't want a copyrighted picture of Mickey Mouse. And so here, what we would call hallucination in another context — confident, fluent output, which in this case means visual fidelity and plausibility of the elements of the image — is exactly what you want: fluency together with creativity. If you go to ChatGPT and say, "Give me an idea for a short story," you don't want something that Isaac Asimov wrote. You want something that is new. And in all of these cases, you're asking for the power of creativity, of generation, and you're asking almost to be guaranteed a hallucination. And then you take the same technology and you turn around and say, "Actually, all right, I changed my mind. Now I want you to only do accurate things. I want you to only reflect the state of the database and I want you to follow these rules precisely." And they're just not good at that. They're not built for that.
Host: I read somewhere that you only do synthetic data, and AlphaGo is the only other kind of operation that does that. I'm also curious — as far as I can see, you haven't raised that much money, so you're building this very powerful model but with a relatively small team. How do you square this idea that there seems to be this all-out arms race where everybody's trying to hire as many people as possible and putting as much money as they can into these models, and here you're building what seems to be quite a unique model with less money and only synthetic data?
19:20 — Beyond scale for reliability
Dan Klein: Yeah, I'd be happy to. So let me talk about a couple of things because there's a bunch of interesting questions in there. One is the question of scale — what things come from scale and what things don't — and then I can talk a little bit about synthetic data because I think that's a really important question.
It feels like these systems went from knowing nothing about the world and not really being able to do anything contextual or sophisticated with language, to just seeming to know everything that people know, in such a short amount of time. It feels explosive. But one thing that I think is important to notice is that what's driving them is really the web. They are distillations of all of this information that humans have written down about everything that humans find worth talking about — which is basically everything we know. And so the web did not spring up in the last few years. It's been slowly accreting for decades. What we found is a way to compress that in a queryable and remixable way. But the initial scaling, the sort of explosive growth of systems that just seemed every iteration so much smarter — a lot of that just came from being able to tap into that data, which partly meant scaling up to use it all because there's a lot of it. It meant making the models big enough that the parameters could hold the information that was being fed into them. It meant getting enough compute that you could do the translation between the declarative data you're training on and the appropriate learned representations and weights. And all of that together unlocked the potential of that data. But there is a data wall — we only have so much Wikipedia, and eventually you can only extract so much juice from that orange. And that's why you're seeing that systems as they scale up get diminishing returns.
When people look at technologies at the beginning, it always looks like an exponential curve. And people have a tendency to look at that and assume it will continue forever. But those exponential curves are almost always S-curves. It always looks like an exponential. It always turns out to be an S-curve. And the next step of progress is switching onto some new idea.
For making systems reliable, for making systems follow policies, for making them guarantee certain properties — scaling up in this way is an incredibly inefficient way of achieving that and hasn't been particularly successful. If what you want is reliability and determinism, you want a different set of techniques. So one of the things we're also seeing is models that are constructed in different ways that are able to have different performance characteristics — that's the space where we fit in. With our models you don't need to learn in this very expensive way. For example, let's imagine that you wanted to learn French and you were going to learn it just by reading books written in English, noticing every now and then, "Oh, here's some French" — devouring thousands upon thousands of books and picking up these little bits of French as you go. Well, eventually this would work. But you know, you pick up one or two French books and you're going to be further along. The sort of data efficiency or sample complexity that is associated with a given kind of data, a given kind of model, a given learning mechanism can give you vastly different performance curves. Just because you can get there in the limit of infinite everything doesn't mean there's not a much much better way.
Host: So for practicality — for people sitting out there building stuff, where most people I would imagine use Claude, Gemini, whatever — do you think there already should be more of a strategy of saying let's just look at what the world of models looks like and then figure out what the exact use case is and decide if there's a better model? I'm not sure that that is even happening right now. Is that a fair assumption?
Dan Klein: I think for some things that is absolutely the right strategy — to say what performance characteristics do we need, what do we not need, what kind of model exhibits those performance characteristics with the optimal efficiency, be that data efficiency or compute efficiency at deployment or availability of compute nodes or whatever it is that is your constraint that you care about. There are going to be some problems that are best solved by just bigger and bigger models trained on bigger and bigger things — those are problems that involve breadth, problems that involve very complicated contextual understanding. But where you have problems of determinism, reliability, truth — that is not the best attack we have today. Pursuits of unreliable general intelligence and reliable specialized intelligence are going to require different methods.
Host: I was still curious about the small team pursuing what seems to be such a big opportunity. You're basically saying I'd like to make a model for a specific purpose. You can't compete with the OpenAIs of the world because that race is probably done. But you probably could go out and find a bunch of use cases where a specific model will be very useful and then go grab that land.
Dan Klein: Yeah. I think because of the successes on the axes that benefit from scale, people are very much now thinking about scale, scale, scale — and obviously that requires a ton of capital, a ton of compute, big teams, because anything that's scaled up requires a whole bunch of support structure. But again, our models work in different ways and the focus of our model is not extracting the full breadth of human knowledge from the information as found. It's working on a specific class of interactions — which I would characterize as interactions where you have a person who has all the contextual situation that is relevant to human language in terms of having a conversation, referring back to things that have happened before. On the other side, you have a set of functionalities — you can call them tools or APIs — that have logic behind them and ways they can be chained together and semantics that govern what flow of information through those tools means. And it's the person talking to this orchestration of backend functionalities. That is a huge class of interactions that share a bunch of properties. They need to be reliable — if you say you want three tickets and it says here's three tickets but it secretly only booked one, like that's bad, you're going to find out, there's going to be a high cost to that. On top of that, there are going to be policies and rules and ways in which these things can and should be used, and those rules may change. And so this is just a case where for this kind of interaction, the existing models are not only expensive, they're not very good at this. And being able to build a model that is better — there's an upside to it also being smaller, but you also just can build a model that's better when you architect that model fundamentally to be structured around these sorts of operations. And that's the approach we took.
I think right now there are really two kinds of companies dominating the market in terms of the number of companies operating in these ways. One is the companies that are very, very big, doing everything at the most massive scale imaginable. The other is the companies building these thin wrappers — where what is the technology there? It's probably some prompt. And I think we are really trying as a community to figure out what is beyond the prompt here. And for us that is models that have additional control surfaces, additional performance characteristics, reliability, the ability to guarantee certain kinds of behaviors.
In building things as a society — if you teach CS 101 — one of the most powerful tools we've had historically for building large reliable systems has been modularity. The ability to take pieces, work on them independently, and say this piece — we're going to work on it, but we're going to guarantee you that this kind of input produces this kind of output, and there's a contract, and there's an abstraction. And this has been one of the biggest challenges in this AI age — LLMs come with absolutely no contracts beyond "you will get tokens out if you put tokens in." And this is one of the key things that it was clear to me needed to be different for a specialist model that was going to be deterministic. There needed to be guarantees that you could make about what goes into the control surface and how that relates to what comes out on the other side, if you want to be able to build reliable things out of it. So we started there and we started thinking about what kinds of systems you could build. That led us to how are they structured, what kind of data do you need, and now we're into the synthetic data training.
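To make the modularity point concrete, here is a minimal sketch of the two kinds of interface Dan contrasts. The refund example and the wrapper function are hypothetical, not anyone's actual system: the typed module comes with a contract you can build on, while the prompted call promises only that some text comes back.

```python
from dataclasses import dataclass

@dataclass
class RefundDecision:
    approved: bool
    amount_cents: int          # contract: never exceeds the original charge

def decide_refund(charge_cents: int, days_since_purchase: int) -> RefundDecision:
    """Deterministic module: the same inputs always yield the same policy-bounded output."""
    approved = days_since_purchase <= 30
    return RefundDecision(approved, charge_cents if approved else 0)

def llm_decide_refund(prompt: str) -> str:
    """Stand-in for a prompted LLM: the only guarantee is that some tokens come back."""
    return "Sure, I've refunded the full amount plus a bonus!"   # nothing in the interface rules this out
```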
30:46 — Synthetic data training
Dan Klein: It turns out that the amount of data you need to get a certain behavior — that sort of sample complexity question — can be different by orders of magnitude. It's like learning French from a French book versus incidental French uses in English novels. It's just a very different scale characteristic and it's a better operating curve to be on. It doesn't mean the other approach doesn't bring you gains in different cases.
31:12 — Enterprise agent use cases
Host: Can you tell us — just to make it easy to imagine — what's a quintessential deployment? If you look at this as a textbook deployment of Scaled Cognition, who's the user, what are they trying to do, what's the impact to their workflows or life or business?
Dan Klein: It's a great question. The textbook deployment is the class of conversations — if I put it abstractly — that are conversations between a person on one side and a bunch of APIs on the other. This might show up for example in an enterprise-to-customer context where the customer is doing maybe a customer support kind of thing — changing a flight, changing a hotel, making a purchase, getting a refund — where the person comes with all of this context and everything they want to express is in language, and on the other side there's a whole bunch of APIs that can handle this: who are you, how are you going to get authenticated, what's your purchase history, what exactly are we talking about, what's your account balance, navigate all of that in accordance with policies. For us, our typical partner is going to be an enterprise. They want to build an agent. It's very important to them that the system be reliable, policy compliant, and also secure in a variety of ways. Our best customer is some enterprise that cares a lot about not making mistakes, not having things like hallucinations or policy violations, and where essentially the conversations they have — they want to automate them, but it's high stakes to get it right. Now so far pretty much all the enterprises we've talked to feel like it's important to get things right and that their conversations with their customers are high stakes. But if you think about finance or healthcare or cases where not only do you not want to mess up for your customers, but where the consequences might be health consequences or financial consequences or regulatory consequences — then there's even greater sensitivity to wanting to make sure that the systems are doing what you've instructed them to do, that you have audit traces for that, that you have control surfaces, that you can change the behaviors if you need to.
33:48 — Healthcare risks
Host: Why do you think that OpenAI, when they launched their healthcare GPT recently, thought that their system would work just fine doing it?
Dan Klein: First of all, you also have this issue that any company with a big hammer is going to go around treating everything like a nail. And so you can absolutely take a generalized probabilistic intelligence and try to do something specialized with it — in the same way that I can train a person to compute square roots, but my calculator will do it faster and more accurately and with a whole lot less energy use. And so depending on the problem you're trying to solve, there are going to be multiple approaches to it. And constellations of nondeterministic models — clearly people are trying that now. You actually do see a lot of news articles about these sorts of things face-planting either because they're not reliable, or because instead of following your refund policy they follow some refund policy from Reddit in 2005, or you get them off topic and suddenly your customer service agent is talking about something you absolutely do not want screenshotted and shared. And so there are failure modes to these systems. You can chain these things together and make a go at it. I just don't think that's going to be the most reliable way. It's certainly not the obvious way to me. If that's the only tool you've got, that's what you do. And for a company that either is building these big models or is wrapping them, that's the approach they're going to take. And for what it's worth, if you're a big model company that sort of sells by the token, you're probably okay with mechanisms that require spending tokens to check the other tokens to check the other tokens. The same thing with reasoning models — a canonical kind of reasoning model, just to give a caricature, is like run the thing ten times and then look at what you've got and pick the best one. If you're the person who is paying for ten times the compute, you don't love that as a solution. If you're the person who's being paid per compute, this sounds amazing.
Host: That's hilarious. The unreliability is actually a feature for the model provider if it's being paid on number of at-bats.
Dan Klein: Totally. Absolutely.
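The "run it ten times and pick the best one" caricature, in code. The function names and the random stand-ins are invented for this sketch; what matters is the cost structure: n sampling calls plus scoring, all paid for by whoever buys the compute.

```python
import random

def sample_answer(question: str) -> str:
    # Stand-in for one full model call, each one paid for in tokens.
    return f"candidate answer {random.randint(0, 999)}"

def score(answer: str) -> float:
    # Stand-in for a second model call (or heuristic) that ranks candidates.
    return random.random()

def best_of_n(question: str, n: int = 10) -> str:
    candidates = [sample_answer(question) for _ in range(n)]   # n times the compute
    return max(candidates, key=score)                          # plus the cost of scoring

print(best_of_n("Where is my package?"))
```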
36:26 — What Dan teaches students about AI
Host: When you teach students about AI, what is the most important thing you think they should know as they leave the course?
Dan Klein: If I had a compact answer, my course would be a lot shorter. I think that answer has changed. So when I started teaching AI — something like 20 years ago — one of the things we did early on is we showed this checklist of things that humans can do, including playing chess or going to the supermarket, and we asked the class, "Okay, raise your hand — do you think a computer can do this?" And in the past 20 years, it went from mostly no to mostly yes.
At the beginning, we would really focus on these core ideas — what is AI, what are the kinds of problems, deterministic versus non-deterministic, adversarial versus cooperative, single agent versus multi-agent. We talked about these different specialized kinds of problems which required specialized solutions and specialized representations. Back then, the reason why some people worked on computer vision and others on natural language processing and others on robotics was because getting any of that to work required incredibly specialized representations, incredibly specialized algorithms, different kinds of data, different kinds of learning. And we would focus on understanding that kind of breadth and what unified it — at the time this notion that an agent is a system that makes optimal decisions given its information towards its objective function.
Now a couple things have happened that are interesting. One is there's a lot more uniformity to AI. As a natural language person, I think it's great that we've decided that language is kind of a good operating system for AI. But one of the things that I think is very important now that we didn't talk about before is starting to get into these large-scale societal trends. We started talking about digital literacy. AI is going to have a huge impact on how people learn, how people work, what jobs are available. When the key problem in AI was that nothing worked except maybe some game playing here or there, we spent a lot less time on that than we do now — when the technology's downsides are such that we could spend a whole class on those sorts of things.
39:16 — The enterprise literacy gap
Host: I saw a stat from an online education platform — something like 1% of enterprises are investing in skills or AI literacy. Why do you think that is? Why are 90-plus percent investing in the tech but only a fraction seeing the literacy problem?
Dan Klein: I think there are a couple of things going on. I think there's more like FOMO attached to missing the wave on the technology — enterprises are being transformed and you don't want to be the only one that's not. And that feels like it's about the technology. I think people have underestimated the skills and human training aspect of this — that whatever technology there is, you can get more or less out of it depending on how humans engage with it. I think also people just as a society have underestimated how poorly the instincts we have for digital literacy translate. Knowing how to find information with search is actually pretty different than knowing how to validate information that comes out of a chat system like ChatGPT.
One way to think about that is people who were doing a lot of writing or researching now are doing something that looks more like review or editing. And the difference between being a writer and being an editor is really big. But if you're used to being a writer and now some large language model is writing your email for you, you don't think, "Ah, now I'm an editor, I don't have those skills." You think, "Oh, it just did my job for me. I can just press send." And I think it's going to take people time — because this technology has appeared so quickly — to realize that for all the skills that are maybe less necessary now, there's an equal number of skills that are not only very necessary but that we're not good at teaching. We may not even have good names for what they are.
41:30 — Delegation and AI management
Host: What is the skill of taking a fluent-looking output and distrusting it?
Dan Klein: Well, I think it's funny — for all that we talk about AIs as assistants, how few people have actually ever had an assistant. So how do I work with an assistant? Well, I'm learning with this chatbot. I'm getting my on-the-job managerial training with a chatbot. That's not a recipe for success. And people aren't going to AI literacy training in the same way that there's new manager training. How do you delegate? How do you verify? How do you check people's work? How do you mentor? Those are all things that typically professionals learn over the course of a career. And now we're all given intelligence on tap. And the problem is actually people don't know what to do with an assistant, let alone a capable junior employee.
Host: And the optimistic take on this would be well, people will learn these things over the course of a career — it's just that it hasn't been a career's length of time that we've been working with these systems.
Dan Klein: There is more going on because I think if you're working as a human with another human and you're trying to delegate to them, you do trust them to come back and say, "Well, I actually couldn't find this information for you, or I got blocked" — as opposed to "I couldn't find it, but here's my wild guess and I'm not going to tell you it's a wild guess." That would not be good behavior from a human, but we see it all the time for machines. So I think there is the problem that you mentioned — which is that people may not have the skills to delegate and manage humans — and then there's the additional layer that these systems do not act the way a human does in a delegation context.
Host: Say more about that — the system doesn't act as a human does in the delegation context.
43:48 — Metacognition: what AI is missing
Dan Klein: I think a lot of this boils down to something called metacognition. If you had to put a finger on what systems don't have today — we've been talking a lot about reliability, determinism, whatever you want to call it. If you think about them as cognitive systems, the thing they're lacking is metacognition. In humans, we don't just think — we think about those thought processes. When you ask me a question, I stop and I think: do I know the answer? And maybe I do, or maybe I don't. And if I don't know the answer, I may make a decision to bluff. I may make a decision to just keep quiet or to change the topic. I get to decide what to do about that lack of information. But the fact that I have an explicit representation of whether or not I have the knowledge — knowledge about knowledge, cognition about cognition — this is metacognition. Systems don't have that.
Back to the example of what's the population of Berkeley — it's just cranking out tokens. Whereas a database would be different. You do the query and you get the answer and you display it, or you don't get the answer and you say entry not found. The database does not have the breadth and the contextuality and all of those capabilities the AI system has, but it is in some sense more metacognitive — it knows whether or not the information is present. And ultimately full intelligence requires both. An LLM today lacks the metacognition.
Host: So this is — to your point about the person in the delegation relationship — what you're saying is they have the metacognition or the self-awareness to say "information not found."
Dan Klein: Effectively, right. Manager comes to me, hey, can you do this? You go, I studied biology, not physics — I don't know how to do that.
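A minimal sketch of the database-versus-completion contrast, reusing the Berkeley example from earlier in the conversation. The population entry and the random number generator are placeholders, not verified figures; the lookup has an explicit check for whether it knows, and the completion stand-in does not.

```python
import random

CITY_POPULATIONS = {"Berkeley": 117_000}        # illustrative entry, not a verified figure

def database_lookup(city: str) -> str:
    if city in CITY_POPULATIONS:                # knowledge about knowledge: an explicit check
        return f"{CITY_POPULATIONS[city]:,}"
    return "entry not found"

def completion_stand_in(city: str) -> str:
    # No notion of whether the fact is known: a city-sized number always comes out.
    return f"{random.randint(50_000, 500_000):,}"

print(database_lookup("Springfield"))           # entry not found
print(completion_stand_in("Springfield"))       # a confident, plausible, unverified number
```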
Host: I just want to push back a little or at least explore the resistance. I had an experience just yesterday with Claude as an example which can kind of serve as that "information not found" — because I read Ethan Mollick's post about giving Claude Code an assignment to generate a new thousand-dollar-a-month business or something. I just grabbed his prompt, dropped it in Claude Code, and it was kind of interesting because Claude immediately came back to me: "Jeremy, I've got to level with you — there's no such thing as a business that generates a thousand bucks a month with no effort. Now what I can do is this." And to me, if it were purely a function of sycophancy and next token prediction, I think it would probably just optimistically say "oh you could do this." The fact that it kind of pushed back — how do you square that example with the definition we've kind of been working with around what language models do?
Dan Klein: So I would say there are three levels where what you're talking about happens. What I've been talking about is sort of the caricature — the simplest version of a completion engine. Systems are evolving, and systems aren't just purely trained to produce mimicries of web text. There are additional steps of training — things like alignment training, instruction training. If you do something like RLHF and you are showing the system "okay, in this situation, I don't like that you did this, I like this one better" — that training does happen. And that certainly governs style. If you talk to one of these models and you get the stylistic "that's an excellent question and it gets to the heart of the matter" — that's coming from how they were told in their post-training to answer. And so when the system said "there's no business that will give you $1,000 a month" — well, where did that come from? It could be there's a web page out there that's like "why businesses won't give you $1,000 a month." Like it could actually just be that it is regurgitating something that's out there. Or it could be coming from some explicit post-training which is like: when people ask this stuff, tell them they can't do it. It is still regurgitating its data. And as these systems get stronger, they will increasingly have a system checking — they will become, if not fully metacognitive, then systems that at least start to have those sorts of behaviors. It's very difficult from the outside to tell if the system was aware that it didn't know the answer, or whether it was aware that it's supposed to say it doesn't know in that specific context.
49:14 — How Dan uses models himself
Host: When you use models yourself, are you a Claude or ChatGPT person, or do you yourself now use different models in your personal work?
Dan Klein: Yeah, I'm going to give you a boring answer to this because the truth is kind of boring — which is I do use a lot of things because I want to know what these systems do. I want to know where their strengths are, what their weaknesses are. I would say the biggest thing about my experience with large models that maybe differs from a lot of people I talk to is I'm often asking questions to which I already know the answer, and using that as a way to check — all right, what came back, how much of this was right, how much of this was wrong — and then when I ask a question to which I do not know the answer, I sort of assume that the accuracy rate is comparable. I'm always trying to calibrate.
And I am continually impressed by two things. One, the breadth and plasticity and flexibility of these models — it's still just amazing. I mean, when I started this, it was like, will we ever be able to find the verb reliably in a sentence? And now I've got a system that I can ask about quantum mechanics and get an answer. Now, I'm not an expert about quantum mechanics, but when I ask it about things where I am an expert, the answers come back sort of right, but almost always there's a critical flaw in the answer. And I know I can't find the equivalent flaws in other areas. And that has really impressed on me how important it is to recognize that all of us — no matter how much we know — are susceptible when we look at a system's outputs. They're fluent. They're confident. The parts we do understand look correct. We assume that everything else is correct. And that's not always true.
Host: I think that's a good lesson for everybody to take with them — maybe try to test their model of choice on something they already know. One thing I do like is the "already know" approach. The other thing is the kind of task where there's not necessarily a right answer. I'll give you the exact thing and you can extrapolate. I had an interview with Time magazine recently where I was being interviewed, not being the interviewer, and I just on a whim took the transcript and said, "Hey Claude, I've got a chief of staff that's kind of trained on a lot of my blog posts, things like that. Hey, how'd I do?" And it was so deeply insightful. And then I said, "How should I?" And part of its critique was "you treated this interview like a keynote — when you're being interviewed, your job is to be the sous chef, to help the master chef put the ingredients out. And you made this journalist's job more difficult, not easier. You rambled, you buried the lead, you followed what you're curious about, not what they're curious about." And I said, "Great. Now can you help me prepare for the next one?" Because I had another media interview. And Claude said, "Sure, give me the email that the person sent you." I gave it to him. Boom. Topics to avoid, things that you will want to talk about that this interviewer is not interested in. It was so good I actually kept it on the screen during my next interview. To me that's this whole other class of capability — I don't have the means to have a comms expert on my team, but now I have a comms expert, or at least a reasonable approximation. And even if it's not great, it does give me more confidence than the alternative, which is zero.
Dan Klein: This is totally right. And I think this gets at what you asked earlier about where are these systems strong — and this is exactly where they are incredibly, mind-blowingly strong. The ability to take all that information, cross-reference it with all this context that's out there in terms of how you should run media, that's scattered across the web, and to be able to distill things down — that sort of contextuality and breadth is just stunning. And in this case, the fact that it could read it all, pay perfect attention to it all, then give you something that if it were wrong, you would know — and you were asking it for its generative capabilities. You were asking it to take all of this stuff, mix it together, and give you a synthesis. That is a place where these technologies are incredible. It's just that's not every use case. But when you have that use case, that really lines up well. You were in the loop. You were being the editor. You were like, "Oh, I love this, I love this." And maybe there's something in there you weren't going to say, but that's okay because this gave you a lot. I could draw an analogy to early machine translation — you could read some of it and then there was some stuff that wasn't really even in the language you were expecting, and you didn't get everything, but it was better than nothing. And in situations where something is better than nothing, mistakes are going to get caught by the consumer and the primary product you wanted was this novel synthesis. We could call it a hallucination. It was just a really useful one for you.