Name: Andrej Karpathy: Software Is Changing (Again)
Uploaded: 2026-03-09T01:01:16.553
Duration: 2371 s
Description: Read the full transcript of "Andrej Karpathy: Software Is Changing (Again)" by Y Combinator. Practice English listening and reading with interactive subtitle...

4.0s Tesla, Andrej Karpathy. [Music] Hello.

22.8s Okay. So I'm excited to be here today to talk to you about software in the era of AI. I'm told that many of you are students (bachelor's, master's, PhD and so on) and you're about to enter the industry. And I think it's actually an extremely unique and very interesting time to enter the

39.0s industry right now. And I think fundamentally the reason for that is that software is changing, again. I say again because I actually gave this talk already, but the problem is that software keeps changing, so I actually have a lot of material to create new talks. And I think it's changing quite fundamentally. I think

58.2s roughly speaking, software has not changed much on such a fundamental level for 70 years, and then it's changed, I think, about twice quite rapidly in the last few years. And so there's just a huge amount of work to do, a huge amount of software to write and rewrite. So let's take a look at maybe the realm of

74.2s software. If we think of this as the map of software, this is a really cool tool called Map of GitHub. This is kind of like all the software that's written. These are instructions to the computer for carrying out tasks in the digital space. So if you zoom in here, these are all

88.0s different kinds of repositories, and this is all the code that has been written. A few years ago I observed that software was changing and there was a new type of software around, and I called this Software 2.0 at the time. The idea

102.3s here was that Software 1.0 is the code you write for the computer. Software 2.0 is basically neural networks, and in particular the weights of a neural network. You're not writing this code directly; you're more kind of tuning the datasets and then running an optimizer to

118.4s create the parameters of this neural net. I think at the time neural nets were seen as just a different kind of classifier, like a decision tree or something like that, and so I think this framing was a lot more appropriate. And now what we

132.2s have is an equivalent of GitHub in the realm of Software 2.0. I think Hugging Face is basically the equivalent of GitHub in Software 2.0. There's also Model Atlas, and you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle,

148.3s these are the parameters of Flux, the image generator. So anytime someone tunes a LoRA on top of a Flux model, you basically create a git commit in this space and you create a different kind of image generator. So basically what we have is: Software 1.0 is the computer code that programs a computer. Software

165.9s 2.0 are the weights, which program neural networks. And here's an example of AlexNet, an image-recognizer neural network. Now, so far all of the neural networks that we've been familiar with until recently were kind of like fixed-function computers: image to categories, or something like that. And I think what's changed, and I think is a quite

185.2s fundamental change, is that neural networks became programmable with large language models. I see this as quite new and unique. It's a new kind of computer, and so in my mind it's worth giving it a new designation of Software 3.0. Basically, your prompts are now programs that program the LLM.
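
To make the contrast between the paradigms concrete, here is a minimal sketch using sentiment classification as a toy task. Everything in it is an assumption for illustration and not code from the talk: the word lists, the function names `sentiment_v1` and `sentiment_prompt_v3`, and the prompt wording are all hypothetical. The Software 1.0 version is explicit rules; the Software 3.0 version is an English few-shot prompt that would be sent to an LLM. A Software 2.0 version would instead train a classifier's weights on labeled data, which is omitted here.

```python
# Software 1.0: explicit rules written by a programmer.
# The word lists are hypothetical and deliberately tiny.
POSITIVE = {"good", "great", "love", "amazing"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def sentiment_v1(text: str) -> str:
    """Rule-based sentiment: count positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Software 3.0: the "program" is an English few-shot prompt.
# Changing the examples or the instruction reprograms the behavior.
FEW_SHOT_EXAMPLES = [
    ("I love this movie.", "positive"),
    ("The food was awful.", "negative"),
]


def sentiment_prompt_v3(text: str) -> str:
    """Build a few-shot prompt; sending it to an LLM is left out on purpose."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {text}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(sentiment_v1("I love this great talk!"))
    print(sentiment_prompt_v3("The slides were terrible."))
```

In the first function the behavior lives in Python; in the second it lives in the text of the prompt, which is the sense in which a prompt is a program.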

205.7s And remarkably, these prompts are written in English, so it's kind of a very interesting programming language. So maybe to summarize the difference: if you're doing sentiment classification, for example, you can imagine writing some amount of Python to basically do sentiment classification, or you can train a neural

226.0s net, or you can prompt a large language model. So here, this is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way. So basically we have Software 1.0 and Software 2.0, and I think we're seeing (maybe you've seen) that a lot of GitHub code is not just code

241.9s anymore: there's a bunch of English interspersed with code. So I think there's a growing category of a new kind of code. Not only is it a new programming paradigm, it's also remarkable to me that it's in our native language of English. And so when this

254.9s blew my mind a few, I guess, years ago now, I tweeted this, and I think it captured the attention of a lot of people. This is my currently pinned tweet: that, remarkably, we're now programming computers in English. Now, when I was at Tesla, we were

271.6s working on the Autopilot and we were trying to get the car to drive. I showed this slide at the time, where you can imagine that the inputs to the car are on the bottom and they're going through a software stack to produce the steering and acceleration,

287.0s and I made the observation at the time that there was a ton of C++ code around in the Autopilot, which was the Software 1.0 code, and then there were some neural nets in there doing image recognition. I observed that over time, as we made the Autopilot better, basically the neural network grew in

302.7s capability and size, and in addition to that, all the C++ code was being deleted, and a lot of the capabilities and functionality that was originally written in 1.0 was migrated to 2.0. As an example, a lot of the stitching up of information across images from the different cameras

322.6s and across time was done by a neural network, and we were able to delete a lot of code. So the Software 2.0 stack quite literally ate through the software stack of the Autopilot. I thought this was really remarkable at the time, and I think we're seeing the same thing

337.0s again, where basically we have a new kind of software and it's eating through the stack. We have three completely different programming paradigms, and I think if you're entering the industry it's a very good idea to be fluent in all of them, because they all have slight pros and cons, and you may want to

350.8s program some functionality in 1.0 or 2.0 or 3.0. Are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of code that's explicit? Etc. So we all have to make these decisions and actually potentially fluidly transition between these paradigms. So what I

366.8s wanted to get into now is, in the first part, I want to talk about LLMs and how to think of this new paradigm and the ecosystem and what that looks like. What is this new computer? What does it look like, and what does the ecosystem look

380.2s like? I was struck by this quote from Andrew Ng, actually many years ago now, I think, and I think Andrew is going to be speaking right after me. He said at the time, "AI is the new electricity," and I do think that it captures something very interesting, in that LLMs certainly feel

396.7s like they have properties of utilities right now. So LLM labs like OpenAI, Gemini, Anthropic, etc. spend capex to train the LLMs, and this is kind of equivalent to building out a grid. Then there's opex to serve that intelligence over APIs to all of us, and this is done through metered access where we pay per

418.6s million tokens or something like that, and we have a lot of very utility-like demands out of this API: we demand low latency, high uptime, consistent quality, etc. In electricity, you would have a transfer switch, so you can transfer your electricity source between grid and solar or battery or

434.4s generator. In LLMs, we have maybe OpenRouter, and can easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space, so it's okay to have basically like six electricity providers, and you can switch between them, right? Because they don't compete in such a direct way. And I think what's

451.9s also a little fascinating, and we saw this in the last few days actually, is that a lot of the LLMs went down and people were kind of stuck and unable to work. I think it's fascinating to me that when the state-of-the-art LLMs go down, it's actually kind of like an intelligence brownout in the world.
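
The utility analogy (metered access per million tokens, plus a "transfer switch" between providers) can be sketched in a few lines. This is only an illustration under stated assumptions: `Provider`, `metered_cost`, and `complete_with_failover` are hypothetical names, the prices are made up, and the `call` functions stand in for real API clients, which are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Provider:
    name: str
    usd_per_million_tokens: float  # hypothetical price, not a real quote
    call: Callable[[str], str]     # hypothetical stand-in for an API client


def metered_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Metered billing: pay in proportion to tokens used."""
    return tokens / 1_000_000 * usd_per_million_tokens


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; fall through to the next one on any failure.

    Returns (provider_name, completion). Raises RuntimeError if every
    provider fails, which is the "brownout" case.
    """
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers the switch
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers are down: " + "; ".join(errors))


if __name__ == "__main__":
    def flaky(prompt: str) -> str:
        raise ConnectionError("outage")

    def healthy(prompt: str) -> str:
        return "echo: " + prompt

    grid = [Provider("primary", 3.0, flaky), Provider("backup", 5.0, healthy)]
    print(complete_with_failover("hello", grid))
    print(metered_cost(250_000, 2.0))
```

The ordered list plays the role of the transfer switch: because the providers are software behind similar interfaces, switching is a loop rather than rewiring.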

467.8s It's kind of like when the voltage is unreliable in the grid, and the planet just gets dumber the more reliance we have on these models, which is already really dramatic and I think will continue to grow. But LLMs don't only have properties of utilities. I think it's also fair to say that they have

483.5s some properties of fabs. The reason for this is that the capex required for building LLMs is actually quite large. It's not just like building some power station or something like that, right? You're investing a huge amount of money, and I think the tech tree for the technology is growing quite

502.5s rapidly. So we're in a world where we have sort of deep tech trees, research and development secrets that are centralizing inside the LLM labs. But I think the analogy muddies a little bit also because, as I mentioned, this is software, and software is a bit less defensible because it is so malleable.

521.0s And so I think it's just an interesting kind of thing to think about, potentially. There are many analogies you can make: a 4 nanometer process node maybe is something like a cluster with certain max FLOPS. You can think about when you're using Nvidia GPUs and you're only doing the software and

536.1s you're not doing the hardware: that's kind of like the fabless model. But if you're actually also building your own hardware and you're training on TPUs, if you're Google, that's kind of like the Intel model where you own your fab. So I think there are some analogies here that make sense. But actually I think the

548.2s analogy that makes the most sense perhaps is that, in my mind, LLMs have very strong analogies to operating systems, in that this is not just electricity or water. It's not something that comes out of the tap as a commodity. These are now increasingly complex software ecosystems,

565.9s right? So they're not just simple commodities like electricity. And it's kind of interesting to me that the ecosystem is shaping up in a very similar kind of way, where you have a few closed-source providers like Windows or macOS, and then you have an open-source alternative like Linux. And I think for

582.7s LLMs as well, we have a few competing closed-source providers, and then maybe the Llama ecosystem is currently a close approximation to something that may grow into something like Linux. Again, I think it's still very early because these are just simple LLMs, but we're starting to see that these are

599.6s going to get a lot more complicated. It's not just about the LLM itself. It's about all the tool use and the multimodalities and how all of that works. And so when I had this realization a while back, I tried to sketch it out, and it kind of seemed to

611.2s me like LLMs are kind of like a new operating system, right? So the LLM is a new kind of computer; it's kind of like the CPU equivalent. The context windows are kind of like the memory, and then the LLM is orchestrating memory and compute for problem solving, using all of these

629.8s capabilities here. So definitely, if you look at it, it looks very much like an operating system from that perspective. A few more analogies: for example, if you want to download an app, say I go to VS Code and I go to download, you can download VS Code and you can run it on

646.2s Windows, Linux, or Mac. In the same way, you can take an LLM app like Cursor and you can run it on GPT or Claude or the Gemini series, right? It's just a drop-down. So it's kind of similar in that way as well. More analogies that I think strike me

662.4s is that we're kind of like in this 1960s-ish era where LLM compute is still very expensive for this new kind of computer, and that forces the LLMs to be centralized in the cloud, and we're all just sort of thin clients that interact with it over the network, and

680.3s none of us has full utilization of these computers, and therefore it makes sense to use time sharing, where we're all just, you know, a dimension of the batch when they're running the computer in the cloud. And this is very much what computers used to look like during this time. The operating systems were in

695.0s the cloud, everything was streamed around, and there was batching. And so the personal computing revolution hasn't happened yet, because it's just not economical. It doesn't make sense. But I think some people are trying, and it turns out that Mac minis, for example, are a very good fit for some of

710.4s the LLMs, because if you're doing batch-one inference, this is all super memory bound, so this actually works. I think these are some early indications, maybe, of personal computing, but this hasn't really happened yet. It's not clear what this looks like.
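
The remark that batch-one inference is memory bound can be turned into a rough back-of-the-envelope estimate. The sketch below assumes that generating each token requires reading every model weight from memory once, so memory bandwidth divided by model size gives an upper bound on tokens per second. It ignores KV-cache traffic, compute time, and other overheads, and the numbers in the example (8 billion parameters, 4-bit weights, 100 GB/s) are hypothetical, not the specs of any particular machine or model.

```python
def weight_bytes(n_params: float, bits_per_param: int) -> float:
    """Size of the model weights in bytes."""
    return n_params * bits_per_param / 8


def max_tokens_per_second(n_params: float, bits_per_param: int,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on batch-one decoding speed when memory bound.

    Assumes every weight is read once per generated token, so the bound
    is (bytes per second of bandwidth) / (bytes of weights).
    """
    return mem_bandwidth_gb_s * 1e9 / weight_bytes(n_params, bits_per_param)


if __name__ == "__main__":
    # Hypothetical example: 8e9 parameters at 4 bits each is 4e9 bytes of
    # weights; at 100 GB/s that bounds decoding at 25 tokens per second.
    print(max_tokens_per_second(8e9, 4, 100.0))
```

Under this estimate, a larger batch would reuse each weight read across many requests, which is why centralized serving benefits from batching while a single user on a local machine is limited mostly by memory bandwidth.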

723.5s Maybe some of you get to invent what this is, or how it works, or what this should be. Maybe one more analogy that I'll mention is: whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. Like, it's just

741.0s text. It's direct access to the operating system. And I think a GUI hasn't yet really been invented in a general way. Like, should ChatGPT have a GUI different than just text bubbles? Certainly some of the apps that we're going to go into in a bit have a GUI, but there's no GUI

758.5s across all the tasks, if that makes sense. There are some ways in which LLMs are different from operating systems and from early computing in some fairly unique ways. And I wrote about this one particular property that strikes me as very different this time around. It's that LLMs flip the direction

779.8s of technology diffusion that is usually present in technology. So, for example, with electricity, cryptography, computing, flight, internet, GPS, lots of new transformative technologies that have not been around: typically it is the government and corporations that are the first users, because it's new and expensive, etc., and it only later diffuses to consumers. But I feel

800.7s like LLMs are kind of flipped around. So maybe with early computers it was all about ballistics and military use, but with LLMs it's all about how do you boil an egg or something like that. This is certainly a lot of my use. And so it's really fascinating to me that we have a new magical computer

815.6s and it's helping me boil an egg. It's not helping the government do something really crazy like some military ballistics or some special technology. Indeed, corporations and governments are lagging behind the adoption of all of us, of all of these technologies. So it's just backwards, and I think it informs maybe some of the

830.5s uses of how we want to use this technology, or where some of the first apps are, and so on. So, in summary so far: LLM labs fab LLMs. I think that's accurate language to use. But LLMs are complicated operating systems. They're circa 1960s in computing, and we're redoing computing all over again. And they're currently available via time

832.4s technology or like where are some of the

833.6s first apps and so on.

836.1s So, in summary so far, LLM labs fab LLMs. I

841.0s think it's accurate language to use, but

843.7s LLMs are complicated operating systems.

846.5s They're circa 1960s in computing and

848.6s we're redoing computing all over again.

850.2s and they're currently available via time

851.8s sharing and distributed like a utility. What is new and unprecedented is that they're not in the hands of a few governments and corporations. They're in the hands of all of us because we all have a computer and it's all just software and ChatGPT was beamed down to our computers like billions of people

853.8s What is new and unprecedented is that

856.0s they're not in the hands of a few

857.4s governments and corporations. They're in

858.9s the hands of all of us because we all

860.2s have a computer and it's all just

861.6s software and ChatGPT was beamed down to

864.3s our computers like billions of people

866.6s like instantly and overnight and this is insane. Uh and it's kind of insane to me that this is the case and now it is our time to enter the industry and program these computers. This is crazy. So I think this is quite remarkable. Before we program LLMs, we have to kind of like

868.3s insane. Uh and it's kind of insane to me

870.9s that this is the case and now it is our

873.3s time to enter the industry and program

875.0s these computers. This is crazy. So I

877.3s think this is quite remarkable. Before

879.7s we program LLMs, we have to kind of like

882.1s spend some time to think about what these things are. And I especially like to kind of talk about their psychology. So the way I like to think about LLMs is that they're kind of like people spirits. Um they are stochastic simulations of people. Um and the simulator in this case happens to be an autoregressive transformer. So

883.5s these things are. And I especially like

885.8s to kind of talk about their psychology.

888.3s So the way I like to think about LLMs is

890.5s that they're kind of like people

891.5s spirits. Um they are stochastic

894.1s simulations of people. Um and the

896.4s simulator in this case happens to be an

898.0s autoregressive transformer. So

899.8s transformer is a neural net. Uh it's and it just kind of like is goes on the level of tokens. It goes chunk chunk chunk chunk chunk. And there's an almost equal amount of compute for every single chunk. Um and um this simulator of course is is just is basically there's some weights involved and we fit it to

902.7s it just kind of like is goes on the

904.8s level of tokens. It goes chunk chunk

906.5s chunk chunk chunk. And there's an almost

908.3s equal amount of compute for every single

910.2s chunk. Um and um this simulator of

914.7s course is is just is basically there's

917.0s some weights involved and we fit it to

919.0s all of text that we have on the internet and so on. And you end up with this kind of a simulator and because it is trained on humans, it's got this emergent psychology that is humanlike. So the first thing you'll notice is of course uh LLMs have encyclopedic knowledge and memory. Uh and they can remember lots of

920.5s and so on. And you end up with this kind

922.2s of a simulator and because it is trained

924.2s on humans, it's got this emergent

926.2s psychology that is humanlike. So the

928.4s first thing you'll notice is of course

930.6s uh LLMs have encyclopedic knowledge and

932.6s memory. uh and they can remember lots of

934.6s things, a lot more than any single individual human can because they read so many things. It's it actually kind of reminds me of this movie Rain Man, which I actually really recommend people watch. It's an amazing movie. I love this movie. Um and Dustin Hoffman here is an autistic savant who has almost

936.1s individual human can because they read

937.6s so many things. It's it actually kind of

939.8s reminds me of this movie Rain Man, which

941.7s I actually really recommend people

943.0s watch. It's an amazing movie. I love

944.5s this movie. Um and Dustin Hoffman here

946.7s is an autistic savant who has almost

949.2s perfect memory. So, he can read a he can read like a phone book and remember all of the names and phone numbers. And I kind of feel like LLMs are kind of like very similar. They can remember SHA hashes and lots of different kinds of things very very easily. So they certainly have superpowers in some set

951.6s read like a phone book and remember all

953.3s of the names and phone numbers. And I

955.4s kind of feel like LLMs are kind of like

957.2s very similar. They can remember SHA

959.0s hashes and lots of different kinds of

960.4s things very very easily. So they

962.5s certainly have superpowers in some set

964.4s in some respects. But they also have a bunch of I would say cognitive deficits. So they hallucinate quite a bit. Um and they kind of make up stuff and don't have a very good uh sort of internal model of self-knowledge, not sufficient at least. And this has gotten better but not perfect. They display jagged

966.2s bunch of I would say cognitive deficits.

968.8s So they hallucinate quite a bit. Um and

971.8s they kind of make up stuff and don't

973.1s have a very good uh sort of internal

975.3s model of self-knowledge, not sufficient

977.7s at least. And this has gotten better but

979.4s not perfect. They display jagged

981.6s intelligence. So they're going to be superhuman in some problem-solving domains. And then they're going to make mistakes that basically no human will make. Like, you know, they will insist that 9.11 is greater than 9.9 or that there are two Rs in strawberry. These are some famous examples, but basically there

982.8s superhuman in some problem-solving

984.5s domains. And then they're going to make

986.0s mistakes that basically no human will

987.7s make. like you know they will insist

989.9s that 9.11 is greater than 9.9 or that

992.6s there are two Rs in strawberry these are

994.2s some famous examples but basically there

996.2s are rough edges that you can trip on. So that's kind of I think also kind of unique. Um they also kind of suffer from anterograde amnesia. Um so uh and I think I'm alluding to the fact that if you have a coworker who joins your organization, this coworker will over time learn your organization and uh they

998.9s that's kind of I think also kind of

1000.3s unique um they also kind of suffer from

1003.3s anterograde amnesia um so uh and I think

1006.9s I'm alluding to the fact that if you

1008.1s have a coworker who joins your

1009.3s organization this coworker will over

1011.4s time learn your organization and uh they

1014.2s will understand and gain like a huge amount of context on the organization and they go home and they sleep and they consolidate knowledge and they develop expertise over time. LLMs don't natively do this and this is not something that has really been solved in the R&D of LLMs, I think. Um and so context windows

1015.9s amount of context on the organization

1017.8s and they go home and they sleep and they

1019.6s consolidate knowledge and they develop

1021.1s expertise over time. LLMs don't natively

1023.4s do this and this is not something that

1024.6s has really been solved in the R&D of

1026.4s LLMs, I think. Um and so context windows

1029.3s are really kind of like working memory and you have to sort of program the working memory quite directly because they don't just kind of like get smarter by uh by default and I think a lot of people get tripped up by the analogies uh in this way. Uh in popular culture I recommend people watch these two movies

1030.6s and you have to sort of program the

1032.0s working memory quite directly because

1033.6s they don't just kind of like get smarter

1035.0s by uh by default and I think a lot of

1037.0s people get tripped up by the analogies

1039.0s uh in this way. Uh in popular culture I

1042.2s recommend people watch these two movies

1043.9s uh Memento and 50 First Dates. In both of these movies, the protagonists, their weights are fixed and their context windows get wiped every single morning and it's really problematic to go to work or have relationships when this happens and this happens to LLMs all the time. I guess one more thing I would point to is security kind of related

1046.1s these movies, the protagonists, their

1047.8s weights are fixed and their context

1049.8s windows get wiped every single morning

1052.2s and it's really problematic to go to

1054.2s work or have relationships when this

1055.8s happens and this happens to LLMs all the

1057.5s time. I guess one more thing I would

1059.6s point to is security kind of related

1062.3s limitations of the use of LLMs. So for example, LLMs are quite gullible. Uh they are susceptible to prompt injection risks. They might leak your data etc. And so um and there's many other considerations uh security related. So, so basically long story short, you have to load your you have to load your you have to simultaneously think through

1064.3s example, LLMs are quite gullible. Uh

1066.4s they are susceptible to prompt injection

1068.2s risks. They might leak your data etc.

1070.8s And so um and there's many other

1072.8s considerations uh security related. So,

1075.3s so basically long story short, you have

1077.5s to load your you have to load your you

1080.0s have to simultaneously think through

1081.3s this superhuman thing that has a bunch of cognitive deficits and issues. How do we and yet they are extremely like useful and so how do we program them and how do we work around their deficits and enjoy their superhuman powers. So what I want to switch to now is talk about the opportunities of how do we use

1083.2s of cognitive deficits and issues. How do

1085.4s we and yet they are extremely like

1087.8s useful and so how do we program them and

1090.6s how do we work around their deficits and

1092.4s enjoy their superhuman powers.

1095.8s So what I want to switch to now is talk

1097.4s about the opportunities of how do we use

1099.0s these models and what are some of the biggest opportunities. This is not a comprehensive list just some of the things that I thought were interesting for this talk. The first thing I'm kind of excited about is what I would call partial autonomy apps. So for example, let's work with the example of coding.

1100.7s biggest opportunities. This is not a

1102.4s comprehensive list just some of the

1103.5s things that I thought were interesting

1104.6s for this talk. The first thing I'm kind

1106.9s of excited about is what I would call

1109.3s partial autonomy apps. So for example,

1112.2s let's work with the example of coding.

1114.2s You can certainly go to ChatGPT directly and you can start copy pasting code around and copy pasting bug reports and stuff around and getting code and copy pasting everything around. Why would you why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app

1116.6s and you can start copy pasting code

1118.1s around and copy pasting bug reports and

1121.0s stuff around and getting code and copy

1122.4s pasting everything around. Why would you

1124.2s why would you do that? Why would you go

1125.4s directly to the operating system? It

1127.1s makes a lot more sense to have an app

1128.5s dedicated for this. And so I think many of you uh use uh Cursor. I do as well. And uh Cursor is kind of like the thing you want instead. You don't want to just directly go to ChatGPT. And I think Cursor is a very good example of an early LLM app that has a bunch of

1130.7s of you uh use uh Cursor. I do as well.

1133.8s And uh Cursor is kind of like the thing

1136.3s you want instead. You don't want to just

1137.8s directly go to ChatGPT. And I

1139.8s think Cursor is a very good example of

1141.4s an early LLM app that has a bunch of

1143.8s properties that I think are um useful across all the LLM apps. So in particular, you will notice that we have a traditional interface that allows a human to go in and do all the work manually just as before. But in addition to that, we now have this LLM integration that allows us to go in

1146.2s across all the LLM apps. So in

1148.0s particular, you will notice that we have

1149.7s a traditional interface that allows a

1152.0s human to go in and do all the work

1153.8s manually just as before. But in addition

1156.5s to that, we now have this LLM

1157.8s integration that allows us to go in

1159.4s bigger chunks. And so some of the properties of LLM apps that I think are shared and useful to point out. Number one, the LLMs basically do a ton of the context management. Um, number two, they orchestrate multiple calls to LLMs, right? So in the case of Cursor, there's under the hood embedding models for all

1161.9s properties of LLM apps that I think are

1163.5s shared and useful to point out. Number

1165.8s one, the LLMs basically do a ton of the

1168.1s context management. Um, number two, they

1171.2s orchestrate multiple calls to LLMs,

1173.2s right? So in the case of Cursor, there's

1175.0s under the hood embedding models for all

1177.0s your files, the actual chat models, models that apply diffs to the code, and this is all orchestrated for you. A really big one that uh I think also maybe not fully appreciated always is application specific uh GUI and the importance of it. Um because you don't just want to talk to the operating

1179.2s models that apply diffs to the code, and

1181.8s this is all orchestrated for you. A

1183.9s really big one that uh I think also

1186.1s maybe not fully appreciated always is

1188.5s application specific uh GUI and the

1190.5s importance of it. Um because you don't

1193.1s just want to talk to the operating

1194.6s system directly in text. Text is very hard to read, interpret, understand and also like you don't want to take some of these actions natively in text. So it's much better to just see a diff as like red and green change and you can see what's being added and subtracted. It's much easier to just do command Y to

1196.6s hard to read, interpret, understand and

1199.0s also like you don't want to take some of

1200.5s these actions natively in text. So it's

1203.1s much better to just see a diff as like

1205.0s red and green change and you can see

1206.8s what's being added and subtracted. It's

1208.5s much easier to just do command Y to

1210.2s accept or command N to reject. I shouldn't have to type it in text, right? So, a GUI allows a human to audit the work of these fallible systems and to go faster. I'm going to come back to this point a little bit uh later as well. And the last kind of feature I

1211.9s shouldn't have to type it in text,

1213.1s right? So, a GUI allows a human to

1215.5s audit the work of these fallible systems

1217.8s and to go faster. I'm going to come back

1220.0s to this point a little bit uh later as

1221.8s well. And the last kind of feature I

1223.8s want to point out is that there's what I call the autonomy slider. So, for example, in Cursor, you can just do tab completion. You're mostly in charge. You can select a chunk of code and command K to change just that chunk of code. You can do command L to change the entire

1225.2s call the autonomy slider. So, for

1227.7s example, in Cursor, you can just do tab

1229.4s completion. You're mostly in charge. You

1231.5s can select a chunk of code and command K

1233.6s to change just that chunk of code. You

1236.0s can do command L to change the entire

1237.9s file. Or you can do command I which just you know let it rip do whatever you want in the entire repo and that's the sort of full autonomy agent agentic version and so you are in charge of the autonomy slider and depending on the complexity of the task at hand you can uh tune the

1240.4s you know let it rip do whatever you want

1242.2s in the entire repo and that's the sort

1244.1s of full autonomy agent agentic version

1246.4s and so you are in charge of the autonomy

1248.3s slider and depending on the complexity

1250.2s of the task at hand you can uh tune the

1253.0s amount of autonomy that you're willing to give up uh for that task. Maybe to show one more example of a fairly successful LLM app, uh Perplexity. Um it also has very similar features to what I've just pointed out in Cursor. Uh it packages up a lot of the information. It orchestrates multiple LLMs. It's got a

1254.3s to give up uh for that task maybe to

1257.1s show one more example of a fairly

1258.6s successful LLM app uh Perplexity um it

1263.0s also has very similar features to what

1264.6s I've just pointed out in Cursor uh it

1267.2s packages up a lot of the information. It

1268.7s orchestrates multiple LLMs. It's got a

1271.0s GUI that allows you to audit some of its work. So, for example, it will cite sources and you can imagine inspecting them. And it's got an autonomy slider. You can either just do a quick search or you can do research or you can do deep research and come back 10 minutes later.

1273.4s work. So, for example, it will cite

1275.6s sources and you can imagine inspecting

1277.3s them. And it's got an autonomy slider.

1279.0s You can either just do a quick search or

1280.6s you can do research or you can do deep

1282.3s research and come back 10 minutes later.
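
The autonomy slider described above for Cursor and Perplexity can be sketched in a few lines. This is a minimal illustration, not how either product is implemented: the `Autonomy` levels, `editable_files`, and `needs_review` are hypothetical names, and the policy that everything above tab completion produces a diff for review is an assumption.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical slider positions, from least to most autonomy."""
    TAB_COMPLETION = 0   # suggest the next few tokens at the cursor
    EDIT_SELECTION = 1   # rewrite only the selected chunk of code
    EDIT_FILE = 2        # rewrite anything in the current file
    EDIT_REPO = 3        # agent mode: may touch any file in the repo


def editable_files(level, repo_files, current_file):
    """Return the files the assistant may modify at this slider position."""
    if level == Autonomy.EDIT_REPO:
        return sorted(repo_files)
    # Every lower level stays inside the file the human is looking at.
    return [current_file]


def needs_review(level):
    """Assumed policy: anything above tab completion yields a diff to accept or reject."""
    return level > Autonomy.TAB_COMPLETION
```

The human picks the level per task: a small fix might stay at `EDIT_SELECTION`, while a larger refactor might go to `EDIT_REPO` at the cost of more to verify.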

1284.3s So, this is all just varying levels of autonomy that you give up to the tool. So, I guess my question is I feel like a lot of software will become partially autonomous. I'm trying to think through like what does that look like? And for many of you who maintain products and

1285.7s autonomy that you give up to the tool.

1287.7s So, I guess my question is I feel like a

1290.2s lot of software will become partially

1292.0s autonomous. I'm trying to think through

1293.5s like what does that look like? And for

1295.3s many of you who maintain products and

1297.0s services, how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human could act? And can humans supervise and stay in the loop of this activity? Because again, these are fallible systems that aren't

1299.0s products and services partially

1300.2s autonomous? Can an LLM see everything

1302.7s that a human can see? Can an LLM act in

1305.1s all the ways that a human could act? And

1307.0s can humans supervise and stay in the

1309.4s loop of this activity? Because again,

1310.9s these are fallible systems that aren't

1312.3s yet perfect. And what does a diff look like in Photoshop or something like that? You know, and also a lot of the traditional software right now, it has all these switches and all this kind of stuff that's all designed for humans. All of this has to change and become accessible to LLMs.

1314.9s like in Photoshop or something like

1316.6s that? You know, and also a lot of the

1318.8s traditional software right now, it has

1320.1s all these switches and all this kind of

1321.8s stuff that's all designed for humans. All

1323.4s of this has to change and become

1324.7s accessible to LLMs.
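
For plain text, the question of how a human supervises and stays in the loop can be sketched as a review gate: every change the model proposes is shown as a diff and applied only if the reviewer accepts it. This is an illustrative sketch under stated assumptions; `render_diff`, `apply_with_review`, and the `approve` callback are hypothetical names, and a real app would show the diff in a GUI rather than pass it to a function.

```python
import difflib


def render_diff(old, new):
    """Unified diff of two text blobs, the plain-text analogue of a red/green view."""
    return "\n".join(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )


def apply_with_review(document, proposals, approve):
    """Apply each proposed replacement only if the human reviewer accepts its diff.

    `proposals` is a list of full replacement texts (for example, produced by an LLM).
    `approve` is a callback that receives the diff text and returns True or False.
    """
    for proposed in proposals:
        diff = render_diff(document, proposed)
        if approve(diff):
            document = proposed  # accepted: the proposal becomes the new state
        # rejected proposals are dropped and the document is left untouched
    return document
```

Keeping each proposal small keeps each diff small, which is what makes the verification step fast.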

1327.8s So, one thing I want to stress with a lot of these LLM apps that I'm not sure gets as much attention as it should is um we we're now kind of like cooperating with AIs and usually they are doing the generation and we as humans are doing the verification. It is in our interest

1329.5s lot of these LLM apps that I'm not sure

1331.1s gets as much attention as it should is

1334.2s um we we're now kind of like cooperating

1336.8s with AIs and usually they are doing the

1338.6s generation and we as humans are doing

1340.2s the verification. It is in our interest

1342.6s to make this loop go as fast as possible. So, we're getting a lot of work done. There are two major ways that I think uh this can be done. Number one, you can speed up verification a lot. Um, and I think GUIs, for example, are extremely important to this because a GUI utilizes your computer vision GPU

1344.5s possible. So, we're getting a lot of

1345.8s work done. There are two major ways that

1348.0s I think uh this can be done. Number one,

1350.4s you can speed up verification a lot. Um,

1352.7s and I think GUIs, for example, are

1354.2s extremely important to this because a

1356.1s GUI utilizes your computer vision GPU

1359.3s in all of our heads. Reading text is effortful and it's not fun, but looking at stuff is fun and it's it's just a kind of like a highway to your brain. So, I think GUIs are very useful for auditing systems and visual representations in general. And number two, I would say is we have to keep the

1361.4s effortful and it's not fun, but looking

1363.2s at stuff is fun and it's it's just a

1365.8s kind of like a highway to your brain.

1367.4s So, I think GUIs are very useful for

1369.7s auditing systems and visual

1371.7s representations in general. And number

1373.6s two, I would say is we have to keep the

1376.1s AI on the leash. We I think a lot of people are getting way over excited with AI agents and uh it's not useful to me to get a diff of 10,000 lines of code to my repo. Like I have to I'm still the bottleneck, right? Even though that 10,000 lines come out instantly, I have

1378.9s people are getting way over excited with

1380.6s AI agents and uh it's not useful to me

1383.6s to get a diff of 10,000 lines of code to

1385.8s my repo. Like I have to I'm still the

1387.9s bottleneck, right? Even though that

1389.2s 10,000 lines come out instantly, I have

1391.1s to make sure that this thing is not introducing bugs. It's just like and that it's doing the correct thing, right? And that there's no security issues and so on. So um I think that um yeah basically you we have to sort of like it's in our interest to make the

1392.2s introducing bugs. It's just like and

1395.4s that it's doing the correct thing,

1396.6s right? And that there's no security

1397.8s issues and so on. So um I think that um

1402.9s yeah basically you we have to sort of

1405.4s like it's in our interest to make the

1408.2s the flow of these two go very very fast and we have to somehow keep the AI on the leash because it gets way too overreactive. It's uh it's kind of like this. This is how I feel when I do AI assisted coding. If I'm just vibe coding everything is nice and great but if I'm

1410.3s and we have to somehow keep the AI on

1412.2s the leash because it gets way too

1413.1s overreactive. It's uh it's kind of like

1415.3s this. This is how I feel when I do AI

1417.3s assisted coding. If I'm just vibe coding

1419.2s everything is nice and great but if I'm

1420.9s actually trying to get work done it's not so great to have an overreactive uh agent doing all this kind of stuff. So this slide is not very good. I'm sorry, but I guess I'm trying to develop like many of you some ways of utilizing these agents in my coding workflow and to do

1422.4s not so great to have an overreactive uh

1424.7s agent doing all this kind of stuff. So

1427.3s this slide is not very good. I'm sorry,

1428.8s but I guess I'm trying to develop like

1431.1s many of you some ways of utilizing these

1433.8s agents in my coding workflow and to do

1435.8s AI assisted coding. And in my own work, I'm always scared to get way too big diffs. I always go in small incremental chunks. I want to make sure that everything is good. I want to spin this loop very very fast and um I sort of work on small chunks of single concrete

1438.1s I'm always scared to get way too big

1439.8s diffs. I always go in small incremental

1442.2s chunks. I want to make sure that

1444.2s everything is good. I want to spin this

1446.2s loop very very fast and um I sort of

1449.1s work on small chunks of single concrete

1450.8s thing. Uh and so I think many of you probably are developing similar ways of working with the with LLMs. Um, I also saw a number of blog posts that try to develop these best practices for working with LLMs. And here's one that I read recently and I thought was quite good. And it kind of discussed

1453.2s probably are developing similar ways of

1454.6s working with the with LLMs.

1457.6s Um, I also saw a number of blog posts

1459.6s that try to develop these best practices

1462.2s for working with LLMs. And here's one

1464.0s that I read recently and I thought was

1465.4s quite good. And it kind of discussed

1466.8s some techniques and some of them have to do with how you keep the AI on the leash. And so, as an example, if you are prompting, if your prompt is vague, then uh the AI might not do exactly what you wanted and in that case, verification will fail. You're going to ask for

1468.2s do with how you keep the AI on the

1469.9s leash. And so, as an example, if you are

1472.0s prompting, if your prompt is vague, then

1475.0s uh the AI might not do exactly what you

1477.0s wanted and in that case, verification

1478.9s will fail. You're going to ask for

1480.2s something else. If a verification fails, then you're going to start spinning. So it makes a lot more sense to spend a bit more time to be more concrete in your prompts which increases the probability of successful verification and you can move forward. And so I think a lot of us

1482.1s then you're going to start spinning. So

1483.7s it makes a lot more sense to spend a bit

1485.1s more time to be more concrete in your

1486.8s prompts which increases the probability

1488.5s of successful verification and you can

1490.2s move forward. And so I think a lot of us

1492.1s are going to end up finding um kind of techniques like this. I think in my own work as well I'm currently interested in uh what education looks like in um together with kind of like now that we have AI uh and LLMs what does education look like? And I think a a large amount

1494.1s techniques like this. I think in my own

1496.3s work as well I'm currently interested in

1497.8s uh what education looks like in um

1500.1s together with kind of like now that we

1501.8s have AI uh and LLMs what does education

1504.5s look like? And I think a a large amount

1507.0s of thought for me goes into how we keep AI on the leash. I don't think it just works to go to chat and be like, Hey, teach me physics. I don't think this works because the AI is like gets lost in the woods. And so for me, this is actually two separate apps. For example,

1509.7s AI on the leash. I don't think it just

1511.4s works to go to chat and be like, Hey,

1513.2s teach me physics. I don't think this

1514.8s works because the AI is like gets lost

1516.9s in the woods. And so for me, this is

1518.8s actually two separate apps. For example,

1520.9s there's an app for a teacher that creates courses and then there's an app that takes courses and serves them to students. And in both cases, we now have this intermediate artifact of a course that is auditable and we can make sure it's good. We can make sure it's consistent. and the AI is kept on the

1522.6s creates courses and then there's an app

1524.9s that takes courses and serves them to

1526.5s students. And in both cases, we now have

1529.1s this intermediate artifact of a course

1531.2s that is auditable and we can make sure

1532.7s it's good. We can make sure it's

1533.8s consistent. and the AI is kept on the

1535.9s leash with respect to a certain syllabus, a certain like um progression of projects and so on. And so this is one way of keeping the AI on leash and I think has a much higher likelihood of working and the AI is not getting lost in the woods. One more kind of analogy I wanted to

1537.1s syllabus, a certain like um progression

1540.2s of projects and so on. And so this is

1542.6s one way of keeping the AI on leash and I

1544.2s think has a much higher likelihood of

1545.8s working and the AI is not getting lost

1547.8s in the woods.

1549.9s One more kind of analogy I wanted to

1551.9s sort of allude to is I'm not I'm no stranger to partial autonomy and I kind of worked on this I think for five years at Tesla and this is also a partial autonomy product and shares a lot of the features like for example right there in the instrument panel is the GUI of the

1554.5s stranger to partial autonomy and I kind

1556.2s of worked on this I think for five years

1557.8s at Tesla and this is also a partial

1560.2s autonomy product and shares a lot of the

1561.9s features like for example right there in

1563.5s the instrument panel is the GUI of the

1565.4s autopilot so it's showing me what the what the neural network sees and so on and we have the autonomy slider where over the course of my tenure there we did more and more autonomous tasks for the user and maybe the story that I wanted to tell very briefly is uh actually the first time I drove a

1567.6s what the neural network sees and so on

1569.2s and we have the autonomy slider where

1570.8s over the course of my tenure there we

1573.4s did more and more autonomous tasks for

1575.6s the user and maybe the story that I

1578.3s wanted to tell very briefly is uh

1581.1s actually the first time I drove a

1582.6s self-driving vehicle was in 2013 and I had a friend who worked at Waymo and uh he offered to give me a drive around Palo Alto. I took this picture using Google Glass at the time and many of you are so young that you might not even know what that is. Uh but uh yeah, this

1585.2s had a friend who worked at Waymo and uh

1587.3s he offered to give me a drive around

1589.1s Palo Alto. I took this picture using

1591.5s Google Glass at the time and many of you

1593.9s are so young that you might not even

1595.3s know what that is. Uh but uh yeah, this

1597.3s was like all the rage at the time. And we got into this car and we went for about a 30-minute drive around Palo Alto highways uh streets and so on. And this drive was perfect. There was zero interventions and this was 2013 which is now 12 years ago. And it kind of struck

1599.4s we got into this car and we went for

1601.0s about a 30-minute drive around Palo Alto

1603.0s highways uh streets and so on. And this

1605.1s drive was perfect. There was zero

1607.0s interventions and this was 2013 which is

1609.8s now 12 years ago. And it kind of struck

1612.5s me because at the time when I had this perfect drive, this perfect demo, I felt like, wow, self-driving is imminent because this just worked. This is incredible. Um, but here we are 12 years later and we are still working on autonomy. Um, we are still working on driving agents and even now we haven't

1614.0s perfect drive, this perfect demo, I felt

1616.2s like, wow, self-driving is imminent

1619.5s because this just worked. This is

1620.8s incredible. Um, but here we are 12 years

1623.4s later and we are still working on

1624.9s autonomy. Um, we are still working on

1627.0s driving agents and even now we haven't

1629.2s actually like really solved the problem. Like you may see Waymos going around and they look driverless but you know there's still a lot of teleoperation and a lot of human in the loop of a lot of this driving so we still haven't even like declared success but I think it's definitely like going to succeed at this

1630.8s like you may see Waymos going around and

1632.9s they look driverless but you know

1635.0s there's still a lot of teleoperation and

1636.8s a lot of human in the loop of a lot of

1638.7s this driving so we still haven't even

1641.0s like declared success but I think it's

1642.6s definitely like going to succeed at this

1644.4s point but it just took a long time and so I think like like this is software is really tricky I think in the same way that driving is tricky and so when I see things like oh 2025 is the year of agents I get very concerned and I kind of feel like you know this is the decade

1646.6s so I think like like this is software is

1649.4s really tricky I think in the same way

1651.6s that driving is tricky and so when I see

1654.7s things like oh 2025 is the year of

1656.5s agents I get very concerned and I kind

1658.7s of feel like you know this is the decade

1661.0s of agents and this is going to be quite some time. We need humans in the loop. We need to do this carefully. This is software. Let's be serious here. One more kind of analogy that I always think through is the Iron Man suit. Uh I think this is I always love Iron Man. I think

1664.1s some time. We need humans in the loop.

1665.8s We need to do this carefully. This is

1667.2s software. Let's be serious here. One

1671.0s more kind of analogy that I always think

1672.9s through is the Iron Man suit. Uh I think

1676.1s this is I always love Iron Man. I think

1678.2s it's like so um correct in a bunch of ways with respect to technology and how it will play out. And what I love about the Iron Man suit is that it's both an augmentation and Tony Stark can drive it and it's also an agent. And in some of the movies, the Iron Man suit is quite

1681.4s ways with respect to technology and how

1682.9s it will play out. And what I love about

1684.4s the Iron Man suit is that it's both an

1685.9s augmentation and Tony Stark can drive it

1688.7s and it's also an agent. And in some of

1690.3s the movies, the Iron Man suit is quite

1691.8s autonomous and can fly around and find Tony and all this kind of stuff. And so this is the autonomy slider is we can be we can build augmentations or we can build agents and we kind of want to do a bit of both. But at this stage I would say working with fallible LLMs and so

1693.6s Tony and all this kind of stuff. And so

1695.3s this is the autonomy slider is we can be

1697.3s we can build augmentations or we can

1699.0s build agents and we kind of want to do a

1701.2s bit of both. But at this stage I would

1703.4s say working with fallible LLMs and so

1705.9s on. I would say you know it's less Iron Man robots and more Iron Man suits that you want to build. It's less like building flashy demos of autonomous agents and more building partial autonomy products. And these products have custom GUIs and UI/UX. And we're trying to um and this is done so that

1709.1s Man robots and more Iron Man suits that

1711.6s you want to build. It's less like

1713.7s building flashy demos of autonomous

1715.1s agents and more building partial

1716.7s autonomy products. And these products

1719.7s have custom GUIs and UI/UX. And we're

1721.9s trying to um and this is done so that

1723.8s the generation verification loop of the human is very very fast. But we are not losing the sight of the fact that it is in principle possible to automate this work. And there should be an autonomy slider in your product. And you should be thinking about how you can slide that autonomy slider and make your product uh

1725.5s human is very very fast. But we are not

1728.2s losing the sight of the fact that it is

1729.5s in principle possible to automate this

1731.3s work. And there should be an autonomy

1733.0s slider in your product. And you should

1734.6s be thinking about how you can slide that

1735.9s autonomy slider and make your product uh

1738.6s sort of um more autonomous over time. But this is kind of how I think there's lots of opportunities in these kinds of products. I want to now switch gears a little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for

1741.3s But this is kind of how I think there's

1742.7s lots of opportunities in these kinds of

1744.2s products. I want to now switch gears a

1746.6s little bit and talk about one other

1748.2s dimension that I think is very unique.
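
A minimal sketch of the autonomy slider idea described above. Everything named here is a hypothetical illustration and not something shown in the talk: the four slider positions, the `Action` type, and the approval rule (including the 20-line threshold) are assumptions chosen only to show how a product might keep a human in the loop at low settings and relax that as the slider moves up.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical slider positions, from pure augmentation to full agent."""
    SUGGEST = 0      # model proposes, human applies every change
    EDIT_SMALL = 1   # model may apply small changes on its own
    EDIT_ANY = 2     # model may apply any reversible change
    FULL_AGENT = 3   # model acts without review


@dataclass
class Action:
    description: str
    lines_changed: int
    irreversible: bool = False


def needs_human_approval(action: Action, level: Autonomy, small_limit: int = 20) -> bool:
    """Decide whether a human must verify the action before it is applied."""
    # Irreversible actions are always reviewed unless the slider is at the far end.
    if action.irreversible and level < Autonomy.FULL_AGENT:
        return True
    if level == Autonomy.SUGGEST:
        return True
    if level == Autonomy.EDIT_SMALL:
        return action.lines_changed > small_limit
    return False
```

Under these assumptions, moving the slider from `SUGGEST` to `EDIT_SMALL` lets a three-line rename through without review while a 200-line refactor still goes to the human.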

1749.8s Not only is there a new type of

1751.4s programming language that allows for

1753.0s autonomy in software but also as I mentioned it's programmed in English which is this natural interface and suddenly everyone is a programmer because everyone speaks natural language like English. So this is extremely bullish and very interesting to me and also completely unprecedented. I would say it it used to be the case that you

1755.3s mentioned it's programmed in English

1756.6s which is this natural interface and

1759.0s suddenly everyone is a programmer

1760.6s because everyone speaks natural language

1762.2s like English. So this is extremely

1764.6s bullish and very interesting to me and

1766.2s also completely unprecedented. I would

1768.0s say it it used to be the case that you

1769.5s need to spend five to 10 years studying something to be able to do something in software. this is not the case anymore. So, I don't know if by any chance anyone has heard of vibe coding. Uh, this this is the tweet that kind of like introduced this, but I'm told that

1771.4s something to be able to do something in

1772.9s software. this is not the case anymore.

1775.2s So, I don't know if by any chance anyone

1777.1s has heard of vibe coding.

1780.6s Uh, this this is the tweet that kind of

1782.5s like introduced this, but I'm told that

1784.2s this is now like a major meme. Um, fun story about this is that I've been on Twitter for like 15 years or something like that at this point and I still have no clue which tweet will become viral and which tweet like fizzles and no one cares. And I thought that this tweet was

1786.7s story about this is that I've been on

1789.6s Twitter for like 15 years or something

1791.2s like that at this point and I still have

1793.5s no clue which tweet will become viral

1796.3s and which tweet like fizzles and no one

1798.0s cares. And I thought that this tweet was

1800.8s going to be the latter. I don't know. It was just like a shower of thoughts. But this became like a total meme and I really just can't tell. But I guess like it struck a chord and it gave a name to something that everyone was feeling but couldn't quite say in words. So now

1801.8s was just like a shower of thoughts. But

1803.4s this became like a total meme and I

1805.3s really just can't tell. But I guess like

1806.7s it struck a chord and it gave a name to

1808.5s something that everyone was feeling but

1810.6s couldn't quite say in words. So now

1813.3s there's a Wikipedia page and everything. This is like Applause yeah this is like a major contribution now or something like that. So, um, so Tom Wolf from Hugging Face shared this beautiful video that I really love. Um, these are kids vibe coding.

1817.3s This is like

1818.6s Applause

1825.9s yeah this is like a major contribution

1827.6s now or something like that. So,

1830.7s um, so Tom Wolf from Hugging Face shared

1833.0s this beautiful video that I really love.

1835.0s Um,

1837.8s these are kids vibe coding.

1844.4s I love this video. Like, how can you look at this video and feel bad about the future? The future is great. I think this will end up being like a gateway drug to software development. Um, I'm not a doomer about the future of

1846.7s how can you look at this video and feel

1848.1s bad about the future? The future is

1849.8s great.

1852.6s I think this will end up being like a

1853.9s gateway drug to software development.

1856.6s Um, I'm not a doomer about the future of

1859.2s the generation and I think yeah, I love this video. So, I tried vibe coding a little bit uh as well because it's so fun. Uh, so vibe coding is so great when you want to build something super duper custom that doesn't appear to exist and you just want to wing it because it's a

1862.2s this video. So, I tried vibe coding a

1864.8s little bit uh as well because it's so

1867.1s fun. Uh, so vibe coding is so great when

1869.4s you want to build something super duper

1870.8s custom that doesn't appear to exist and

1872.4s you just want to wing it because it's a

1873.7s Saturday or something like that. So, I built this uh iOS app and I don't I can't actually program in Swift, but I was really shocked that I was able to build like a super basic app and I'm not going to explain it. It's really uh dumb, but uh I kind of like this was

1875.5s built this uh iOS app and I don't I

1878.7s can't actually program in Swift, but I

1880.6s was really shocked that I was able to

1881.8s build like a super basic app and I'm not

1883.4s going to explain it. It's really uh

1884.7s dumb, but uh I kind of like this was

1887.4s just like a day of work and this was running on my phone like later that day and I was like, Wow, this is amazing. I didn't have to like read through Swift for like five days or something like that to like get started. I also vibe coded this app called MenuGen. And

1888.7s running on my phone like later that day

1890.3s and I was like, Wow, this is amazing.

1892.3s I didn't have to like read through Swift

1893.9s for like five days or something like

1895.9s that to like get started. I also

1898.2s vibe coded this app called MenuGen. And

1900.5s this is live. You can try it in menugen.app. And I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the things are. And I need pictures. So this doesn't exist. So I was like, Hey, I'm going to vibe code

1901.8s menugen.app. And I basically had this

1904.1s problem where I show up at a restaurant,

1905.4s I read through the menu, and I have no

1906.6s idea what any of the things are. And I

1908.6s need pictures. So this doesn't exist. So

1911.6s I was like, Hey, I'm going to vibe code

1913.0s it. So, um, this is what it looks like. You go to menugen.app, um, and, uh, you take a picture of a of a menu and then MenuGen generates the images and everyone gets $5 in credits for free when you sign up. And therefore, this is a major cost center in my life. So, this is a negative

1915.9s You go to menugen.app,

1918.2s um, and, uh, you take a picture of a of

1921.4s a menu and then MenuGen generates the

1923.3s images and everyone gets $5 in credits

1926.2s for free when you sign up. And

1928.0s therefore, this is a major cost center

1930.5s in my life. So, this is a negative

1933.8s negative uh, revenue app for me right now. I've lost a huge amount of money on MenuGen. Okay. But the fascinating thing about MenuGen for me is that the code of the vibe coding part, the code was

1936.2s now.

1937.8s I've lost a huge amount of money on

1939.2s MenuGen.

1941.3s Okay. But the fascinating thing about

1943.4s MenuGen for me is that the code of

1948.2s the vibe coding part, the code was

1950.2s actually the easy part of vibe coding MenuGen and most of it actually was when I tried to make it real so that you can actually have authentication and payments and the domain name and a Vercel deployment. This was really hard and all of this was not code. All of this devops

1952.7s MenuGen and most of it actually was when I

1955.1s tried to make it real so that you can

1956.5s actually have authentication and

1957.6s payments and the domain name and a Vercel

1959.6s deployment. This was really hard and all

1961.9s of this was not code. All of this devops

1964.2s stuff was me in the browser clicking stuff and this was extremely slow and took another week. So it was really fascinating that I had the MenuGen um basically demo working on my laptop in a few hours and then it took me a week because I was trying to make it real and

1967.1s stuff and this was extremely slow and took

1969.8s another week. So it was really

1971.5s fascinating that I had the MenuGen um

1974.6s basically demo working on my laptop in a

1977.3s few hours and then it took me a week

1979.3s because I was trying to make it real and

1981.2s the reason for this is this was just really annoying. Um, so for example, if you try to add Google login to your web page, I know this is very small, but just a huge amount of instructions of this Clerk library telling me how to integrate this. And this is crazy. Like

1982.9s really annoying. Um, so for example, if

1985.6s you try to add Google login to your web

1987.3s page, I know this is very small, but

1989.2s just a huge amount of instructions of

1991.7s this Clerk library telling me how to

1993.6s integrate this. And this is crazy. Like

1995.2s it's telling me go to this URL, click on this dropdown, choose this, go to this, and click on that. And it's like telling me what to do. Like a computer is telling me the actions I should be taking. Like you do it. Why am I doing this?

1997.5s this dropdown, choose this, go to this,

1999.8s and click on that. And it's like telling

2001.2s me what to do. Like a computer is

2002.6s telling me the actions I should be

2004.9s taking. Like you do it. Why am I doing

2006.6s this?

2008.6s What the hell? I had to follow all these instructions. This was crazy. So I think the last part of my talk therefore focuses on can we just build for agents? I don't want to do this work. Can agents do this? Thank you. Okay. So roughly speaking, I think

2011.8s I had to follow all these instructions.

2013.8s This was crazy. So I think the last part

2016.2s of my talk therefore focuses on can we

2019.5s just build for agents? I don't want to

2021.7s do this work. Can agents do this? Thank

2024.2s you.

2026.3s Okay. So roughly speaking, I think

2028.6s there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs or computers through APIs. And now we have a completely new thing and agents are they're computers but they are humanlike kind of right they're people spirits there's people spirits on the internet and they need to interact with our

2030.9s manipulator of digital information. It

2033.1s used to be just humans through GUIs or

2035.4s computers through APIs. And now we have

2037.5s a completely new thing and agents are

2040.2s they're computers but they are humanlike

2042.8s kind of right they're people spirits

2044.3s there's people spirits on the internet

2045.6s and they need to interact with our

2046.7s software infrastructure like can we build for them it's a new thing so as an example you can have robots.txt on your domain and you can instruct uh or like advise I suppose um uh web crawlers on how to behave on your website in the same way you can have maybe llms.txt

2048.3s build for them it's a new thing so as an

2050.6s example you can have robots.txt on your

2053.0s domain and you can instruct uh or like

2055.1s advise I suppose um uh web crawlers on

2058.3s how to behave on your website in the

2059.8s same way you can have maybe llms.txt

2061.5s file which is just a simple markdown that's telling LLMs what this domain is about and this is very readable to a to an LLM. If it had to instead get the HTML of your web page and try to parse it, this is very error-prone and difficult and will screw it up and it's

2063.4s that's telling LLMs what this domain is

2065.7s about and this is very readable to a to

2068.1s an LLM. If it had to instead get the

2070.6s HTML of your web page and try to parse

2072.5s it, this is very error-prone and

2073.8s difficult and will screw it up and it's

2075.7s not going to work. So we can just directly speak to the LLM. It's worth it. Um a huge amount of documentation is currently written for people. So you will see things like lists and bold and pictures and this is not directly accessible by an LLM. So I see some of the services now are transitioning a lot

2076.8s directly speak to the LLM. It's worth

2078.4s it. Um a huge amount of documentation is

2081.3s currently written for people. So you

2082.7s will see things like lists and bold and

2085.6s pictures and this is not directly

2087.8s accessible by an LLM. So I see some of

2091.2s the services now are transitioning a lot

2092.8s of their docs to be specifically for LLMs. So Vercel and Stripe as an example are early movers here but there are a few more that I've seen already and they offer their documentation in markdown. Markdown is super easy for LLMs to understand. This is great. Um maybe one simple example from from uh my

2094.9s LLMs. So Vercel and Stripe as an

2097.0s example are early movers here but there

2099.4s are a few more that I've seen already

2101.9s and they offer their documentation in

2104.2s markdown. Markdown is super easy for LLMs

2106.7s to understand. This is great. Um maybe

2110.1s one simple example from from uh my

2112.3s experience as well. Maybe some of you know 3Blue1Brown. He makes beautiful animation videos on YouTube. Applause He wrote uh Manim and I wanted to make my own and uh there's extensive documentations on how to use Manim and so I didn't want to actually read

2114.1s know 3Blue1Brown. He makes

2115.6s beautiful animation videos on YouTube.

2119.4s Applause

2125.0s He wrote uh Manim and I wanted to make my

2127.4s own and uh there's extensive

2130.1s documentations on how to use Manim and

2132.6s so I didn't want to actually read

2134.0s through it. So I copy pasted the whole thing to an LLM and I described what I wanted and it just worked out of the box like LLM just vibe coded me an animation exactly what I wanted and I was like wow this is amazing. So if we can make docs legible to LLMs, it's going to unlock a

2135.4s thing to an LLM and I described what I

2137.4s wanted and it just worked out of the box

2139.2s like LLM just vibe coded me an animation

2141.4s exactly what I wanted and I was like wow

2143.3s this is amazing. So if we can make docs

2145.8s legible to LLMs, it's going to unlock a

2148.2s huge amount of um kind of use and um I think this is wonderful and should should happen more. The other thing I wanted to point out is that you do unfortunately have to it's not just about taking your docs and making them appear in markdown. That's the easy part. We actually have to change the

2151.2s think this is wonderful and should

2152.4s should happen more. The other thing I

2155.1s wanted to point out is that you do

2156.2s unfortunately have to it's not just

2157.7s about taking your docs and making them

2159.0s appear in markdown. That's the easy

2160.6s part. We actually have to change the

2161.9s docs because anytime your docs say click this is bad. An LLM will not be able to natively take this action right now. So, Vercel, for example, is replacing every occurrence of click with an equivalent curl command that your LLM agent could take on your behalf. Um, and so I think this is very interesting. And

2164.7s this is bad. An LLM will not be able to

2166.8s natively take this action right now. So,

2169.9s Vercel, for example, is replacing

2171.5s every occurrence of click with an

2173.5s equivalent curl command that your LLM

2175.4s agent could take on your behalf. Um, and

2178.2s so I think this is very interesting. And

2179.8s then, of course, there's a model context protocol from Anthropic. And this is also another way, it's a protocol of speaking directly to agents as this new consumer and manipulator of digital information. So, I'm very bullish on these ideas. The other thing I really like is a number of little tools here and there that are helping ingest data

2181.4s protocol from Anthropic. And this is

2183.0s also another way, it's a protocol of

2184.9s speaking directly to agents as this new

2186.7s consumer and manipulator of digital

2188.2s information. So, I'm very bullish on

2189.7s these ideas. The other thing I really

2191.5s like is a number of little tools here

2193.5s and there that are helping ingest data

2196.6s in like very LLM friendly formats. So for example, when I go to a GitHub repo like my nanoGPT repo, I can't feed this to an LLM and ask questions about it uh because it's you know this is a human interface on GitHub. So when you just change the URL from GitHub to Gitingest

2198.7s So for example, when I go to a GitHub

2200.2s repo like my nanoGPT repo, I can't feed

2202.7s this to an LLM and ask questions about

2204.3s it uh because it's you know this is a

2206.7s human interface on GitHub. So when you

2208.9s just change the URL from GitHub to Gitingest

2210.5s then uh this will actually concatenate all the files into a single giant text and it will create a directory structure etc. And this is ready to be copy pasted into your favorite LLM and you can do stuff. Maybe even more dramatic example of this is DeepWiki where it's not just the raw

2212.3s concatenate all the files into a single

2214.2s giant text and it will create a

2215.9s directory structure etc. And this is

2217.5s ready to be copy pasted into your

2219.0s favorite LLM and you can do stuff. Maybe

2221.5s even more dramatic example of this is

2223.4s DeepWiki where it's not just the raw

2225.4s content of these files. uh this is from Devin but also like they have Devin basically do analysis of the GitHub repo and Devin basically builds up a whole docs uh pages just for your repo and you can imagine that this is even more helpful to copy paste into your LLM. So I love all the little tools that

2228.6s Devin but also like they have Devin

2231.0s basically do analysis of the GitHub repo

2232.9s and Devin basically builds up a whole

2234.6s docs uh pages just for your repo and you

2238.0s can imagine that this is even more

2239.8s helpful to copy paste into your LLM. So

2242.1s I love all the little tools that

2243.4s basically where you just change the URL and it makes something accessible to an LLM. So this is all well and great and uh I think there should be a lot more of it. One more note I wanted to make is that it is absolutely possible that in the future LLMs will be able to this is

2245.0s and it makes something accessible to an

2246.6s LLM. So this is all well and great and uh

2249.5s I think there should be a lot more of

2250.7s it. One more note I wanted to make is

2252.7s that it is absolutely possible that in

2255.3s the future LLMs will be able to this is

2258.0s not even future this is today they'll be able to go around and they'll be able to click stuff and so on but I still think it's very worth uh basically meeting LLMs halfway and making it easier for them to access all this information uh because this is still fairly expensive I would say to use and

2259.6s able to go around and they'll be able to

2260.8s click stuff and so on but I still think

2262.6s it's very worth uh basically meeting LLMs

2266.1s halfway and making it

2268.6s easier for them to access all this

2269.9s information uh because this is still

2271.7s fairly expensive I would say to use and

2274.4s uh a lot more difficult and so I do think that lots of software there will be a long tail where it won't like adapt because these are not like live player sort of repositories or digital infrastructure and we will need these tools. Uh but I think for everyone else I think it's very worth kind of like

2276.6s think that lots of software there will

2278.2s be a long tail where it won't like adapt

2280.6s because these are not like live

2282.2s player sort of repositories or digital

2284.5s infrastructure and we will need these

2286.2s tools. Uh but I think for everyone else

2288.4s I think it's very worth kind of like

2289.7s meeting in some middle point. So I'm bullish on both if that makes sense. So in summary, what an amazing time to get into the industry. We need to rewrite a ton of code. A ton of code will be written by professionals and by coders. These LLMs are kind of like utilities, kind of like fabs, but

2291.8s bullish on both if that makes sense.
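
A rough sketch of the kind of transformation described above for repo ingestion tools: take a directory, list its files, and concatenate their contents into one text that can be pasted into an LLM. This is not the implementation of any tool named in the talk. The function name `flatten_repo`, the default suffix list, and the `=== path ===` separator format are all assumptions made for illustration.

```python
from pathlib import Path


def flatten_repo(root: str, suffixes: tuple[str, ...] = (".py", ".md", ".txt")) -> str:
    """Return a file listing followed by the contents of each matching text file."""
    base = Path(root)
    files = sorted(
        p for p in base.rglob("*")
        if p.is_file()
        and p.suffix in suffixes
        and ".git" not in p.relative_to(base).parts  # skip version-control internals
    )
    # A flat listing stands in for the directory structure.
    listing = "\n".join(p.relative_to(base).as_posix() for p in files)
    sections = []
    for p in files:
        body = p.read_text(encoding="utf-8", errors="replace")
        sections.append(f"=== {p.relative_to(base).as_posix()} ===\n{body}")
    return "Files:\n" + listing + "\n\n" + "\n\n".join(sections)
```

Calling `flatten_repo(".")` in a small project returns a single string, with binary assets such as images left out because their suffixes are not in the allowed list.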

2294.6s So in summary, what an amazing time to

2297.1s get into the industry. We need to

2298.6s rewrite a ton of code. A ton of code

2300.7s will be written by professionals and by

2303.0s coders. These LLMs are kind of like

2305.6s utilities, kind of like fabs, but

2307.5s they're kind of especially like operating systems. But it's so early. It's like 1960s of operating systems and uh and I think a lot of the analogies cross over. Um and these LLMs are kind of like these fallible uh you know people spirits that we have to learn to work with. And in order to do that properly,

2308.8s operating systems. But it's so early.

2311.0s It's like 1960s of operating systems and

2314.3s uh and I think a lot of the analogies

2316.1s cross over. Um and these LLMs are kind of

2319.0s like these fallible uh you know people

2321.6s spirits that we have to learn to work

2323.4s with. And in order to do that properly,

2325.6s we need to adjust our infrastructure towards it. So when you're building these LLM apps, I described some of the ways of working effectively with these LLMs and some of the tools that make that uh kind of possible and how you can spin this loop very very quickly and basically create partial autonomy

2327.7s towards it. So when you're building

2329.0s these LLM apps, I described some of the

2330.6s ways of working effectively with these

2332.8s LLMs and some of the tools that make

2334.7s that uh kind of possible and how you can

2337.0s spin this loop very very quickly and

2339.0s basically create partial autonomy

2340.8s products and then um yeah, a lot of code has to also be written for the agents more directly. But in any case, going back to the Iron Man suit analogy, I think what we'll see over the next decade roughly is we're going to take the slider from left to right. And I'm

2343.5s has to also be written for the agents

2344.9s more directly. But in any case, going

2347.2s back to the Iron Man suit analogy, I

2349.5s think what we'll see over the next

2350.9s decade roughly is we're going to take

2352.7s the slider from left to right. And I'm

2355.9s It's going to be very interesting to see what that looks like. And I can't wait to build it with all of you. Thank you.

2357.6s interesting to see what that looks like.

2359.4s And I can't wait to build it with all of

2361.5s you. Thank you.
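
The earlier point that docs which say "click" are unusable by an agent can be made concrete with a small checker. The sketch below is only an illustration under stated assumptions: the function name `find_gui_only_steps` and the list of GUI verbs are hypothetical, and the heuristic simply flags prose lines in a markdown document that tell the reader to use a GUI, skipping anything inside fenced code blocks where a command-line equivalent would live.

```python
import re

# Hypothetical heuristic: verbs that describe GUI-only actions.
GUI_VERBS = re.compile(r"\b(click|tap|drag|hover|scroll)\b", re.IGNORECASE)


def find_gui_only_steps(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for prose lines that tell the reader to use a GUI."""
    hits = []
    in_code = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        # Toggle on fence lines so commands inside code blocks are never flagged.
        if line.strip().startswith("```"):
            in_code = not in_code
            continue
        if not in_code and GUI_VERBS.search(line):
            hits.append((number, line.strip()))
    return hits
```

Each flagged line is a candidate for adding an equivalent command that an agent could run on the user's behalf.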
