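Exclusive Interview with Former FAIR Research Director Tian Yuandong: After the Meta Layoffs, Some Regrets and Reflections on AI [Conversation]

Name: Exclusive Interview with Former FAIR Research Director Tian Yuandong: After the Meta Layoffs, Some Regrets and Reflections on AI [Conversation]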
Uploaded: 2026-01-26T23:44:46.344
Duration: 2349 s
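Description: Read the full transcript of "Exclusive Interview with Former FAIR Research Director Tian Yuandong: After the Meta Layoffs, Some Regrets and Reflections on AI [Conversation]" by 硅谷101. Practice English listening and reading with interactive subtitles on xLearning.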

Channel: 硅谷101 Duration: 39 min Sentences: 108

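In late October 2025, Meta laid off about 600 people in its AI division, and former FAIR Research Director Tian Yuandong was one of the central figures in the episode. In our previous video we discussed how Meta's open-source AI strategy met its Waterloo; in this episode we release the full interview. Beyond the company-level story, Tian Yuandong shares more valuable reflections: the scientist, who worked at Meta for more than ten years, says candidly in the interview that being laid off was no surprise. He believes the AI industry is entering a major trend in which AI is automating AI itself, and that there will be fewer and fewer people at the "execution layer". Why does he call Scaling Law "a pessimistic future"? What is the biggest problem with the large language model route, and why is it a thousand times less efficient than human learning? Facing a fierce AI talent war, he advises people not to chase "scarcity" and to follow their interests rather than market signals. Where will idealistic AI researchers go from here?

Guest: Tian Yuandong, former Meta FAIR Research Director and AI scientist (you can follow him on X: @tydsh)

You will see:
00:00 - 02:33 Behind Meta's mass layoff: a scientist's turn and reflections
02:33 - 07:51 The layoff was no surprise, just an accelerated personal choice
07:51 - 10:02 Industry trend: as AI automation increases, there will be fewer "execution layer" people
10:02 - 13:31 Open source will continue; what models are used for is the core question
13:31 - 16:17 The biggest problem with LLMs: they need massive data, and gradient descent is not a good solution
16:17 - 19:04 The potential of reinforcement learning: how AI "actively learns" to produce higher-quality data
19:04 - 24:57 AGI is still decades away; why Scaling Law is a pessimistic future
24:57 - 26:38 Human insight is hard to replace; the next step hopes to combine frontier research with automation applications
26:38 - 28:34 Before firefighting on Llama 4: researching how continuous chain of thought improves models
28:34 - 31:14 Looking back on ten years at FAIR: regret at doing too little engineering, gratitude for gaining "research taste"
31:14 - 34:28 On the AI talent war: rather than chasing "scarcity", do what you want to do
34:28 - 39:09 An idealistic scientist's next step: the intersection of applications and research

[About 硅谷101] We are a program founded by frontline media journalists and hosts from China and abroad. We offer in-depth analysis of Silicon Valley innovation trends and share the latest developments in the tech industry in a relaxed style. We have interviewed top tech leaders, accumulated tens of thousands of hours of media experience, done investigative reporting, and run well-known in-depth pieces with tens of millions of views that sparked nationwide discussion and Weibo trending topics. We are committed to turning the most professional media skills and information-gathering ability into easily shareable new-media strength. Our podcast of the same name: https://www.youtube.com/@valley101podcast Follow us and head toward the future from here. Contact us: video@sv101.net

[Past highlights]
https://www.youtube.com/watch?v=0mrko3cYqBs&t=1499s
https://www.youtube.com/watch?v=EUsIYVtt3y0&t=1162s
https://www.youtube.com/watch?v=8ndEN6VVlUw&t=778s
https://www.youtube.com/watch?v=9RMvKWHgyF8&t=1093s
https://www.youtube.com/watch?v=sdsiJAyO4Fw&t=1144s
https://www.youtube.com/watch?v=sdsiJAyO4Fw&t=1164s
https://www.youtube.com/watch?v=DFyc0rFBptE&t=566s
https://www.youtube.com/watch?v=Tzp0EF-sysI&t=364s
https://www.youtube.com/watch?v=44skDPOPbgo
https://www.youtube.com/watch?v=65TkswVEd5Y&t=422s
https://www.youtube.com/watch?v=8uHur4G1ZVI&t=2240s
https://www.youtube.com/watch?v=H-G76kecXfY&t=1136s
https://www.youtube.com/watch?v=h8FTJxGu_hA&t=704s
https://www.youtube.com/watch?v=_f2c0eZ-N7M&t=302s
https://www.youtube.com/watch?v=3el8ppdGnSk
https://www.youtube.com/watch?v=AGLzjvykebo
https://www.youtube.com/watch?v=pdcT2jwXP0s&t
https://www.youtube.com/watch?v=2PSCnOFkR3U&t
https://www.youtube.com/watch?v=b_OpjUz7zN8
https://www.youtube.com/watch?v=CfxuqbB9dj4&t=2510s
https://www.youtube.com/watch?v=Jy5OozI1zMs
https://www.youtube.com/watch?v=58bS_-215vI&t=137s
https://www.youtube.com/watch?v=sKVVl89pv_0&t=107s
https://www.youtube.com/watch?v=PSfdilqMxRM&t=875s
https://www.youtube.com/watch?v=LyCv2Dm9A1o&t=1725s
https://www.youtube.com/watch?v=lyAP8bRKUJo&t=1472s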

Full Transcript

0.0s I actually had an offer before I was laid off. I'd been with the company for over ten years, so maybe this was the perfect opportunity to step out and see what the future holds. Do you think LLMs (large language models) are the right path? I think LLMs are a very interesting path,

12.1s but I don't know if it's the right one. Scaling Law is a pessimistic future because, frankly, the topic of Scaling Law itself is quite strange. So, what's the biggest problem with large language models right now? The biggest problem is that they require a lot of data. It's the same as with autonomous driving before.

28.0s Initially, progress was very fast, and everyone thought it would soon replace humans, but the further you go, the bigger the problems become. Why? Because good insight and good data are becoming increasingly scarce and difficult to find. With less and less data, your model can't be trained. What do you think of the RL (reinforcement learning) path?

42.0s The biggest advantage of reinforcement learning is that it's active learning: it can have a very positive impact on the distribution of data. This is its core. Do you have any regrets about FAIR? I should have done more engineering work at FAIR; that might have been better. My biggest gains were probably after 2018.

57.3s I did a lot of research during this period. Having research taste means being able to set a path for yourself that you can keep moving forward on. What's your next step? Hello everyone, welcome to Silicon Valley 101, I'm Chen Qian. On October 22, 2025, Meta CEO Mark Zuckerberg approved

78.6s a plan to lay off approximately 600 employees from the company's artificial intelligence division. This is Meta's largest layoff in the AI field this year, mainly targeting the core R&D department known as the Superintelligence Lab. So why is Meta carrying out this layoff? How did the company's open-source AI approach run into obstacles,

95.2s and how will Alex Wang, the new AI head parachuted in, reshape Meta's AI strategy? We discussed this in the previous episode, which you can find on our homepage. We also interviewed Tian Yuandong, a key figure in the recent layoffs, former FAIR Research Director and AI scientist. Our interview covered more than just Meta.

118.2s I think what's more interesting and valuable is these senior AI scientists' reflections on AI roadmaps and future cutting-edge research, beyond the company level. So, in this video, I'm sharing the full interview. This version removes what was repeated from the previous video and focuses more on AI development itself,

139.3s especially the large language model (LLM) roadmap, the existence of open-source and closed-source research labs, and the choices AI talent makes between R&D and engineering. I hope this is helpful. Here's the interview. You're still wearing that FAIR uniform. I think generally, people like us don't care much about clothing, right?

162.5s So we wear whatever the company provides, maybe swapping between them. How have the past few days been for you? I know many people have actually been reaching out to... Contacting me, yes. And whether it's the media or many companies, they all came to you. What was your mindset? I think it was like this

182.4s because I actually already had an offer before I was laid off. Before I was laid off, I had already told my superiors that I wasn't very happy and that I might want to look around. They knew that, so I wasn't particularly surprised by the layoff. It didn't matter, since I had an offer anyway.

203.4s I had told them before. Of course, after receiving the offer, I thought I would stay at Meta a while longer because I still had GPU compute there, right? I could still do some more things. But since they laid me off, well, that's it, right? So, in short,

218.8s these past couple of days I've been contacted by a lot of people, including many from large companies, many of them chatting with me about job opportunities. I've been in touch with almost every company you can think of, and all at a senior level. There are also many smaller companies and co-founding opportunities. So, there are many opportunities.

240.7s Right now, I'm still thinking about it and haven't decided yet. It's been less than a week, less than 168 hours, since the layoffs, so I still need to think about it. Was the layoff something you expected? Did you sense it was coming? Otherwise, I wouldn't have been looking for a job.

263.8s So, I have some... Personally, I feel that this point in time is a good opportunity for me to step out and see the world, at least for me, since I've been with the company for over ten years. As for the situation within the company,

282.7s I'm not in a position to comment right now, but it's a personal choice, and this round of layoffs has accelerated that decision. I might have stayed with the company a little longer, maybe another six months, and then reconsidered. But since I've already left, I've left. I think laying off 600 people is quite shocking.

307.9s I felt it was a lot, even though it wasn't an outright layoff for everyone; some had the opportunity to transfer to other groups. It's just that the AI department feels there's no need for so many positions here, and the department needs to be restructured. I think we should actually talk about industry trends.

326.7s We won't go into the specifics of what happened at Meta recently, because I can't reveal too much. I think the industry trend is definitely this: because AI itself has the highest degree of automation, today we have many people labeling data, but tomorrow the model might be stronger and we won't need so many people labeling data,

343.1s and the day after tomorrow the model will be even stronger and we'll need fewer people still. And I've heard all sorts of stories from the past, though I haven't experienced it myself. For example, there used to be on-call systems where, if the model crashed partway through a run, someone would get called to immediately fix it, adjust parameters,

361.4s and see if they could recover it. But now, because there are many automated tools and the whole system is well-designed, these kinds of things have become much less common. So, you can believe that... Then, as the various pipelines (project processes) gradually mature and become automated, do you think a large number of people are needed? Not necessarily.

381.9s So I think the general trend is that fewer and fewer people will be needed, or that fewer and fewer people will be doing this kind of work. So you think this round of layoffs isn't just a problem with Meta, but rather a general trend where more and more engineers, or those working in AI, will be

405.6s laid off. The general trend is that one day, everyone will be unemployed. I think this is a very alarming trend. That's how it is; or rather, there won't be traditional jobs where I'm employed by a company and I help that company do its work. Maybe in the future, that won't be necessary.

425.3s For example, if I were to become a CEO, a leader of a small company, or start my own business, with these tools at my disposal, I would realize that I wouldn't need as many people to do many things. Many tasks are automated, and to a very high degree. So, what

446.4s might previously have required a team of hundreds or thousands of people might not require that many now. Many tasks can be automated using agents. Therefore, I think that in general, fewer people will be working on AI itself, but more and more people will be using AI as a tool to explore other things.

469.9s That's roughly the process. Do you think there will be fewer people researching foundation models? Yes, that's true. There will likely be more and more exploratory research on foundation models (base models), but fewer and fewer people will simply build and train the model according to our previous engineering logic. This is because we'll find

491.3s that everyone follows the same logic to train the model, and the code will all run and be effective. Why would we need so many people? Many will say we can do research or other exploratory work, and the number of those people will increase. And there will also be more and more people developing applications. But these applications aren't general applications;

513.2s they'll often be implemented in a specific vertical field or use this technology... There will likely be more and more people doing what you want to do now, but this applies to the middle layer, the execution team. For those doing execution, their work is repetitive, right? Many things need fixing or processing.

533.9s But as tools become more automated, repetitive labor will decrease. That's the general feeling. Before this layoff, what were you researching at FAIR? Before the layoffs, actually, in January of this year, 2025, I went to GenAI to help out. During that time, we weren't doing research most of the time; we were doing various emergency response tasks.

557.8s Right, that was Llama 4. Llama 4, yes. Of course, I personally still have some collaborative work with other friends. For example, in April or May of this year, we published a paper analyzing the theoretical strengths of our earlier continuous chain of thought. This analysis was quite effective and influential. People felt that it added a footnote

585.6s to the continuous chain of thought (Coconut) paper, indicating that we had indeed done a more in-depth theoretical analysis. This analysis made the continuous chain of thought approach more reasonable, and more work might be done on it. Could you talk about the future development of open source and closed source? Because many outsiders say that

609.3s open source is not feasible within a large company's structure: because the competition in cutting-edge models is too fierce, and others are going closed source, you may not be able to persist in open source. Do you think the gap between open-source and closed-source models will become wider and wider, and will anyone

629.6s still do open source? Many companies, especially in China, are doing open source. But I think there will still be open source in Silicon Valley. For example, I know some companies like Reflection AI, right? They are likely working on open-source models, right? They have many needs and ideas to explore these things. OpenAI previously developed an

651.1s open-source GPT-OSS model, so I think open source will continue, and it certainly will. Ai2 is also working on open-source projects. The bigger question is, what are the uses of these models? Whether open-source or closed-source, once a model is available, it can be used as a chat tool, a search tool, or a productivity tool.

680.1s Large companies might work on these, but there are many other directions. For example, the model can be used for scientific research, for scientists' work, or for work in vertical fields. Small companies can do this. That's roughly it. So, at a certain point, how powerful does the model need to be to solve this problem?

704.5s That's probably the question. This is a problem that varies from person to person or problem to problem. Ultimately, we find that in different fields, do we really need a model that is strong in all aspects? Not necessarily. It might only need to be strong in the areas you care about. At this point, differentiation may begin.

724.3s Each person and each model may have their own ideas, and each company may have its own purpose in developing its model. As a result, there will be all sorts of different models doing different things. In this situation, there may be different strategies, right? Some models may want to be open source because, after being open sourced,

742.2s everyone can use them to build a community, right? Or as a tool platform. In this case, open source is very reasonable. For example, I have a model that, after being trained, can call a certain standard toolkit, and then I can use the standard toolkit... If I could use this model to create a platform for everyone to

759.7s use, then it would definitely be open source. However, for other fields, such as personalized search or personalized recommendations, I'd be less willing to open source such models, right? Or perhaps everyone trains their own model but doesn't open source it. So ultimately, it depends on the end goal, not on whether

782.6s open source or closed source is better or worse. Ultimately, it depends on the company's strategy, because every company and every individual has a different strategy. So, you might think that among state-of-the-art (SOTA) models, it's difficult for an open-source model to compete directly with a closed-source model, but among many smaller, niche models,

806.8s there are still many, many opportunities for open source. That's how it is, right? Do you think LLMs (large language models) are the right path? I think LLMs are a very interesting path, but I don't know if it's the right one. So I think you ultimately agree with Yann LeCun on this point? That's hard to say.

828.2s I think we're all scientists, so people with a scientist's mindset always feel that they want to find something better, rather than being satisfied with the current framework and working on it until the end. That's definitely not the way I'm going to be. So I always say there are all sorts of possible problems, and

849.1s how to solve these problems in other ways is a huge issue. So the biggest problem with large language models right now is that they require a lot of data. And while the quality of the trained model is certainly very good, its learning is definitely not as efficient as a human's. This is a huge problem

870.2s because for humans, the number of samples you learn from is very small, and the number of tokens you can learn from in your lifetime is probably at most on the order of 10 billion, especially text tokens. I've also mentioned this before. I calculated this number on a slide in a presentation,

886.2s , but the training data for large language models can easily reach 10 trillion or 30 trillion, right? There's a 1000-fold difference . How can you use human learning ability to bridge this 1000-fold gap ? It's very difficult. But humans can learn very well. We know that throughout human history,

888.7s can easily reach 10 trillion

890.5s or 30 trillion, right?

893.6s There's a 1000-fold difference.

896.7s How can human learning ability bridge

897.5s this 1000-fold gap?

900.4s It's very difficult.

901.5s But humans can learn very well.

902.6s We know that throughout human history,

905.7s there have been all sorts of incredibly talented scientists, right? Their ideas and approaches were unique. They didn't have access to many books or much data at the time, yet they were able to discover interesting new theorems, new proofs, new findings, or new inventions. So where did they get these abilities?

908.0s Their ideas and approaches were unique.

911.7s They didn't have access to many books

913.3s or much data at the time,

915.1s yet they were able to discover interesting

918.0s new theorems, new proofs, new findings,

920.6s or new inventions.

921.9s So

924.0s where did they get these abilities?
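
The 1000-fold figure quoted above follows directly from the two token budgets mentioned in the interview. A minimal sketch of that arithmetic, assuming the round numbers as spoken (10 billion for a human lifetime, 10 to 30 trillion for LLM training); the names `HUMAN_LIFETIME_TOKENS`, `LLM_TRAINING_TOKENS`, and `data_gap` are hypothetical and used only for illustration:

```python
# Rough token budgets as quoted in the interview (orders of magnitude, not measurements).
HUMAN_LIFETIME_TOKENS = 10e9  # "at most, on the order of 10 billion"
LLM_TRAINING_TOKENS = {"10 trillion": 10e12, "30 trillion": 30e12}


def data_gap(llm_tokens: float, human_tokens: float = HUMAN_LIFETIME_TOKENS) -> float:
    """Return how many times more tokens the model consumes than a person does."""
    return llm_tokens / human_tokens


for label, tokens in LLM_TRAINING_TOKENS.items():
    print(f"{label} training tokens: {data_gap(tokens):,.0f}x the human budget")
```

With these inputs the gap is 1,000x at 10 trillion tokens and 3,000x at 30 trillion, so "1000-fold" is the lower end of the range he describes.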

926.7s Now, with so many tokens being put into large language models, have they reached human capabilities? This is actually a huge question right now, a big question mark. So, if that's the case, maybe our current training algorithm hasn't reached its optimal state, right? There might be better algorithms, better logic, and better ways to learn

927.9s large language models ,

930.6s have they reached human capabilities?

933.5s This is actually a huge question right now,

935.4s a big question mark.

938.2s So, if that's the case,

940.3s maybe our current training algorithm

942.8s hasn't reached its optimal state, right?

944.4s There might be better algorithms, better logic

946.9s , and better ways to learn

949.0s the representations that emerge from the data and use them to solve problems. Gradient descent might not be a particularly good solution. Maybe one day we won't use gradient descent anymore; there might be other methods. This is just a wild guess, right? In that case, maybe our entire training framework might need to change.

952.8s and use them to solve problems

954.7s . Gradient descent

956.0s might not be a particularly good solution.

958.4s Maybe one day we won't use gradient descent anymore

961.1s there might be other methods.

962.3s This is just a wild guess, right?

965.0s In that case, maybe our entire training framework

967.6s might need to change

969.6s Of course, this might not happen now, but I think it might be an interesting direction to experiment with in the future. I've seen some debate in the industry recently about reinforcement learning, especially from Andrej Karpathy. He did a podcast interview and expressed some rather negative views. What do you think of the reinforcement learning (RL) route?

972.3s , but I think

973.0s it might be an interesting direction

975.6s to experiment with in the future.

977.0s I've seen some debate in the industry recently about reinforcement learning,

981.4s especially with Andrej Karpathy. He

983.4s did a podcast interview

984.9s and expressed some rather negative views.

988.6s What do you think of the reinforcement learning (RL) route?

990.7s I've been working in this area for a long time, and I also think that the good thing about reinforcement learning (RL) is that it's essentially a search process. So, you give it some difficult problems and let it search for solutions. The data you learn from and the information you gain during the search process

993.4s and I also think

994.2s that the good thing about reinforcement learning (RL) is that

996.4s it's essentially a search process.

998.6s So, you give it some difficult problems

1000.9s and let it search for solutions.

1002.1s The data you learn and

1003.0s the information you gain

1005.2s during the search process

1007.0s are of higher quality than the data you were fed. It's like two people: one learns under supervision, for example by attending a teacher's lectures, right? Attending lectures can be considered equivalent to being supervised, which is supervised learning. The other solves problems independently without attending lectures.

1011.5s It's like two people: one

1014.0s learns under supervision,

1015.2s for example by attending a teacher's lectures, right? Attending lectures

1019.5s can be considered

1020.7s equivalent to being supervised,

1022.7s which is

1024.0s supervised learning.

1025.8s The other

1027.4s solves problems

1028.8s independently without attending lectures

1031.4s I believe the latter approach builds a more fundamental problem-solving ability. Therefore, I think reinforcement learning (RL) is superior to supervised fine-tuning (SFT) in this regard. Indeed, many articles demonstrate that reinforcement learning is stronger than SFT on many problems, especially reasoning. You need reinforcement learning to truly enable the model to learn reasoning.

1034.6s and problem-solving ability.

1036.6s Therefore, I think reinforcement

1038.1s learning (RL) is superior to supervised fine-tuning (SFT)

1041.6s in this regard. Indeed, many articles demonstrate

1044.2s that Reinforcement Learning

1045.9s is stronger than SFT in many problems,

1048.9s especially reasoning.

1050.1s You need reinforcement learning

1051.7s to truly enable the model to learn reasoning.

1053.8s Supervised fine-tuning (SFT) might simply memorize previous reasoning processes, but it doesn't develop generalization ability. On new problems, its generalization ability might be weaker. Especially with extensive SFT, the model's quality may decline. This is the key difference between the two. However, reinforcement learning is merely a paradigm; it doesn't involve any mysterious elements.

1055.1s might simply memorize

1056.4s previous reasoning processes

1058.2s , but it doesn't develop generalization ability.

1060.8s On new problems, its generalization ability might be weaker.

1064.1s Especially with extensive SFT,

1067.5s the model's quality may decline.

1070.4s This is the key difference between the two.

1075.4s However, reinforcement learning is merely a paradigm;

1078.1s it doesn't involve any mysterious elements.

1081.2s Its ultimate goal is still to change the weights, just like SFT; only the method of changing the weights differs. Ultimately, perhaps there is an approach that can unify reinforcement learning and supervised fine-tuning (SFT), right? These things can be unified because the ultimate goal is to change the weights. Perhaps I have better methods for these problems.

1084.8s just like SFT

1085.9s , only the method of changing weights differs.

1088.2s Ultimately, perhaps a unified approach exists that can unify

1092.1s reinforcement learning and SFT, that is, reinforcement learning

1093.0s and supervised fine-tuning (SFT), right?

1094.2s These things can be unified

1094.8s because the ultimate goal is to change the weights.

1098.5s Perhaps I have better methods for these problems.
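
The claim above, that RL and SFT both end up changing the weights, can be illustrated with a toy example in which a single update function is shared and only the source of the training batch differs. This is a sketch under stated assumptions, not the method discussed in the interview: the "model" is a softmax over four candidate answers, and the RL side is a simple reward-filtered sampling scheme (not PPO or any production algorithm). All names (`update`, `sft_batch`, `rl_batch`, `train`, `reward`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def update(logits, answers, lr=0.5):
    """One gradient-ascent step on the mean log-likelihood of `answers`."""
    if not answers:
        return logits  # nothing collected, so the weights stay unchanged
    probs = softmax(logits)
    n = len(answers)
    # d/d(logit_k) of mean log p(answer) = empirical frequency of k minus p_k
    grad = [answers.count(k) / n - probs[k] for k in range(len(logits))]
    return [w + lr * g for w, g in zip(logits, grad)]


def sft_batch(demonstrations, rng, size=16):
    """SFT: the data is handed to the learner from a fixed set of demonstrations."""
    return [rng.choice(demonstrations) for _ in range(size)]


def rl_batch(logits, reward_fn, rng, size=16):
    """RL-style: the learner samples its own answers and keeps the rewarded ones."""
    probs = softmax(logits)
    tried = rng.choices(range(len(logits)), weights=probs, k=size)
    return [a for a in tried if reward_fn(a) > 0]


def train(collect, steps=200, num_answers=4):
    """Run the same weight update repeatedly; `collect` decides where data comes from."""
    logits = [0.0] * num_answers
    for _ in range(steps):
        logits = update(logits, collect(logits))
    return softmax(logits)


rng = random.Random(0)
correct = 2  # index of the answer that earns reward in this toy task


def reward(answer):
    return 1.0 if answer == correct else 0.0


p_sft = train(lambda logits: sft_batch([correct], rng))
p_rl = train(lambda logits: rl_batch(logits, reward, rng))
print(f"P(correct) after SFT-style data: {p_sft[correct]:.3f}")
print(f"P(correct) after RL-style data:  {p_rl[correct]:.3f}")
```

In this toy both routes converge on the rewarded answer with the same update rule. The practical difference is what each needs as input: the SFT collector needs the correct answer written down in advance, while the RL collector only needs a way to score answers the model proposes itself.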

1101.3s For most people, reinforcement learning is simply a different data acquisition method. It collects data while searching, puts the data together, and then trains on it. This is essentially an active learning method, different from SFT. Therefore, I think the biggest advantage of reinforcement learning is that it's active learning

1103.0s is simply a different data acquisition method.

1106.1s It collects data while searching,

1108.7s puts the data together, and then trains on it.

1112.5s This is essentially an active learning method,

1115.8s different from SFT.

1118.9s Therefore, I think the biggest advantage of reinforcement learning is

1121.0s that it's active learning

1122.4s and can have a very positive impact on the distribution of the data. This is its core strength, not that its objective function or training algorithm is different. Ultimately, it depends on the data itself. The quality of the collected data is different from that of SFT. That's why it can solve some more difficult problems. Andrej Karpathy's previous points

1125.3s This is its core strength,

1127.3s not that its objective function

1132.6s or training algorithm is different.

1135.4s Ultimately, it depends on the data itself.

1137.2s The

1138.5s quality of the collected data

1141.1s is different from SFT.

1142.2s That's why it can solve some more difficult problems.

1144.6s Andrej Karpathy's previous points

1147.2s are actually quite good in some ways. The assertion that AGI (artificial general intelligence) is still 10 years away implies that we've entered an era measured in decades, not a world where AGI capabilities can be acquired immediately. I believe this. I myself have used GPT-5 before, and it helped me with a paper.

1150.8s The assertion that AGI (artificial general intelligence) is still 10 years away

1152.7s implies that we've entered

1154.1s an era measured in decades, not

1156.8s a world where AGI capabilities can be acquired immediately.

1159.8s I believe this.

1162.2s I myself have used GPT-5 before

1164.7s , and it helped me with a paper.

1167.4s My most recent paper was actually the result of self-play between GPT-5 and me. Essentially, I had no students, and I just talked to GPT-5 every day, telling it about problems I needed to solve and how we should develop research methods. It would provide a plan, but you'll find that without domain knowledge,

1170.5s the result of self-play between GPT-5 and me.

1172.7s Essentially, I had no students

1175.1s , and I just talked to GPT-5 every day

1179.2s , telling it about problems I needed to solve

1181.3s and how we should develop research methods.

1185.7s It would provide a plan

1187.6s , but you'll find

1188.8s that without domain knowledge,

1191.2s the plan you get is similar to everyone else's, lacking innovation and originality. However, as a researcher, having a deep understanding of the problem, or knowing that the plan, its impact, or the way of thinking is flawed or has fatal problems, lets you push GPT-5 to delve deeper and ultimately achieve better results. So, this kind

1192.7s similar to others

1194.4s lacking innovation and originality.

1196.6s However, as a researcher,

1199.5s having

1201.5s a deep understanding of the problem

1203.2s , or knowing

1204.2s that the plan, its impact

1206.7s , or the way of thinking is flawed

1210.4s or has fatal problems

1212.3s , allows GPT-5 to delve deeper

1213.8s and ultimately achieve better results.

1216.5s So, this

1218.7s kind

1220.2s of high-level human insight ... Human knowledge and unique insights into the problem are what current models lack. You need these things to make the model stronger. So, to say that AGI lacks these things is not entirely accurate. It's still true that AGI will never achieve top-tier insight because insight will always be led by humans. Yes,

1223.5s ... Human knowledge

1224.9s and unique insights into the problem

1227.7s are what current models lack.

1230.4s You need these things

1231.9s to make the model stronger.

1234.0s So, to say that AGI lacks these things

1236.0s is not entirely accurate. It's still true that AGI will

1237.8s never achieve

1238.9s top-tier insight

1240.9s because insight will always be led by humans. Yes,

1242.5s that's the problem. I've mentioned this before; it's similar to the early days of autonomous driving. Initially, progress was very rapid, and people thought it would soon replace humans. But the further we go, the bigger the problems become. Why? Because good insights and good data are becoming increasingly scarce and difficult to find. With less data,

1244.1s this before , similar to

1247.2s the early days of autonomous driving

1248.8s . Initially, progress was very rapid,

1250.0s and people thought it would soon replace humans.

1252.1s But the further we go, the bigger the problems become.

1255.0s Why?

1255.4s Because good insights

1256.9s and good data are becoming increasingly scarce and difficult to find.

1259.0s With less data,

1259.7s your model can't be trained properly. Humans' ability to acquire and deeply mine data will always surpass that of computers, or at least currently it surpasses all models. For the same problem, humans might only need one or two samples to see the essence, while computers, or today's large models,

1261.7s Humans' ability to acquire

1265.5s and deeply mine data

1267.8s will always surpass that of computers

1269.0s currently, it

1270.0s surpasses all models. For

1271.3s the same problem,

1272.1s humans might only need one or two samples to see the essence,

1275.3s while computers,

1276.3s or today's large models,

1280.3s need at least hundreds or thousands of samples to roughly grasp the outline. Pre-training may require even more samples. In this situation, if the number of samples is insufficient, humans will always be better than current large models, especially experts in specific fields. Even they themselves cannot present the samples they have learned from to the computer

1284.0s to roughly grasp a contour.

1287.2s Pre-training may require even more samples

1289.8s . In this situation,

1291.8s if the number of samples is insufficient,

1293.2s humans will always be better than current large models,

1297.1s especially experts in specific fields.

1299.7s Even they

1300.8s themselves cannot

1302.2s present the samples they have learned from to the computer

1305.2s because these samples are experience held in their minds, which is difficult to put into sentences. If this is the case, AI can only forever follow behind humans. Humans gain insights through some better information processing methods and then feed them to computers and AI to make AI perform better in this direction. This is the current state.

1306.7s are their experience

1308.0s in their minds, which is difficult to put into sentences.

1312.2s If this is the case,

1313.2s AI can only forever follow behind humans.

1316.3s Humans gain insights

1320.2s through some better information processing methods

1321.3s and then feed them to computers and AI

1323.1s to make AI perform better in this direction.

1327.1s This is the current state.

1329.0s So I think this is quite close to some of my previous arguments. I have also said in earlier interviews that the scaling law points to a pessimistic future. The scaling law is, frankly, a very strange topic. In the past, if we told people that adding exponentially more samples or exponentially more computing power would improve our performance only linearly,

1333.0s is

1334.6s quite close

1335.9s to some of my previous arguments.

1338.6s I have also been interviewed before

1339.9s and said that the Scaling Law is a pessimistic future.

1344.3s The Scaling Law is, frankly,

1346.6s a very strange topic.

1347.6s In the past, if we told people

1350.2s that adding exponentially more samples

1353.1s or exponentially more computing power

1354.5s would improve our performance

1356.6s only linearly,

1359.2s I think that earlier machine learning scientists might have considered this trivial because, regardless of the model, you can conclude that simply feeding in more data will yield better results. But I think what we should truly pursue is a model that can move more efficiently, effectively, and quickly along this path,

1359.9s earlier machine learning scientists

1361.1s might consider these things trivial

1363.8s because, regardless of the model,

1364.8s you can conclude

1367.6s that simply feeding in more data

1368.7s will yield better results . But I think what we should truly pursue is

1372.7s a model

1374.2s that can

1376.3s move more efficiently, effectively, and quickly

1379.0s along this path,

1382.6s rather than simply being satisfied with this law. That's correct, because this law leads to a rather pessimistic future, meaning you need to feed in exponentially more samples to get a decent result. If that's the case, one day all of Earth's resources will be exhausted, and all of Earth's energy and electricity

1385.5s , because this law leads

1388.9s to a rather pessimistic future

1390.6s meaning you need to feed in exponentially more samples

1394.1s to get a decent result.

1396.2s If that's the case, one day

1398.2s all of Earth's resources will be exhausted, and

1401.0s all of Earth's energy and electricity

1404.4s will be used to train large models. In that situation, will we still rely on this ability to change our world? That's a huge question. I think at some point, people will realize that computational power isn't everything; we might need a deeper understanding of models. I think this change will gradually happen.

1407.5s In that situation,

1408.6s will we still rely on this ability

1414.3s to change our world?

1416.6s That's a huge question.

1417.6s I think at some point,

1422.3s people will realize that computational power isn't everything

1425.5s we might need a deeper understanding of models.

1428.4s I think

1429.3s this change will gradually happen.
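
The "pessimistic" reading of the scaling law described above can be made concrete with a small calculation. The sketch below assumes a hypothetical log-linear curve in which every tenfold increase in samples adds one fixed unit of performance; the functional form, the constants, and the names `performance` and `samples_needed` are illustrative assumptions, not figures from the interview.

```python
import math


def performance(num_samples: float, base: float = 0.0, gain_per_decade: float = 1.0) -> float:
    """Hypothetical log-linear scaling curve: each 10x in samples adds a fixed gain."""
    return base + gain_per_decade * math.log10(num_samples)


def samples_needed(target: float, base: float = 0.0, gain_per_decade: float = 1.0) -> float:
    """Invert the curve: samples required to reach a target score."""
    return 10 ** ((target - base) / gain_per_decade)


for target in range(1, 6):
    print(f"score {target}: {samples_needed(target):,.0f} samples")
```

Under this assumption each additional point of performance costs ten times the data of the previous one, which is why a linear improvement eventually collides with finite data, energy, and compute.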

1432.5s That's one of my thoughts. Yes, but we need a more efficient way to develop intelligence. But do you think it will take a long time to find this solution? I think everyone is working on it. So it will take some time to do these things, at least for now. Let's talk about

1435.1s , but we need a

1436.3s more efficient way to develop intelligence

1439.4s . But

1441.1s do you think it will take a long time

1442.7s to find this solution?

1445.3s I think everyone is working on it.

1447.2s So it will take some time to do these things,

1449.8s at least for now. Let's talk about

1451.1s large language models. Their capabilities are incredibly strong. Even if our model's capabilities stagnate now, its impact on various industries is still enormous. I think it can automate a large part of things and enhance the capabilities of many people. I feel that with large language models, my abilities have far surpassed what they were before. This makes me feel there's

1454.1s Even if our model's capabilities stagnate now,

1457.8s its impact on various industries is still enormous.

1461.6s I think it can automate a large part of things

1465.6s and enhance the capabilities of many people.

1468.2s I feel that with large language models,

1471.0s my abilities have far surpassed what they were before.

1473.4s This makes me feel there's

1475.4s a lot of room for development in this area, which is a major realization for me. I believe this marks the arrival of a new era. So even if the progress of large language models isn't rapid, I think there will be many opportunities in the next two to three years. So, if you still

1477.7s , which is a major realization for me.

1481.5s I believe this marks the arrival of a new era.

1486.2s So even if the progress of large language models isn't rapid,

1490.5s I think there will be many opportunities

1492.4s in the

1493.7s next two to three

1497.3s years . So, if you still

1501.2s want to do cutting-edge research or try application development, it would be best to combine both. If I could do cutting-edge research that is automated, that would be amazing, right? I already feel that my research paradigm might be partially replaced by automated pipelines. You mean agents? Not necessarily agents, but agents are definitely a very important factor.

1505.1s it would be best to combine both.

1507.4s If I could do

1509.3s cutting-edge research that is automated,

1512.0s that would be amazing, right?

1513.5s I already feel that my research paradigm

1516.3s might be

1520.0s partially replaced by automated pipelines.

1521.8s You mean agents?

1523.6s Not necessarily agents

1524.4s , but agents are definitely a very important factor.

1527.4s Using agents can help you do many things. For example, you might not need to reply to emails yourself or manage your to-do list. To-do items, or tasks that you don't need to do yourself, can be automated and done by computers. This is definitely going to happen. But the more important question is

1528.3s can help you do many things.

1530.1s For example, you might not need to reply to emails yourself

1532.5s or manage your

1537.7s to-do list.

1539.1s To-do items, or tasks that you don't need to do yourself,

1542.7s can be automated and

1544.7s done by computers.

1547.6s This is

1548.7s definitely going to happen

1550.1s . But the more important question is

1551.8s whether AI can replace humans in some advanced activities. This is a more complex issue, especially considering the challenges of advanced human thought processes. The key is the need for human insights. To what extent can AI help solve many difficult scientific problems? We don't yet know if AI can accomplish this. If it can,

1555.9s This is a

1557.4s more complex issue, especially considering the challenges of advanced human thought processes.

1560.6s The key is the need for human

1563.5s insights.

1564.7s To what extent

1566.3s can AI help solve

1569.5s many difficult scientific problems?

1572.6s We don't yet know if

1575.4s AI can accomplish this.

1579.2s If it can,

1579.9s it could, in turn, impact my research. From a research perspective, I might become a super researcher. With the addition of AI, I can conduct better research, and these tools can also benefit other things. That would be very interesting. Before you were pulled in to help with Llama 4, what were you researching? We were doing some research

1582.6s From a research perspective,

1583.6s I might become a super researcher.

1584.9s With the addition of AI,

1587.1s I can conduct better research,

1588.7s and

1589.2s these tools can also benefit other things.

1593.4s That

1594.1s would be very interesting.

1597.8s Before you were pulled in to help with Llama 4,

1601.2s what were you researching?

1603.3s We were doing

1605.0s some research

1606.9s on reasoning, mainly on chains of thought, their forms, and training methods. Before o1 came out last September, we noticed that very long chains of thought affect the scaling law of the model. If you don't have many long chains of thought, the scaling law isn't ideal; you need many samples to get a good result.

1609.6s mainly on

1613.0s chains of thought, their forms, and training methods.

1614.0s Before

1615.8s o1 came out last September,

1617.0s we noticed that

1619.7s very long chains of thought

1620.9s affect the scaling law of the model.

1625.1s If you don't have many

1626.5s long chains of thought,

1627.6s the scaling law isn't ideal;

1630.8s you need many samples

1632.3s to get a good result.

1635.0s But with long chains of thought, the model's scaling changes: the scaling-law curve becomes very favorable. I can get better results with, for example, one-tenth of the samples and one-tenth of the parameters. It's something like that. We had actually discovered this, and since then we've been doing all sorts of variations and explorations

1638.2s the model's scaling changes: the scaling-law curve

1641.2s becomes very ideal. I can

1643.2s get better results

1644.7s with, for example, one-tenth of the

1646.8s samples, and one-tenth of the parameters.

1650.1s It's

1651.2s something like that.

1653.4s We've actually discovered this

1654.5s , but then we 're doing all sorts of transformations

1656.7s and explorations

1658.5s on chains of thought, right? Including our work at the end of last year on continuous chain of thought, which reasons in a continuous latent space. This paper has indeed received a lot of attention, probably over 200 citations in just six months, and many people are willing to follow it. We've been doing some exploratory work

1661.5s Including

1662.6s our recent

1663.4s work at the end of last year, the continuous chain of thought,

1666.6s which reasons in a continuous latent space.

1670.0s This paper has indeed received a lot of attention,

1673.6s probably over 200 citations in just six months,

1676.4s and many people are willing to follow it. We

1678.8s 've been doing some exploratory work

1679.8s and have seen some progress, so I think these things are very interesting. Last year, we also published a paper called Dualformer, which was one of the earliest to propose how to build hybrid reasoning models, that is, how to train long and short thinking together. We found that this model is actually

1683.1s so I think these things are very interesting

1685.8s . Last year, we also published

1687.1s a paper called Dualformer,

1689.1s which was one of the earliest to propose

1690.8s how to build hybrid reasoning models,

1693.2s that is, how to train long and short thinking together.

1696.9s We found that this model is actually

1699.4s more effective than training long or short thinking alone. Now, this has become standard practice; all reasoning models have this adaptive property of combining long and short thinking. So, last year's research was quite up-to-date. Do you have any regrets about FAIR? That's an interesting question. I think my regret might be this: I should have done more

1701.2s Now, this has become standard practice;

1703.3s all reasoning models have this

1706.8s adaptive property of

1708.9s combining long and short thinking. So, last year's research

1712.3s was quite up-to-date.

1713.8s Do you have any regrets about FAIR?

1717.4s That's an interesting question.

1720.8s I think my regret might be this

1724.6s I should have done more

1728.7s engineering work at FAIR. Actually, when I first joined FAIR, in the first few years, I did a lot of engineering work. For some of my previous projects, like Go, I did a lot of engineering work myself. At the time, I was even criticized for it: a research scientist

1730.6s Actually, when I first joined FAIR

1732.6s , in the first few years, I did a lot of engineering work.

1736.3s For some of my previous projects,

1738.5s like Go,

1739.4s I did a lot of engineering work myself.

1742.1s At

1746.3s the time, I was even criticized for

1748.4s being a research scientist who was always doing engineering.

1751.8s I was told that while

1753.4s others' screens were full of articles,

1756.4s mine was full of code.

1758.2s I was criticized like that,

1760.8s so I said,

1761.8s Okay, if research scientists can't do engineering,

1764.7s then I'll read more code and more articles

1767.6s . So, you'll find that

1768.7s from 2015 to 2018, I

1772.2s was mostly doing engineering,

1774.6s and from 2018 until now, I've been doing more research. That's roughly the pattern. This is certainly related to FAIR's policy at the time, and also because I had some research interests and wanted to do more research, so I switched to that approach. But now you'll find that in this era,

1777.0s until now, I've been doing more research.

1780.3s That's roughly the pattern.

1781.9s This is certainly related to FAIR's policy at the time

1787.7s , and also because

1788.8s I had some research interests

1790.6s and wanted to do more research,

1792.7s so I switched to that approach.

1794.0s But now you'll find

1796.1s that in this era,

1798.5s people with strong engineering skills are more sought after, right? So it's interesting that people with strong research skills are also popular, but ideally, they should have both strong engineering and research skills; that's extremely difficult. But I think I can achieve that, so I'm doing more engineering work

1801.4s So it's interesting

1803.3s that people with strong research skills are also popular

1806.5s , but ideally, they should have both strong engineering and research skills

1810.2s that's

1811.3s extremely difficult.

1812.5s But I think I can achieve that,

1814.7s so

1816.6s I'm

1819.1s doing more engineering work

1821.2s now. I can pick up a lot of things again and do these engineering things well. I think my biggest gain from FAIR came after 2018: I developed a lot of research taste during that period. Research taste refers to an appreciation for research and an understanding of research methods. This appreciation can be gradually developed, and it's

1823.6s and do these engineering things well.

1824.8s I think my biggest gain from FAIR

1827.2s came after 2018:

1829.5s I developed a lot of research taste

1831.3s during that period. Research taste

1833.6s refers to an appreciation for research

1835.4s and an understanding of research methods.

1838.0s This appreciation

1839.1s can be gradually developed , and it's

1842.2s become increasingly apparent in recent years' publications. Therefore, having research taste is very helpful for one's future career path. This is crucial because a person who only does engineering has a significant problem: they might only tackle difficult engineering problems without understanding their applications. However, having research taste means setting a path for oneself that can be continuously advanced.

1844.4s in recent years' publications.

1846.0s Therefore, having research taste

1848.6s is very helpful

1850.2s for one's future career path.

1852.1s This is crucial

1853.8s because a person who only does engineering

1855.0s has a significant problem

1857.4s they might only tackle difficult engineering problems

1861.0s without understanding their applications.

1863.7s However, having research taste

1865.6s means setting a path for oneself

1867.4s that can be continuously advanced.

1869.0s This is extremely beneficial for one's life. Yes, I have another question I'm very curious about. Given the fierce competition in AI among companies and the intense talent war, including Meta's latest lab, which spends a lot of money on a single person, what kind of AI talent do you think is most scarce at this stage?

1870.7s for one's life. Yes,

1874.0s I have another question I'm very curious about.

1877.3s Given the fierce competition in AI among companies

1881.8s and the intense talent war

1884.2s including Meta's latest lab

1886.2s , which spends a lot of money on a single person

1889.9s what kind of AI talent

1892.3s do you think is most scarce at this stage

1895.6s I think it completely depends on each person's positioning. First, I want to correct one point: don't think about who is the most scarce right now, because the definition of scarcity might change in a couple of years, right? So, think about Yann LeCun sitting on the sidelines for so many years and then suddenly winning the Turing Award

1902.2s First, I want to correct one point:

1903.6s don't think about who is the most scarce right now, because

1907.2s the definition of scarcity might change

1908.9s in a couple of years,

1911.0s right? So, think about Yann LeCun sitting on the sidelines for so many years

1914.2s and then suddenly winning the Turing Award

1917.2s . So I think everyone should think about what they truly want to do, rather than doing what companies might like. I think that's more important because the whole process is different now. In the past, the market would send a signal saying what kind of talent we needed. This signal

1921.2s about what they truly want to do,

1924.5s rather than doing what companies might like.

1930.4s I think that's more important

1932.0s because the

1933.7s whole process is different now.

1936.8s In the past,

1940.1s the market would send a signal

1942.5s saying what kind of talent we needed. This signal

1945.4s would then spread through universities, saying what kind of talent would be most sought after in the next ten years . Universities would then expand enrollment in the corresponding departments and hire more professors. Students would apply to those departments , and after four or more years of training, these students would finally meet the market's requirements. That's roughly how

1947.8s spread

1950.2s through universities,

1952.3s saying what kind of talent would be most sought after in the next ten years.

1954.9s Universities would then expand enrollment in the corresponding departments

1958.2s and hire more professors.

1960.1s Students would apply to those departments,

1962.2s and after four or more years of training,

1965.4s these students would finally meet the market's requirements.

1969.3s That's roughly how

1970.8s it worked before,

1972.0s because the whole process was

1975.9s relatively slow, right?

1978.1s The industry cycle might have been...

1979.4s For example, the fluctuations used to occur over 10 or 20 years,

1982.3s so this process was possible.

1984.7s But now, the entire cycle might be very fast.

1988.5s By the time you want to learn a hot technology in the market,

1992.8s everyone in the world is learning it, right?

1994.8s You've thought of it, and others have thought of it too, right?

1996.9s Everyone in the world is learning

1998.0s it. There will always be someone who learns faster than you,

2000.3s someone who learns better than you, and someone who

2001.8s can immediately get started and make things work.

2004.0s So you might find that

2005.5s after studying for half a year or a year,

2008.2s you can't compete with others,

2009.3s and you still can't stand out.

2011.9s In this case,

2012.7s the market has changed.

2013.5s Maybe next year won't be

2017.9s the era where this particular skill is most important.

2020.6s Maybe something else has taken its place.

2022.4s If you start learning then,

2023.2s you might always be following in others' footsteps.

2026.5s So maybe in the future, everyone will suddenly realize

2029.1s that instead of following the market's orders,

2030.6s it's better to do what you want to do.

2033.2s You'll be happy doing

2034.4s it, and also, once this thing is discovered,

2038.0s the benefits are huge;

2040.4s of course, that's the ideal situation, right?

2042.0s Because in reality, you definitely need to combine both sides.

2044.4s You'll definitely want

2046.3s to judge for yourself whether this thing will be useful in the future,

2049.8s plus your own interests.

2053.0s Finally,

2054.3s you can put more effort into it after

2057.0s combining the two.

2059.9s Yes, that's roughly it.

2060.6s So it's very difficult to make a judgment

2066.0s because it completely depends on your own ability.

2068.1s I feel you are a very idealistic person.

2070.8s Yes, and I feel that FAIR was a

2073.4s very idealistic team

2075.2s before,

2076.5s as we talked about in the last podcast.

2078.6s But you feel that

2080.6s the market is a bit distorted now

2082.0s because when the competition is particularly fierce,

2084.0s many cultures and beliefs

2086.5s may deviate. Do

2088.8s you think that in the current situation,

2091.5s there are still

2092.8s relatively idealistic research labs?

2096.6s Maybe Ilya Sutskever's team

2100.3s or Mira's team

2102.7s are considered relatively idealistic.

2106.1s Their counterpart, Sam Altman,

2108.2s is very commercial and aggressive.

2110.2s How do you view this balance?

2112.9s I think firstly,

2114.3s you shouldn't treat large companies as monolithic.

2116.9s In fact, there are many groups,

2117.8s and many of these groups have research teams.

2120.7s These teams themselves

2121.4s have

2123.6s a research spirit and research freedom.

2124.6s This will always exist.

2126.4s FAIR is just a very famous and

2129.5s well-known place.

2131.8s But there are many places

2134.7s that are not as famous as FAIR,

2137.5s but they also have free space

2139.9s to do research.

2141.0s Even

2142.3s within Meta, there are many groups

2143.9s that have space to do research.

2145.7s I have many collaborators in Meta

2147.4s who also do some research.

2149.1s I don't think this is a problem.

2152.8s Maybe FAIR might

2155.5s not be as research-oriented

2158.1s in the future because of this or other reasons, right?

2160.9s But there will still be many places where you can do research.

2163.7s Even when you are a startup,

2165.8s you might find

2167.9s that the problem is very cutting-edge,

2168.9s so there will definitely be things you can do there.

2171.0s Because when we talk about research,

2172.2s we mean the process itself

2174.2s is to find new

2177.5s solutions to difficult problems.

2180.0s That's called research,

2181.1s or re-search. Right?

2182.3s Actually, it's about research exploration,

2183.9s so it's not an abstract concept.

2186.9s I think there are many areas where it can be done.

2189.3s It's not a monolithic concept; it's not that big companies can't do it but small companies can.

2192.6s It's not that simple.

2193.6s It completely depends on which group, which person,

2196.9s what resources, what kind of things, and

2199.3s what kind of chemical reaction will

2200.5s occur when these people come together.

2202.3s Maybe it can be done today but not tomorrow,

2204.0s or maybe there's

2206.6s room for it for a period of time, but not at other times.

2209.1s So countless people are thinking about this problem, and maybe

2211.6s a new piece of work will emerge

2213.1s during this period,

2214.8s influencing the entire field.

2217.2s So research will always continue,

2220.3s it's just that its form may become

2223.4s more like guerrilla warfare.

2225.5s It's not that some very famous

2228.3s research institutions

2229.3s will do research,

2231.4s saying,

2232.0s "We'll dedicate all our time and energy to research."

2234.2s Maybe not.

2235.1s But you will always find many idealistic people

2239.1s and small organizations

2242.4s continuing to do what they want to do.

2244.5s It's roughly like this. Yes,

2245.9s it's not 0 or 1;

2247.4s there will be many gray areas.

2249.7s The last question

2250.6s is, what is your next step?

2252.5s As I just said, the next step is not yet determined,

2254.7s so it's still under discussion.

2257.2s Because it hasn't been a week since I was laid off,

2261.4s so... There are some considerations and ideas.

2266.1s The question you just asked was whether I want to work on applications

2269.0s or continue my scientific research, right?

2273.2s My answer is, of course, it's best to combine both.

2275.3s I want to find a way to empower my scientific research

2279.7s while also being able to do many other things.

2282.3s Does such an

2284.2s opportunity exist?

2287.5s I don't know, but generally speaking,

2289.4s we set a high goal first

2291.2s and then look at

2292.2s the options.

2293.8s Because generally, people are more realistic:

2295.1s they think, "If such an opportunity exists, I don't need to think about it."

2296.5s But actually, it should be the other way around.

2298.0s First, think of an impossible goal,

2300.2s and then think about what can support it.

2303.1s This might give you a better direction to take.

2307.2s Okay,

2309.0s then we look forward

2310.6s to your next announcement.

2312.8s Okay, that

2314.4s concludes our interview with Tian Yuandong.

2317.0s We also look forward to his next move.

2319.2s I sincerely hope he can find

2322.3s a new role that balances cutting-edge research and engineering applications.

2326.2s I think this is the path that cutting-edge AI engineers

2328.5s are exploring.

2330.0s Good luck to him!

2331.4s Do you think such AI work exists?

2333.6s Feel free to leave comments, share, and like!

2336.4s Your support is

2338.3s the best motivation for Silicon Valley 101 to produce in-depth technology and business content.

2341.5s See you in the next video! Bye!