0.0s
I actually had an offer before I was laid off. I'd been with the company for over ten years, so maybe this was the perfect opportunity to step out and see what the future holds. Do you think the LLM (large language model) is the right path? I think LLM is a very interesting path,
2.3s
I'd been with the company for over ten years
3.4s
, so maybe this was the perfect opportunity
5.7s
to step out and see what the future
6.8s
holds. Do you think the LLM (large language model) is the right path?
9.5s
I think LLM is a very interesting path,
12.1s
but I don't know if it's the right one. Scaling laws point to a pessimistic future because, frankly, the topic of scaling laws itself is quite strange. So, what's the biggest problem with large language models right now? The biggest problem is that they require a lot of data. It's the same as with autonomous driving before.
13.7s
Scaling laws point to a pessimistic future
15.4s
because, frankly, the topic
17.2s
of Scaling Law
18.0s
itself is quite strange.
21.0s
So, what's the biggest problem with large language models right now? The
24.2s
biggest problem is that they require a lot of data. It
26.6s
's the same as with autonomous driving before.
28.0s
Initially, progress was very fast, and everyone thought it would soon replace humans, but the further you go, the bigger the problems become. Why? Because good insight and good data are becoming increasingly scarce and difficult to find. With less and less data, your model can't be trained. What do you think of the RL (reinforcement learning) path?
29.3s
and everyone thought it would soon replace humans,
31.1s
but the further you go,
33.2s
the bigger the problems
33.9s
become. Why?
34.3s
Because good insight and
35.9s
good data are becoming increasingly
36.8s
scarce and difficult to find.
37.5s
With less and less data
38.3s
, your model can't be trained.
40.0s
What do you think of the RL (reinforcement learning) path?
42.0s
The biggest advantage of reinforcement learning is that it's active learning; it can have a very positive impact on the distribution of data. This is its core. Do you have any regrets about FAIR? I should have done more in FAIR's engineering work, maybe even better. My biggest gain should be after 2018.
44.8s
it can have a very positive impact on the distribution of data.
47.8s
This is its core.
49.3s
Do you have any regrets about FAIR?
51.0s
I should have done more in FAIR's engineering work,
53.4s
maybe even better.
54.2s
My biggest gain should be after 2018.
57.3s
I should have had a lot of research during this period. If you have a taste for research , it means setting a path for yourself that you can keep moving forward from. What's your next step? Hello everyone, welcome to Silicon Valley 101, I'm Chen Qian. On October 22, 2025, Meta CEO Mark Zuckerberg approved
59.0s
during this period.
61.0s
If you have a taste for research ,
62.3s
it means setting a path for yourself
64.3s
that you can keep moving forward from.
65.2s
What's your next step? Hello
69.1s
everyone,
69.8s
welcome to Silicon Valley 101,
71.2s
I'm Chen Qian.
72.3s
On October 22, 2025,
74.0s
Meta CEO Mark Zuckerberg
75.8s
approved
78.6s
a plan to lay off approximately 600 employees from the company's artificial intelligence division. This is Meta's largest layoff in the AI field this year, mainly targeting the core R&D department known as Superintelligence Labs. So why is Meta carrying out this layoff? How did the company's open-source AI approach encounter obstacles,
81.5s
from the company's artificial intelligence division
83.5s
. This is Meta's largest layoff
85.5s
in the AI field this year, mainly targeting
87.2s
the core R&D department known as Superintelligence Labs.
90.1s
So why is Meta carrying out this layoff?
92.4s
How did the company's open-source AI approach encounter obstacles,
95.2s
and how will the new AI head parachuted in, Alexandr Wang, reshape Meta's AI strategy? We discussed this in the previous episode, which you can find on our homepage. We also interviewed Tian Yuandong, a key figure in the recent layoffs, former FAIR Research Director and AI scientist. Our interview covered more than just Meta.
98.2s
will reshape Meta's AI strategy
100.7s
in the previous episode, which
103.8s
you can find on our homepage.
106.5s
We also interviewed
107.9s
Tian Yuandong, a key figure in the recent layoffs,
111.1s
former FAIR Research Director
113.2s
and AI scientist
115.7s
. Our interview covered more than just Meta
118.2s
I think what's more interesting and valuable is the reflection of these senior AI scientists on AI roadmaps and future cutting-edge research, beyond the company level. So, in this video, I'm sharing the full interview. This version has removed the repetition from the previous video and focuses more on AI development itself,
120.3s
is the reflection
122.3s
of these senior AI scientists
124.2s
on AI roadmaps and future cutting-edge research, beyond the company level.
127.6s
So, in this video,
128.6s
I'm sharing the full interview.
132.5s
This version
133.5s
has removed the repetition from the previous video and
136.6s
focuses more on AI development itself,
139.3s
especially the roadmap for large language models (LLMs), the existence of open- and closed-source research labs, and the choices AI talent makes between R&D and engineering. I hope this is helpful. Here's the interview content. You're still wearing that FAIR uniform. I think generally, people like us don't care much about clothing, right?
143.6s
research labs
145.0s
, and the choices
147.8s
AI talent makes between R&D and engineering.
149.0s
I hope this is helpful.
150.7s
Here's the interview content
153.8s
You're still wearing that FAIR uniform
157.6s
. I think
159.0s
generally, people like us
160.3s
don't care much about clothing, right?
162.5s
So we wear whatever the company provides, maybe even change. How have the past few days been for you? I know many people are actually here to reach out to... You contact me Yes, and then whether it's the media or many companies, they all came to you. What was your mindset? I think it was like this
165.9s
maybe even change.
167.8s
How have the past few days been for you?
169.8s
I know many people are actually here to reach out to... You contact me
172.4s
Yes, and then whether it's the media
174.4s
or many companies, they
177.3s
all came to you.
178.5s
What was your mindset?
181.3s
I think it was like this
182.4s
because I actually already had an offer before I was laid off . Before I was laid off, I had already told my superiors that I wasn't very happy and that I might want to look around . They knew that, so I wasn't particularly surprised by the layoff. It didn't matter, since I had an offer anyway.
187.0s
. Before I was laid off,
188.2s
I had already told my superiors
191.5s
that I wasn't very happy and
193.1s
that I might want to look around
195.1s
. They knew
196.3s
that, so I wasn't particularly surprised by
200.7s
the layoff. It
201.6s
didn't matter, since I had an offer anyway.
203.4s
I had told them before that , of course, after receiving the offer, I thought I would stay at Meta for a while longer because I still have GPU computing power, right? I can still do some more things. But since they laid me off, well, that's it, right? So, in short,
204.9s
, of course, after receiving the offer,
207.1s
I thought I would stay at Meta for a while longer
209.0s
because I still have GPU computing power, right?
210.7s
I can still do some more things.
212.1s
But since they laid me off,
213.4s
well, that's it, right?
217.1s
So, in short,
218.8s
these past couple of days, I've received a lot of contact from people, including many from large companies, and many chatting with me about job opportunities. Almost every company you can think of has contacted me, and all at a high level. There are also many smaller companies and co-founding opportunities. So, there are many opportunities.
221.6s
including many from large companies,
223.7s
and
224.4s
many
226.3s
chatting with me
228.1s
about
229.3s
job opportunities.
230.3s
Almost every company you can think of has contacted me
232.8s
, and they've all been at a high level. There are also
234.5s
many smaller companies
235.4s
and co-founding opportunities.
236.8s
So, there are
238.3s
many opportunities.
240.7s
Right now, I'm still thinking about it and haven't decided yet. But since it's been less than a week, less than 168 hours, since the layoffs, I still need to think about it. Was the layoff something you expected? Did you sense it was coming? Otherwise, I wouldn't be looking for a job.
243.1s
and haven't decided
243.8s
yet.
246.7s
But since it's less than a week , less than 168 hours
249.3s
, since the layoffs
251.8s
, I still need to think about
253.9s
it.
255.0s
Was the layoff something you expected? Did
257.4s
you sense it was coming?
262.1s
Otherwise, I wouldn't be looking for a job.
263.8s
So, I have some... I feel that , personally, I think this place, at some point in time , is a good opportunity for me to leave and see the world, at least for me, since I've been with the company for over ten years. As for the situation within the company,
265.4s
, personally, I think this place,
267.8s
at some point in time
270.1s
, is a
271.3s
good opportunity
272.3s
for me
272.7s
to leave and see
273.8s
the world, at least for me,
276.2s
since
277.8s
I've been with the company for over ten years.
279.3s
As for the situation within the company,
282.7s
I'm not in a position to comment right now , but it's a personal choice , and this round of layoffs has accelerated that decision. I might have stayed with the company a little longer , maybe another six months , and then reconsidered. But since I've already left, I've left. I think laying off 600 people is quite shocking
285.2s
, but it's a personal choice
287.7s
, and this round of layoffs
288.9s
has accelerated that decision.
291.5s
I might have stayed with the company a little longer
295.0s
,
296.0s
maybe another six months
296.9s
, and then reconsidered.
297.9s
But since
301.4s
I've already left,
302.5s
I've left.
304.5s
I think laying off 600 people
306.6s
is quite shocking
307.9s
I felt it was a lot, even though it wasn't a complete layoff; some people have the opportunity to transfer to other groups. It's just that the AI department feels there's no need for so many positions here, and the department needs to be restructured. I think we should actually talk about industry trends.
308.8s
felt it was a lot,
310.0s
even though it wasn't a complete layoff
311.8s
, just that some... The opportunity to transfer to other groups is
314.4s
just that your AI department feels
317.8s
there's no need for so many positions here, and
319.9s
the department needs to be restructured.
322.5s
I think we should actually
324.3s
talk about industry
326.2s
trends.
326.7s
We won't go into the specifics of what recently happened at Meta, because I can't reveal too much. I think the industry trend is definitely this: because AI itself has the highest degree of automation, today we have many people labeling data, but tomorrow the model might be stronger and we won't need so many people labeling data,
329.0s
because I can't reveal too much
331.2s
. I think the industry trend is definitely
333.5s
that because AI itself has the highest degree of automation,
338.4s
today we have many people labeling data,
340.2s
but tomorrow the model might be stronger
341.5s
and we won't need so many people labeling data,
343.1s
and the day after tomorrow the model will be even stronger and we'll need fewer people. And in the past, I've heard all sorts of stories, though I haven't experienced it myself. For example, there used to be on-call systems where if the model crashed halfway through training, you could call and they'd immediately fix it, adjust parameters
344.4s
and we'll need fewer people. And
346.0s
in the past,
348.2s
I've heard all sorts of stories,
349.7s
though I haven't experienced it myself.
350.8s
For example, there used to be
353.1s
on-call systems where if the model crashed halfway through
356.2s
training, you could
357.4s
call back
359.4s
and they'd immediately fix it, adjust parameters
361.4s
, and see if they could recover it. But now, because there are many automated tools and the whole system is well-designed, these kinds of things have become much less common. So, you can believe that... Then, as various pipelines and project processes gradually mature and become automated, do you think a large number of people are needed? Not necessarily.
363.4s
But now, because there are many automated tools
366.0s
and the whole system is well-designed,
368.2s
these kinds of things have become much less common.
372.5s
So, you can believe that...
375.1s
Then, as various pipelines and project processes
376.8s
gradually mature and become automated,
379.4s
do you think a large number of people are needed?
381.0s
Not necessarily.
381.9s
So I think the general trend is that fewer and fewer people will be needed, or that fewer and fewer people will be doing this kind of work. So you think this round of layoffs isn't just a problem with Meta, but rather a general trend where more and more engineers, or those working in AI, will be
386.1s
, or that fewer and fewer people will be doing this kind of work.
388.9s
So
391.6s
you think this round of layoffs
393.3s
isn't just a problem with Meta,
396.5s
but rather a general trend where more and more engineers,
402.0s
or those working in AI,
404.9s
will be
405.6s
laid off. The general trend is that one day, everyone will be unemployed. I think this is a very alarming trend. It's like that , or rather, there won't be traditional jobs where I'm employed by a company and I help that company do its work. Maybe in the future, that won't be necessary.
409.3s
that one day, everyone will be unemployed.
411.1s
I think this is a
413.1s
very alarming trend. It's like that
415.1s
, or rather, there won't be traditional jobs
418.9s
where I'm employed by a company
420.5s
and I help that company do its work.
423.5s
Maybe in the future, that won't be necessary.
425.3s
For example, if... If I were to become a CEO , a leader of a small company , or start my own business, with these tools at my disposal, I would realize that I wouldn't need as many people to do many things. Many tasks are automated , and to a very high degree . So, what
427.3s
, a leader of a small company
430.3s
, or start my own business,
434.0s
with these tools at my disposal,
436.1s
I would realize that
437.3s
I wouldn't need as many people to do many things.
440.4s
Many tasks are automated
441.5s
, and to a very high degree
443.3s
. So, what
446.4s
might have previously required a team of hundreds or thousands of people to do something now might not require that many. Many tasks can be automated using agents. Therefore, I think that in general, fewer people will be working on AI itself , but more and more people will be exploring using AI as tools to explore other things
447.7s
of hundreds or thousands of people to do something
450.8s
now might not require that many.
452.8s
Many tasks
454.0s
can be automated using agents.
456.7s
Therefore,
457.6s
I think that in general,
461.3s
fewer people will be working on AI itself
464.3s
, but
465.6s
more and more people will be
466.6s
exploring using AI as tools to explore other things
469.9s
. That's roughly the process. Do you think there will be fewer people researching foundation models? Yes, that's true. There will likely be more and more exploratory research on base models, but fewer and fewer people will simply build and train the model according to our previous engineering logic. This is because we'll find
471.3s
Do you think
472.0s
there will be fewer people researching Foundation Models
473.9s
? Yes, that
475.5s
's true.
476.1s
There will likely be more and more exploratory research
478.1s
on base models
481.0s
, but fewer and fewer people will simply
485.1s
build and train the model
487.5s
according to our previous engineering logic.
489.1s
This is because we'll find
491.3s
that everyone follows the same logic to train the model, and the code will all run and be effective. Why would we need so many people? Many will say we can do research or other exploratory work, and those people will increase. And there will also be more and more people developing applications . But these applications aren't general applications
494.0s
to train the model,
495.8s
and the code will all run
497.0s
and be effective.
499.1s
Why would we need so many
500.6s
people? Many will say we can do research
502.9s
or other exploratory work,
504.8s
and those people will increase. And there
506.7s
will also be more and more people developing applications
509.3s
. But these applications
510.7s
aren't general applications
513.2s
they'll often be implemented in a specific vertical field or use this technology... There will likely be more and more people doing what you want to do now , but this applies to the middle layer, the execution team. For those doing execution , their work is repetitive, right? Many things need fixing or processing.
513.6s
be implemented in a specific vertical field
515.9s
or use this technology... There
517.1s
will likely be more and more people
519.0s
doing what you want to do now
520.4s
, but this
524.7s
applies to the middle layer, the execution team.
527.9s
For those doing execution
529.5s
, their work is repetitive, right?
531.6s
Many things need fixing or processing.
533.9s
But as tools become more automated, repetitive labor will decrease. That's the general feeling. Before this layoff, what were you researching at FAIR? Before the layoffs, actually, in January of this year, 2025, I went to GenAI to help out. During that time, we weren't doing research most of the time; we were doing various emergency response tasks.
536.1s
repetitive labor will decrease.
538.3s
That's the general feeling.
539.7s
Before this layoff,
541.2s
what were you researching at FAIR?
544.2s
Before the layoffs
544.8s
, actually, in
547.1s
January of this year, 2025, I went to
548.6s
GenAI to help out.
550.4s
During that time,
553.1s
we weren't doing research most of the time
555.5s
we were doing various emergency response tasks.
557.8s
Right, that was Llama 4. Llama 4, yes, of course. I personally still have some collaborative work with other friends. For example, in April or May of this year, we published an article analyzing the theoretical strengths of our previous Chain of Continuous Thought (Coconut) work. This analysis was quite effective and influential. People felt that it added a note
560.0s
Llama 4
561.1s
Yes, of course. I personally still have some
563.9s
collaborative work with
565.9s
other friends
567.9s
. For example, in April or May of this year, we published an article
570.6s
analyzing the
574.0s
theoretical strengths
574.7s
of our previous Chain of Continuous Thought (Coconut) work.
576.1s
This analysis was quite
578.3s
effective
579.3s
and influential.
581.3s
People felt that
583.6s
it added a note
585.6s
to the Chain of Continuous Thought (Coconut) article, indicating that we had indeed done a more in-depth theoretical analysis. This analysis made the Chain of Continuous Thought approach more reasonable, and more work might be done on it. Could you talk about the future development of open source and closed source? Many outsiders say that
587.3s
article,
588.4s
indicating that we had indeed
591.8s
done a more in-depth
594.4s
theoretical
595.2s
analysis. This analysis made the Chain of Continuous Thought
597.5s
approach more reasonable
600.3s
, and more work might be done on it.
602.2s
You can talk about
603.2s
the future development of open source and closed source.
606.0s
You think that because many
608.2s
outsiders say that
609.3s
open source is not feasible in a large company's architecture, because the competition in cutting-edge models is too fierce, and when others are going closed-source, you may not be able to persist in open source. Do you think that the gap between open-source and closed-source models will become wider and wider, and will anyone
612.1s
in a large company's architecture,
612.8s
because the competition in cutting-edge models is too fierce, and
615.8s
others are going closed-source,
618.0s
you may not be able to persist in open source.
620.0s
Do you think that the gap
622.6s
between open source and closed source models will become
625.4s
wider and wider
627.3s
, and will anyone
629.6s
still do open source? Many companies, especially in China, are doing open source. But I think there will still be open source in Silicon Valley. For example, I know some companies like Reflection AI, right? They are likely working on open-source models, right? They have many requirements and ideas to explore these things. OpenAI previously developed an
631.0s
companies, especially in China, are doing open
632.9s
source. But
634.2s
I think there
635.5s
will still be open source in Silicon Valley.
637.6s
For example, I know some companies
640.0s
like Reflection AI,
641.2s
right?
641.7s
They are likely working on open-source models, right?
645.1s
They have many requirements
647.3s
and ideas to explore these things.
649.6s
OpenAI previously developed an
651.1s
open-source GPT-OSS model, so I think open source will certainly continue. Ai2 is also working on open-source projects. The bigger question is, what are the uses of these models? Whether open-source or closed-source, once a model is available, it can be used as a chat tool, a search tool, or a productivity tool.
654.2s
, so I think open source will continue,
658.2s
and it certainly will.
661.6s
Ai2 is also working on open-source projects.
663.8s
The
664.3s
bigger
667.2s
question
668.7s
is , what are
670.1s
the uses of these models
672.1s
? Whether open-source or closed-source,
674.2s
once a model is available
675.9s
, it can be used
677.7s
as a chat tool,
678.0s
a search tool, or
679.1s
a productivity tool.
680.1s
Large companies might work on these, but there are many other directions. For example, the model can be used for scientific research , scientists' work , or work in vertical fields . Small companies can do this. That 's roughly it. So, at a certain point, how powerful does the model need to be to solve this problem?
682.8s
but
684.3s
there are many other directions.
686.8s
For example, the model can be used for scientific research
690.0s
, scientists' work
691.3s
, or work in vertical fields
693.2s
. Small companies can do this. That
696.5s
's roughly it.
698.2s
So, at a certain point,
700.4s
how powerful does the model need
701.9s
to be to solve this problem?
704.5s
That's probably the question. This is a problem that varies from person to person or problem to problem. Ultimately, we find that in different fields, do we really need a model that is strong in all aspects? Not necessarily. It might only be strong in the areas you care about. At this point, differentiation may begin.
707.7s
or problem to problem.
710.9s
Ultimately, we find that in different fields, do
714.3s
we really need a model that is strong in all aspects?
718.0s
Not
718.8s
necessarily. It might only be strong in the areas you care about.
722.9s
At this point, differentiation may begin.
724.3s
Each person and each model may have their own ideas, and each company may have its own purpose in developing this model. As a result, there will be all sorts of different models doing different things. In this situation, there may be different strategies, right? Some models may want to be open source because after being open sourced ,
725.9s
may have their own ideas, and
727.6s
each company may have its own purpose in developing this model. As
730.6s
a result, there will be all sorts of different models doing different things.
733.9s
In this situation,
736.1s
there may be different strategies, right?
739.1s
Some models may want to be open source
741.0s
because after being open sourced
741.6s
,
742.2s
everyone can use them to build a community, right? Or as a tool platform. In this case, open source is very reasonable. For example, I have a model that, after being trained, can call a certain standard toolkit, and then I can use the standard toolkit... If I could use this model to create a platform for everyone to
745.5s
Or as a tool platform.
747.2s
In this case, open source is very reasonable.
749.6s
For example, I have a model
750.9s
that, after being trained,
752.9s
can call a certain standard toolkit,
755.8s
and then I can use the standard toolkit...
756.9s
If I could use this model to create a platform
758.7s
for everyone to
759.7s
use, then it would definitely be open source. However, for other fields, such as personalized search or personalized recommendations, I'd be less willing to open source such models, right? Or perhaps everyone trains their own model but doesn't open source it. So ultimately, it depends on the ultimate goal, not on whether
762.0s
would definitely be open source.
763.8s
However,
765.1s
for other fields, such as personalized search
769.3s
or personalized recommendations,
771.4s
I'd be less willing to open source such models,
773.9s
right?
774.7s
Or perhaps everyone trains their own model
776.2s
but doesn't open source it.
777.8s
So ultimately, it depends on the ultimate goal,
780.7s
not on whether
782.6s
open source or closed source is better or worse. Ultimately, it depends on the company's strategy, because every company and every individual has different strategies. So, you might think that in state-of-the-art (SOTA) models, it's difficult for an open-source model to directly compete with a closed-source model, but in many smaller, niche models,
784.7s
Ultimately, it depends on the company's strategy,
787.3s
because every company and every individual has different
790.1s
strategies
792.5s
.
795.1s
So, you might think that in state-of-the-art (SOTA) models,
797.2s
it's difficult for an open source
799.7s
model
802.1s
to directly compete with a closed source model
803.8s
, but in many smaller, niche models,
806.8s
there are still many, many opportunities for open source. That's how it is, right? Do you think the LLM (large language model) is the right path? I think LLM is a very interesting path, but I don't know if it's the right one. So do you ultimately agree with Yann LeCun on this point? That's hard to say.
809.7s
That's how it is,
812.2s
right? Do you think the LLM (large language model) is the right path?
816.1s
I think LLM is a very interesting path
819.2s
, but I don't know if it's the right one.
821.3s
So
823.0s
do you ultimately agree with Yann LeCun on this point?
827.5s
That's hard to say.
828.2s
I think we're all scientists, so people with a scientist's mindset always feel that they want to find something better, rather than being satisfied with the current framework and working on it until the end. That's definitely not the way I'm going to be. So I always say there are all sorts of possible problems , and
830.1s
we're all scientists,
832.1s
so people with a scientist's mindset always feel
836.0s
that they want to find something better,
838.6s
rather than being satisfied with the current framework and working on it until the end.
841.3s
That's definitely not the way
843.0s
I'm going to be.
844.3s
So I always say there are all sorts of possible problems
848.0s
, and
849.1s
how to solve these problems in other ways is a huge issue. So the biggest problem with large language models right now is that they require a lot of data. And while the quality of the trained model is certainly very good, it's definitely not as efficient as a human's. This is a huge problem
851.5s
is a huge issue.
853.0s
So the biggest problem with large language models right now is that
856.4s
they require a lot
859.6s
of data.
860.7s
And while the quality of the trained model
862.9s
is certainly very good, it's
865.0s
definitely not as efficient as a human's.
868.7s
This is a huge problem
870.2s
because for humans, the number of samples you learn is very small, and the number of tokens you can learn in your lifetime is probably only, for example, at most, on the order of 10 billion, especially text tokens. I've also mentioned this before. I calculated this number on a slide presentation
873.2s
and the number of tokens you can learn in your lifetime
876.4s
is probably only,
877.3s
for example, at most, on the order of 10 billion,
879.1s
especially text tokens.
881.9s
I've also mentioned this before. I calculated this number
885.0s
on a slide presentation
886.2s
, but the training data for large language models can easily reach 10 trillion or 30 trillion, right? There's a 1000-fold difference. How can you use human learning ability to bridge this 1000-fold gap? It's very difficult. But humans can learn very well. We know that throughout human history,
888.7s
can easily reach 10 trillion
890.5s
or 30 trillion, right?
893.6s
There's a 1000-fold difference
896.7s
. How can you use human learning ability to bridge
897.5s
this 1000-fold gap
900.4s
? It's very difficult.
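As a rough check on the arithmetic above, here is a minimal sketch that uses only the round figures quoted in the interview: about 10 billion text tokens in a human lifetime versus 10 to 30 trillion training tokens for a large language model. The constant and function names are illustrative assumptions, and the numbers are order-of-magnitude estimates, not measurements.

```python
# Round figures quoted in the interview (order-of-magnitude estimates only).
HUMAN_LIFETIME_TEXT_TOKENS = 10_000_000_000   # "on the order of 10 billion"
LLM_TRAINING_TOKENS = (
    10_000_000_000_000,                       # "10 trillion"
    30_000_000_000_000,                       # "30 trillion"
)


def data_gap(llm_tokens: int, human_tokens: int = HUMAN_LIFETIME_TEXT_TOKENS) -> float:
    """Return how many times more tokens the model sees than a human does."""
    return llm_tokens / human_tokens


for tokens in LLM_TRAINING_TOKENS:
    print(f"{tokens:,} training tokens is {data_gap(tokens):,.0f}x a human lifetime of text")
```

With these inputs the ratio comes out at 1,000 for the 10 trillion figure and 3,000 for the 30 trillion figure, so the "1000-fold" gap mentioned here is the lower end of the quoted range.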
901.5s
But humans can learn very well.
902.6s
We know that throughout human history,
905.7s
there have been all sorts of incredibly talented scientists, right? Their ideas and approaches were unique. They didn't have access to many books or much data at the time , yet they were able to discover interesting new theorems, new proofs, new findings , or new inventions . So where did they get these abilities?
908.0s
Their ideas and approaches were unique.
911.7s
They didn't have access to many books
913.3s
or much data at the time
915.1s
, yet they were able to discover interesting
918.0s
new theorems, new proofs, new findings
920.6s
, or new inventions
921.9s
. So
924.0s
where did they get these abilities?
926.7s
Now, with so many tokens being put into large language models, have they reached human capabilities? This is actually a huge question mark right now. So, if that's the case, maybe our current training algorithm hasn't reached its optimal state, right? There might be better algorithms, better logic, and better ways to learn
927.9s
large language models ,
930.6s
have they reached human capabilities?
933.5s
This is actually a huge question right now
935.4s
. A big question mark.
938.2s
So, if that's the case,
940.3s
maybe our current training algorithm
942.8s
hasn't reached its optimal state, right?
944.4s
There might be better algorithms, better logic
946.9s
, and better ways to learn
949.0s
the representations that emerge from the data and use them to solve problems. Gradient descent might not be a particularly good solution. Maybe one day we won't use gradient descent anymore; there might be other methods. This is just a wild guess, right? In that case, maybe our entire training framework might need to change
952.8s
and use them to solve problems
954.7s
. Gradient descent
956.0s
might not be a particularly good solution.
958.4s
Maybe one day we won't use gradient descent anymore
961.1s
there might be other methods.
962.3s
This is just a wild guess, right?
965.0s
In that case, maybe our entire training framework
967.6s
might need to change
969.6s
. Of course, this might not happen now, but I think it might be an interesting direction to experiment with in the future. I've seen some debate in the industry recently about reinforcement learning, especially with Andrej Karpathy. He did a podcast interview and expressed some rather negative views. What do you think of the RL (reinforcement learning) route?
972.3s
, but I think
973.0s
it might be an interesting direction
975.6s
to experiment with in the future.
977.0s
I've seen some debate in the industry recently about reinforcement learning,
981.4s
especially with Andrej Karpathy. He
983.4s
did a podcast interview
984.9s
and expressed some rather negative views.
988.6s
What do you think of the RL (reinforcement learning) route?
990.7s
I've been working in this area for a long time, and I also think that the good thing about RL (reinforcement learning) is that it's essentially a search process. So, you give it some difficult problems and let it search for solutions. The data you learn and the information you gain during the search process
993.4s
and I also think
994.2s
that the good thing about RL (reinforcement learning) is that
996.4s
it's essentially a search process.
998.6s
So, you give it some difficult problems
1000.9s
and let it search for solutions.
1002.1s
The data you learn and
1003.0s
the information you gain
1005.2s
during the search process
1007.0s
are of higher quality than the data you were fed. It's like one person is being supervised: for example, someone is attending a lecture by a teacher, right? Attending a lecture by a teacher can be considered equivalent to being supervised, that is, supervised learning, while someone else solves problems independently without attending lectures
1011.5s
It's like one person
1014.0s
is being supervised:
1015.2s
for example, someone is attending a lecture by a teacher, right? Attending a lecture
1019.5s
by a teacher can be considered
1020.7s
equivalent to being supervised,
1022.7s
that is,
1024.0s
supervised learning,
1025.8s
while someone else
1027.4s
solves problems
1028.8s
independently without attending lectures
1031.4s
. However, I believe the latter approach yields a more fundamental problem-solving ability. Therefore, I think reinforcement learning (RL) is superior to supervised fine-tuning (SFT) in this regard. Indeed, many articles demonstrate that reinforcement learning is stronger than SFT on many problems, especially reasoning. You need reinforcement learning to truly enable the model to learn reasoning.
1034.6s
problem-solving ability.
1036.6s
Therefore, I think Reinforcement
1038.1s
Learning (RL) is superior to supervised fine-tuning (SFT)
1041.6s
in this regard. Indeed, many articles demonstrate
1044.2s
that Reinforcement Learning
1045.9s
is stronger than SFT in many problems,
1048.9s
especially reasoning
1050.1s
. You need Reinforcement Learning
1051.7s
to truly enable the model to learn reasoning.
1053.8s
Supervised fine-tuning (SFT) might simply memorize previous reasoning processes, but it doesn't develop generalization ability. On new problems, its generalization ability might be weaker. Especially with extensive SFT, the model's quality may decline. This is the key difference between the two. However, reinforcement learning is merely a paradigm; it doesn't involve any mysterious elements.
1055.1s
might simply memorize
1056.4s
previous reasoning processes
1058.2s
, but it doesn't develop generalization ability.
1060.8s
On new problems, its generalization ability might be weaker.
1064.1s
Especially with extensive SFT,
1067.5s
the model's quality may decline.
1070.4s
This is the key difference between the two.
1075.4s
However, Reinforcement Learning is merely a paradigm
1078.1s
it doesn't involve any mysterious elements.
1081.2s
Its ultimate goal is still to change weights, just like SFT; only the method of changing weights differs. Ultimately, perhaps a unified approach exists that can unify reinforcement learning and SFT. Reinforcement learning and supervised fine-tuning (SFT), right? They can be unified because the ultimate goal is to change weights. Perhaps there are better methods for these problems.
1084.8s
just like SFT
1085.9s
, only the method of changing weights differs.
1088.2s
Ultimately, perhaps a unified approach exists that can unify
1092.1s
Reinforcement Learning and SFT. Reinforcement learning
1093.0s
and supervised fine-tuning (SFT), right?
1094.2s
They can be unified
1094.8s
because the ultimate goal is to change weights.
1098.5s
Perhaps there are better methods for these problems.
1101.3s
For most people, reinforcement learning is simply a different data acquisition method. It collects data while searching, puts the data together, and then trains on it. This is essentially an active learning method, different from SFT. Therefore, I think the biggest advantage of reinforcement learning is that it's active learning
1103.0s
is simply a different data acquisition method.
1106.1s
It collects data while searching,
1108.7s
puts the data together, and then trains on it.
1112.5s
This is essentially an active learning method,
1115.8s
different from SFT.
1118.9s
Therefore, I think the biggest advantage of reinforcement learning is
1121.0s
that it's active learning
1122.4s
it can have a very positive impact on the distribution of data. This is its core strength, not that its objective function or training algorithm is different. Ultimately, it depends on the data itself. The quality of the collected data is different from SFT. That's why it can solve some more difficult problems. Andrej Karpathy's previous points
1125.3s
This is its core strength
1127.3s
, not that its objective function
1132.6s
or training algorithm is different
1134.5s
.
1135.4s
Ultimately, it depends on the data itself.
1137.2s
The
1138.5s
quality of the collected data
1141.1s
is different from SFT.
1142.2s
That's why it can solve some more difficult problems.
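The idea that reinforcement learning "collects data while searching, puts the data together, and then trains on it" can be sketched very roughly as below. This is only a toy illustration, not an RL algorithm and not anything described in the interview: the task (finding integer square roots), the function names `collect_by_search`, `candidates`, and `verify`, and the fixed dataset are all hypothetical. It only shows how a search plus a verifier can produce training pairs that a fixed supervised dataset does not contain.

```python
from typing import Callable, Iterable, List, Tuple


def collect_by_search(
    problems: Iterable[int],
    candidates: Callable[[int], Iterable[int]],
    verify: Callable[[int, int], bool],
) -> List[Tuple[int, int]]:
    """Search each problem and keep only (problem, answer) pairs that pass the verifier."""
    dataset = []
    for problem in problems:
        for answer in candidates(problem):
            if verify(problem, answer):
                dataset.append((problem, answer))
                break  # stop searching this problem once a verified answer is found
    return dataset


# Hypothetical toy task: find a non-negative integer x with x * x == problem.
def candidates(problem: int) -> Iterable[int]:
    return range(0, problem + 1)


def verify(problem: int, answer: int) -> bool:
    return answer * answer == problem


# A fixed, hand-labelled dataset (the supervised setting) covers only what it was given.
fixed_sft_data = [(4, 2), (9, 3)]

# Searching collects verified examples for harder problems as well; 50 has no
# integer square root, so the search finds nothing and it is left out.
searched_data = collect_by_search([4, 9, 49, 144, 50], candidates, verify)
print(searched_data)
```

In a real system the collected pairs would then be used for a training step, and the search itself would be guided by the current model rather than by brute-force enumeration.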
1144.6s
Andrej Karpathy's previous points
1147.2s
are actually quite good in some ways. The assertion that AGI (artificial general intelligence) is still 10 years away implies that we've entered an era measured in decades, not a world where AGI capabilities can be acquired immediately. I believe this. I myself have used GPT-5 before, and it helped me with a paper.
1150.8s
The assertion that AGI (artificial general intelligence) is still 10 years away
1152.7s
implies that we've entered
1154.1s
an era measured in decades, not
1156.8s
a world where AGI capabilities can be acquired immediately.
1159.8s
I believe this.
1162.2s
I myself have used GPT-5 before
1164.7s
, and it helped me with a paper.
1167.4s
My most recent paper was actually the result of self-play between GPT-5 and me. Essentially, I had no students , and I just talked to GPT-5 every day , telling it about problems I needed to solve and how we should develop research methods. It would provide a plan , but you'll find that without domain knowledge,
1170.5s
the result of self-play between GPT-5 and me.
1172.7s
Essentially, I had no students
1175.1s
, and I just talked to GPT-5 every day
1179.2s
, telling it about problems I needed to solve
1181.3s
and how we should develop research methods.
1185.7s
It would provide a plan
1187.6s
, but you'll find
1188.8s
that without domain knowledge,
1191.2s
the plan you get is similar to everyone else's, lacking innovation and originality. However, as a researcher, having a deep understanding of the problem, or knowing that the plan, its impact, or the way of thinking is flawed or has fatal problems, allows GPT-5 to delve deeper and ultimately achieve better results. So, this kind
1192.7s
similar to everyone else's,
1194.4s
lacking innovation and originality.
1196.6s
However, as a researcher,
1199.5s
having
1201.5s
a deep understanding of the problem
1203.2s
, or knowing
1204.2s
that the plan, its impact
1206.7s
, or the way of thinking is flawed
1210.4s
or has fatal problems
1212.3s
, allows GPT-5 to delve deeper
1213.8s
and ultimately achieve better results.
1216.5s
So, this
1218.7s
kind
1220.2s
of high-level human insight ... Human knowledge and unique insights into the problem are what current models lack. You need these things to make the model stronger. So, to say that AGI lacks these things is not entirely accurate. It's still true that AGI will never achieve top-tier insight because insight will always be led by humans. Yes,
1223.5s
... Human knowledge
1224.9s
and unique insights into the problem
1227.7s
are what current models lack.
1230.4s
You need these things
1231.9s
to make the model stronger.
1234.0s
So, to say that AGI lacks these things
1236.0s
is not entirely accurate. It's still true that AGI will
1237.8s
never achieve
1238.9s
top-tier insight
1240.9s
because insight will always be led by humans. Yes,
1242.5s
that's the problem. I've mentioned this before , similar to the early days of autonomous driving . Initially, progress was very rapid, and people thought it would soon replace humans. But the further we go, the bigger the problems become. Why? Because good insights and good data are becoming increasingly scarce and difficult to find. With less data,
1244.1s
this before , similar to
1247.2s
the early days of autonomous driving
1248.8s
. Initially, progress was very rapid,
1250.0s
and people thought it would soon replace humans.
1252.1s
But the further we go, the bigger the problems become.
1255.0s
Why?
1255.4s
Because good insights
1256.9s
and good data are becoming increasingly scarce and difficult to find.
1259.0s
With less data,
1259.7s
your model can't be trained properly. Humans' ability to acquire and deeply mine data will always surpass that of computers; currently, it surpasses all models. For the same problem, humans might only need one or two samples to see the essence, while computers, or today's large models,
1261.7s
Humans' ability to acquire
1265.5s
and deeply mine data
1267.8s
will always surpass that of computers
1269.0s
currently, it
1270.0s
surpasses all models. For
1271.3s
the same problem
1272.1s
, humans might only need one or two samples to see the essence, while
1275.3s
computers
1276.3s
, or today's large models,
1280.3s
need at least hundreds or thousands of samples to roughly grasp the outline. Pre-training may require even more samples. In this situation, if the number of samples is insufficient, humans will always be better than current large models, especially experts in specific fields. They themselves cannot present the samples they have learned from to the computer
1284.0s
to roughly grasp the outline.
1287.2s
Pre-training may require even more samples
1289.8s
. In this situation,
1291.8s
if the number of samples is insufficient,
1293.2s
humans will always be better than current large models,
1297.1s
especially experts in specific fields.
1299.7s
They cannot,
1300.8s
they themselves cannot,
1302.2s
present the samples they have learned from to the computer
1305.2s
because these samples are their experience in their minds , which is difficult to quantify into sentences. If this is the case, AI can only forever follow behind humans. Humans gain insights through some better information processing methods and then feed them to computers and AI to make AI perform better in this direction. This is the current state.
1306.7s
are their experience
1308.0s
in their minds , which is difficult to quantify into sentences.
1312.2s
If this is the case,
1313.2s
AI can only forever follow behind humans.
1316.3s
Humans gain insights
1320.2s
through some better information processing methods
1321.3s
and then feed them to computers and AI
1323.1s
to make AI perform better in this direction.
1327.1s
This is the current state.
1329.0s
So I think this is quite close to some of my previous arguments. I have also been interviewed before and said that the Scaling Law is a pessimistic future. The Scaling Law is, frankly, a very strange topic. In the past, if we told people that adding exponentially more samples or exponentially more computing power would increase our performance linearly,
1333.0s
is
1334.6s
quite close
1335.9s
to some of my previous arguments.
1338.6s
I have also been interviewed before
1339.9s
and said that the Scaling Law is a pessimistic future.
1344.3s
The Scaling Law is, frankly,
1346.6s
a very strange topic.
1347.6s
In the past, if we told people
1350.2s
that adding exponentially more samples
1353.1s
or exponentially more computing power
1354.5s
would increase our performance
1356.6s
linearly,
1359.2s
I think that machine learning scientists of the past might have considered these things trivial because, regardless of the model, you can conclude that simply feeding in more data will yield better results. But I think what we should truly pursue is a model that can move more efficiently, effectively, and quickly along this path,
1359.9s
machine learning scientists
1361.1s
might consider these things trivial
1363.8s
because, regardless of the model,
1364.8s
you can conclude
1367.6s
that simply feeding in more data
1368.7s
will yield better results . But I think what we should truly pursue is
1372.7s
a model
1374.2s
that can
1376.3s
move more efficiently, effectively, and quickly
1379.0s
along this path,
1382.6s
rather than simply being satisfied with this law. That's correct, because this law leads to a rather pessimistic future, meaning you need to feed in exponentially more samples to get a decent result. If that's the case, one day all of Earth's resources will be exhausted, and all of Earth's energy and electricity
1385.5s
, because this law leads
1388.9s
to a rather pessimistic future
1390.6s
meaning you need to feed in exponentially more samples
1394.1s
to get a decent result.
1396.2s
If that's the case, one day
1398.2s
all of Earth's resources will be exhausted, and
1401.0s
all of Earth's energy and electricity
1404.4s
will be used to train large models. In that situation, will we still rely on this ability to change our world? That's a huge question. I think at some point, people will realize that computational power isn't everything; we might need a deeper understanding of models. I think this change will gradually happen.
1407.5s
In that situation,
1408.6s
will we still rely on this ability
1414.3s
to change our world?
1416.6s
That's a huge question.
1417.6s
I think at some point,
1422.3s
people will realize that computational power isn't everything
1425.5s
we might need a deeper understanding of models.
1428.4s
I think
1429.3s
this change will gradually happen.
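To make the "exponentially more samples for a linear gain" remark concrete, here is a minimal numeric sketch. It assumes a purely hypothetical log-linear relation, score = a + b * log10(samples), with made-up constants `a` and `b`; the function names are illustrative, and this is not a fitted scaling law from any paper.

```python
import math


def performance(samples: float, a: float = 0.0, b: float = 1.0) -> float:
    """Hypothetical log-linear relation: the score grows with log10 of the sample count."""
    return a + b * math.log10(samples)


def samples_needed(target: float, a: float = 0.0, b: float = 1.0) -> float:
    """Invert the relation above: how many samples are needed to reach a target score."""
    return 10 ** ((target - a) / b)


# Under this assumption, each extra point of score costs ten times more samples.
for target in range(1, 6):
    print(target, samples_needed(target))
```

Under this assumed relation, moving from a score of 3 to a score of 4 takes ten times the data, and moving from 4 to 5 takes ten times more again, which is the sense in which the speaker calls the outlook pessimistic.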
1432.5s
That's one of my thoughts. Yes , but we need a more efficient way to develop intelligence . But do you think it will take a long time to find this solution? I think everyone is working on it. So it will take some time to do these things, at least for now. Let's talk about
1435.1s
, but we need a
1436.3s
more efficient way to develop intelligence
1439.4s
. But
1441.1s
do you think it will take a long time
1442.7s
to find this solution?
1445.3s
I think everyone is working on it.
1447.2s
So it will take some time to do these things,
1449.8s
at least for now. Let's talk about
1451.1s
large language models. Their capabilities are incredibly strong. Even if our model's capabilities stagnate now, its impact on various industries is still enormous. I think it can automate a large part of things and enhance the capabilities of many people. I feel that, with large language models, my capabilities have far surpassed what they were before. This makes me feel there's
1454.1s
Even if our model's capabilities stagnate now,
1457.8s
its impact on various industries is still enormous.
1461.6s
I think it can automate a large part of things
1465.6s
and enhance the capabilities of many people.
1468.2s
I feel that, with large language models, my capabilities
1471.0s
have far surpassed what they were before.
1473.4s
This makes me feel there's
1475.4s
a lot of room for development in this area , which is a major realization for me. I believe this marks the arrival of a new era. So even if the progress of large language models isn't rapid, I think there will be many opportunities in the next two to three years . So, if you still
1477.7s
, which is a major realization for me.
1481.5s
I believe this marks the arrival of a new era.
1486.2s
So even if the progress of large language models isn't rapid,
1490.5s
I think there will be many opportunities
1492.4s
in the
1493.7s
next two to three
1497.3s
years . So, if you still
1501.2s
want to do cutting-edge research or try application development, it would be best to combine both. If I could do cutting-edge research that is automated, that would be amazing, right? I already feel that my research paradigm might be partially replaced by automated pipelines. You mean agents? Not necessarily agents , but agents are definitely a very important factor.
1505.1s
it would be best to combine both.
1507.4s
If I could do
1509.3s
cutting-edge research that is automated,
1512.0s
that would be amazing, right?
1513.5s
I already feel that my research paradigm
1516.3s
might be
1520.0s
partially replaced by automated pipelines.
1521.8s
You mean agents?
1523.6s
Not necessarily agents
1524.4s
, but agents are definitely a very important factor.
1527.4s
Using agents can help you do many things. For example, you might not need to reply to emails yourself or manage your to-do list. To-do items, or tasks that you don't need to do yourself, can be automated and done by computers. This is definitely going to happen. But the more important question is
1528.3s
can help you do many things.
1530.1s
For example, you might not need to reply to emails yourself
1532.5s
or manage your
1537.7s
to-do list.
1539.1s
To-do items, or tasks that you don't need to do yourself,
1542.7s
can be automated and
1544.7s
done by computers.
1547.6s
This is
1548.7s
definitely going to happen
1550.1s
. But the more important question is
1551.8s
whether AI can replace humans in some advanced activities. This is a more complex issue, especially considering the challenges of advanced human thought processes. The key is the need for human insights. To what extent can AI help solve many difficult scientific problems? We don't yet know if AI can accomplish this . If it can,
1555.9s
This is a
1557.4s
more complex issue, especially considering the challenges of advanced human thought processes.
1560.6s
The key is the need for human
1563.5s
insights.
1564.7s
To what extent
1566.3s
can AI help solve
1569.5s
many difficult scientific problems?
1572.6s
We don't yet know if
1575.4s
AI can accomplish this
1578.1s
.
1579.2s
If it can,
1579.9s
it could, in turn, impact my research. From a research perspective, I might become a super researcher. With the addition of AI, I can conduct better research, and these tools can also benefit other things. That would be very interesting. Before you were pulled in to help with Llama 4, what were you researching? We were doing some research
1582.6s
From a research perspective,
1583.6s
I might become a super researcher.
1584.9s
With the addition of AI,
1587.1s
I can conduct better research,
1588.7s
and
1589.2s
these tools can also benefit other things.
1593.4s
That
1594.1s
would be very interesting.
1597.8s
Before you were pulled in to help with Llama 4,
1601.2s
what were you researching?
1603.3s
We were doing
1605.0s
some research
1606.9s
on reasoning, mainly on chains of thought, their forms, and training methods. Before o1 came out last September, we noticed that very long chains of thought affect the scaling law of the model. If you don't have many long chains of thought, the scaling law isn't ideal; you need many samples to get a good result.
1609.6s
mainly on
1613.0s
chains of thought, their forms, and training methods.
1614.0s
Before
1615.8s
o1 came out last September
1617.0s
, we noticed that
1619.7s
very long chains of thought
1620.9s
affect the scaling law of the model.
1625.1s
If you don't have many
1626.5s
long chains of thought
1627.6s
, the scaling law isn't ideal
1630.8s
you need many samples
1632.3s
to get a good result.
1635.0s
But with long chains of thought, the model's scaling will be affected. The curve of the scaling law becomes very ideal. I can get better results with, for example, one-tenth of the samples and one-tenth of the parameters. It's something like that. We've actually discovered this, but then we're doing all sorts of transformations and explorations
1638.2s
the model's scaling will be affected. The curve of the scaling law
1641.2s
becomes very ideal. I can
1643.2s
get better results
1644.7s
with, for example, one-tenth of the
1646.8s
samples, and one-tenth of the parameters.
1650.1s
It's
1651.2s
something like that.
1653.4s
We've actually discovered this
1654.5s
, but then we 're doing all sorts of transformations
1656.7s
and explorations
1658.5s
on the chain of thought, right? Including our recent work at the end of last year, the continuous chain of thought, which performs reasoning in a continuous latent space. This paper has indeed received a lot of attention, probably over 200 citations in just six months, and many people are willing to follow it. We've been doing some exploratory work
1661.5s
Including
1662.6s
our recent
1663.4s
work at the end of last year, the continuous chain of thought
1666.6s
, which performs reasoning in a continuous latent space.
1670.0s
This paper has indeed received a lot of attention,
1673.6s
probably over 200 citations in just six months,
1676.4s
and many people are willing to follow it. We
1678.8s
've been doing some exploratory work
1679.8s
and have seen some progress, so I think these things are very interesting. Last year, we also published a paper called Dualformer, which was one of the earliest to propose how to create hybrid reasoning models: how to train long and short thinking together. We found that this model is actually
1683.1s
so I think these things are very interesting
1685.8s
. Last year, we also published
1687.1s
a paper called Dualformer,
1689.1s
which was one of the earliest to propose
1690.8s
how to create hybrid reasoning models:
1693.2s
how to train long and short thinking together.
1696.9s
We found that this model is actually
1699.4s
more effective than simply training long or short thinking alone. Now, this has become standard practice; all reasoning models have this adaptive property of combining long and short thinking. So, last year's research was quite up-to-date. Do you have any regrets about FAIR? That's an interesting question. I think my regret might be this: I should have done more
1701.2s
Now, this has become standard practice
1703.3s
all reasoning models have this
1706.8s
adaptive property of
1708.9s
combining long and short thinking. So, last year's research
1712.3s
was quite up-to-date.
1713.8s
Do you have any regrets about FAIR?
1717.4s
That's an interesting question.
1720.8s
I think my regret might be this
1724.6s
I should have done more
1728.7s
engineering work at FAIR. Actually, when I first joined FAIR, in the first few years, I did a lot of engineering work. For some of my previous projects, like Go, I did a lot of engineering work myself. At the time, I was even criticized for coming here as a research scientist
1730.6s
Actually, when I first joined FAIR
1732.6s
, in the first few years, I did a lot of engineering work.
1736.3s
For some of my previous projects,
1738.5s
like Go,
1739.4s
I did a lot of engineering work myself.
1742.1s
At
1746.3s
the time, I was even criticized for
1748.4s
coming here as a research scientist
1750.2s
yet always doing engineering. I was told that while others' screens were full of articles, mine was full of code. I was criticized like that, so I said, Okay, if research scientists can't do engineering, then I'll read more code and more articles. So, you'll find that from 2015 to 2018, I was mostly doing engineering,
1751.8s
told me that while
1753.4s
others' screens were full of articles
1756.4s
, mine was full of code.
1758.2s
I was criticized like that,
1760.8s
so I said,
1761.8s
Okay, if research scientists can't do engineering,
1764.7s
then I'll read more code and more articles
1767.6s
. So, you'll find that
1768.7s
from 2015 to 2018, I
1772.2s
was mostly doing engineering,
1774.6s
and from 2018 until now, I've been doing more research. That's roughly the pattern. This is certainly related to the FAIR policy at the time, and also because I had some research interests and wanted to do more research, so I switched to that approach. But now you'll find that in this era,
1777.0s
until now, I've been doing more research.
1780.3s
That's roughly the pattern.
1781.9s
This is certainly related to the FAIR policy at the time
1787.7s
, and also because
1788.8s
I had some research interests
1790.6s
and wanted to do more research,
1792.7s
so I switched to that approach.
1794.0s
But now you'll find
1796.1s
that in this era,
1798.5s
people with strong engineering skills are more sought after, right? So it's interesting that people with strong research skills are also popular, but ideally, they should have both strong engineering and research skills; that's extremely difficult. But I think I can achieve that, so I'm doing more engineering work
1801.4s
So it's interesting
1803.3s
that people with strong research skills are also popular
1806.5s
, but ideally, they should have both strong engineering and research skills
1810.2s
that's
1811.3s
extremely difficult.
1812.5s
But I think I can achieve that,
1814.7s
so
1816.6s
I'm
1819.1s
doing more engineering work
1821.2s
now. I can pick up a lot of things again and do these engineering things well. I think my biggest gain from FAIR came after 2018: I developed a lot of research taste during that period. Research taste refers to an appreciation for research and an understanding of research methods. This appreciation can be gradually developed, and it's
1823.6s
and do these engineering things well.
1824.8s
I think my biggest gain from FAIR
1827.2s
was after 2018
1829.5s
I developed a lot of research taste
1831.3s
during that period. Research taste
1833.6s
refers to an appreciation for research
1835.4s
and an understanding of research methods.
1838.0s
This appreciation
1839.1s
can be gradually developed , and it's
1842.2s
become increasingly apparent in recent years' publications. Therefore, having research taste is very helpful for one's future career path. This is crucial because a person who only does engineering has a significant problem: they might only tackle difficult engineering problems without understanding their applications. However, having research taste means setting a path for oneself that can be continuously advanced.
1844.4s
in recent years' publications.
1846.0s
Therefore, having research taste
1848.6s
is very helpful
1850.2s
for one's future career path.
1852.1s
This is crucial
1853.8s
because a person who only does engineering
1855.0s
has a significant problem
1857.4s
they might only tackle difficult engineering problems
1861.0s
without understanding their applications.
1863.7s
However, having research taste
1865.6s
means setting a path for oneself
1867.4s
that can be continuously advanced.
1869.0s
This is extremely beneficial for one's life. Yes, I have another question I'm very curious about. Given the fierce competition in AI among companies and the intense talent war, including Meta's latest lab, which spends a lot of money on a single person, what kind of AI talent do you think is most scarce at this stage
1870.7s
for one's life. Yes,
1874.0s
I have another question I'm very curious about.
1877.3s
Given the fierce competition in AI among companies
1881.8s
and the intense talent war
1884.2s
including Meta's latest lab
1886.2s
, which spends a lot of money on a single person
1889.9s
what kind of AI talent
1892.3s
do you think is most scarce at this stage
1895.6s
? I think it completely depends on each person's positioning. First, I want to correct one point: don't think about who is the most scarce at present, because the definition of scarcity might change in a couple of years, right? So, think about Yann LeCun sitting on the sidelines for so many years and then suddenly winning the Turing Award
1902.2s
First, I want to correct one point:
1903.6s
don't think about who is the most scarce at present, because
1907.2s
the definition of scarcity might change
1908.9s
in a couple of years,
1911.0s
right? So, think about Yann LeCun sitting on the sidelines for so many years
1914.2s
and then suddenly winning the Turing Award
1917.2s
. So I think everyone should think about what they truly want to do, rather than doing what companies might like. I think that's more important because the whole process is different now. In the past, the market would send a signal saying what kind of talent we needed. This signal
1921.2s
about what they truly want to do,
1924.5s
rather than doing what companies might like.
1930.4s
I think that's more important
1932.0s
because the
1933.7s
whole process is different now.
1936.8s
In the past,
1940.1s
the market would send a signal
1942.5s
saying what kind of talent we needed. This signal
1945.4s
would then spread through universities, saying what kind of talent would be most sought after in the next ten years . Universities would then expand enrollment in the corresponding departments and hire more professors. Students would apply to those departments , and after four or more years of training, these students would finally meet the market's requirements. That's roughly how
1947.8s
spread
1950.2s
through universities,
1952.3s
saying what kind of talent would be most sought after in the next ten years
1954.9s
. Universities would then expand enrollment in the corresponding departments
1958.2s
and hire more professors.
1960.1s
Students would apply to those departments
1962.2s
, and after four or more years of training,
1965.4s
these students would finally meet the market's requirements.
1969.3s
That's roughly how
1970.8s
it worked before because the whole logic and speed were relatively slow, right? The industry cycle might have been... For example, the fluctuations used to occur over 10 or 20 years, so this process was possible. But now, the entire cycle might be very fast . By the time you want to learn a hot technology in the market,
1972.0s
because the whole logic and speed were
1975.9s
relatively slow, right?
1978.1s
The industry cycle might have been...
1979.4s
For example, the fluctuations used to occur over 10 or 20 years,
1982.3s
so this process was possible.
1984.7s
But now, the entire cycle might be very fast
1988.5s
. By the time you want to learn a hot technology in the market,
1992.8s
everyone in the world is learning it, right? You've thought of it, and others have thought of it too, right? Everyone in the world is learning it. There will always be someone who learns faster than you, someone who learns better than you, and someone who can immediately get started and make things work. So you might find that
1994.8s
You've thought of it, and others have thought of it too, right?
1996.9s
Everyone in the world is learning
1998.0s
it. There will always be someone who learns faster than you,
2000.3s
someone who learns better than you, and someone who
2001.8s
can immediately get started and make things work.
2004.0s
So you might find that
2005.5s
after studying for half a year or a year, you can't compete with others, and you still can't stand out. In this case, the market has changed. Maybe next year won't be the era where this particular skill is most important. Maybe something else has taken its place. If you start learning then,
2008.2s
you can't compete with others,
2009.3s
and you still can't stand out.
2011.9s
In this case,
2012.7s
the market has changed.
2013.5s
Maybe next year won't be
2017.9s
the era where this particular skill is most important.
2020.6s
Maybe something else has taken its place.
2022.4s
If you start learning then,
2023.2s
you might always be following in others' footsteps. So maybe in the future, everyone will suddenly realize that instead of following the market's orders, it's better to do what you want to do. You'll be happy doing it, and also, once this thing is discovered, the benefits are huge. Of course, that's the ideal situation, right?
2026.5s
So maybe in the future, everyone will suddenly realize
2029.1s
that instead of following the market's orders,
2030.6s
it's better to do what you want to do. You
2033.2s
'll be happy doing
2034.4s
it, and also, once this thing is discovered
2038.0s
, the benefits are huge.
2040.4s
of course, that's the ideal situation, right?
2042.0s
Because in reality, you definitely need to combine both sides. You 'll definitely want to judge for yourself whether this thing will be useful in the future, plus your own interests. Finally, you can put more effort into it after combining the two. Yes , that's roughly it. So it's very difficult to make a judgment
2044.4s
You 'll definitely want
2046.3s
to judge for yourself whether this thing will be useful in the future,
2049.8s
plus your own interests.
2053.0s
Finally,
2054.3s
you can put more effort into it after
2057.0s
combining the two. Yes,
2059.9s
that's roughly it.
2060.6s
So it's very difficult to make a judgment
2066.0s
because it completely depends on your own ability. I feel you are a very idealistic person. Yes, and I feel that FAIR was a very idealistic team before, as we talked about in the last podcast. But you feel that the market is a bit distorted now because when the competition is particularly fierce, many cultures and beliefs
2068.1s
I feel you are a very idealistic person.
2070.8s
Yes, and I feel that FAIR was a
2073.4s
very idealistic team
2075.2s
before,
2076.5s
as we talked about in the last podcast.
2078.6s
But you feel that
2080.6s
the market is a bit distorted now
2082.0s
because when the competition is particularly fierce,
2084.0s
many cultures and beliefs
2086.5s
may deviate. Do you think that in the current situation, there are still relatively idealistic research labs? Maybe Ilya Sutskever's team or Mira's team are considered relatively idealistic. Their counterpart, Sam Altman, is very commercial and aggressive. How do you view this balance? I think firstly,
2088.8s
Do you think that in the current situation,
2091.5s
there are still
2092.8s
relatively idealistic research labs?
2096.6s
Maybe Ilya Sutskever's team
2100.3s
or Mira's team
2102.7s
are considered relatively idealistic.
2106.1s
Their counterpart, Sam Altman,
2108.2s
is very commercial and aggressive.
2110.2s
How do you view this balance?
2112.9s
I think firstly,
2114.3s
you shouldn't treat large companies as monolithic. In fact, there are many groups, and many of these groups have research teams. These teams themselves have a research spirit and research freedom. This will always exist. FAIR is just a very famous and well-known place. But there are many places that are not as famous as FAIR
2116.9s
In fact, there are many groups,
2117.8s
and many of these groups have research teams.
2120.7s
These teams themselves
2121.4s
have
2123.6s
a research spirit and research freedom.
2124.6s
This will always exist.
2126.4s
FAIR is just a very famous and
2129.5s
well-known place.
2131.8s
But there are many places
2134.7s
that are not as famous as FAIR,
2137.5s
but they also have free space to do research. Even within Meta, there are many groups that have space to do research. I have many collaborators in Meta who also do some research. I don't think this is a problem. Maybe FAIR might not be as research-oriented in the future because of this or other reasons, right?
2139.9s
to do research.
2141.0s
Even
2142.3s
within Meta, there are many groups
2143.9s
that have space to do research.
2145.7s
I have many collaborators in Meta
2147.4s
who also do some research.
2149.1s
I don't think this is a problem.
2152.8s
Maybe FAIR might
2155.5s
not be as research-oriented
2158.1s
in the future because of this or other reasons, right?
2160.9s
But there will still be many places where you can do research. Even when you are a startup, you might find that the problem is very cutting-edge, so there will definitely be things you can do there. Because when we talk about research, we mean the process itself is to find new solutions to difficult problems. That's called research,
2163.7s
Even when you are a startup,
2165.8s
you might find
2167.9s
that the problem is very cutting-edge,
2168.9s
so there will definitely be things you can do there.
2171.0s
Because when we talk about research,
2172.2s
we mean the process itself
2174.2s
is to find new
2177.5s
solutions to difficult problems.
2180.0s
That's called research,
2181.1s
or re-search. Right? Actually, it's about research exploration, so it's not an abstract concept. I think there are many areas where it can be done. It's not a monolithic concept; it's not that big companies can't do it, but small companies can. It's not that simple. It completely depends on which group, which person
2182.3s
Actually, it's about research exploration,
2183.9s
so it's not an abstract concept.
2186.9s
I think there are many areas where it can be done.
2189.3s
It's not a monolithic concept; it's not that big companies can't do it, but small companies can.
2192.6s
It's not that simple.
2193.6s
It completely depends on which group, which person,
2196.9s
what resources, what kind of things, and what kind of chemical reaction will occur when these people come together. Maybe it can be done today but not tomorrow, or maybe there's room for it for a period of time, but not at other times. So countless people are thinking about this problem, and maybe
2199.3s
what kind of chemical reaction will
2200.5s
occur when these people come together.
2202.3s
Maybe it can be done today but not tomorrow,
2204.0s
or maybe there's
2206.6s
room for it for a period of time, but not at other times.
2209.1s
So countless people are thinking about this problem, and maybe
2211.6s
a new work will definitely emerge during this period, influencing the entire field. So research will always continue, it's just that its form may become more like guerrilla warfare. It's not that some very famous research institutions will do research, saying, "We'll dedicate all our time and energy to research." Maybe not.
2213.1s
during this period,
2214.8s
influencing the entire field.
2217.2s
So research will always continue,
2220.3s
it's just that its form may become
2223.4s
more like guerrilla warfare.
2225.5s
It's not that some very famous
2228.3s
research institutions
2229.3s
will do research,
2231.4s
saying,
2232.0s
"We'll dedicate all our time and energy to research."
2234.2s
Maybe not.
2235.1s
But you will always find many idealistic people and small organizations continuing to do what they want to do. It's roughly like this process. Yes, it's not 0 or 1; there will be many gray areas. The last question is, what is your next step? As I just said, the next step is not yet determined,
2239.1s
and small organizations
2242.4s
continuing to do what they want to do.
2244.5s
It's roughly like this process. Yes,
2245.9s
it's not 0 or 1;
2247.4s
there will be many gray areas.
2249.7s
The last question
2250.6s
is, what is your next step?
2252.5s
As I just said, the next step is not yet determined,
2254.7s
so it's still under discussion. Because it hasn't been a week since I was laid off, so... There are some considerations and ideas. The question you just asked was whether I want to work on applications or continue my scientific research, right? My answer is, of course, it's best to combine both.
2257.2s
Because it hasn't been a week since I was laid off,
2261.4s
so... There are some considerations and ideas.
2266.1s
The question you just asked was whether I want to work on applications
2269.0s
or continue my scientific research, right?
2273.2s
My answer is, of course, it's best to combine both.
2275.3s
I want to find a way to empower my scientific research while also being able to do many other things. Does such an opportunity exist? I don't know, but generally speaking, we set a high goal first and then look at the options. Because generally, people are more realistic
2279.7s
while also being able to do many other things.
2282.3s
Does such an
2284.2s
opportunity exist?
2287.5s
I don't know, but generally speaking,
2289.4s
we set a high goal first
2291.2s
and then look at
2292.2s
the options.
2293.8s
Because generally, people are more realistic
2295.1s
they think, "If such an opportunity exists, I don't need to think about it." But actually, it should be the other way around. First, think of an impossible goal, and then think about what can support it. This might give you a better direction to take. Okay, then we look forward to your next announcement. Okay, that
2296.5s
But actually, it should be the other way around.
2298.0s
First, think of an impossible goal,
2300.2s
and then think about what can support it.
2303.1s
This might give you a better direction to take.
2307.2s
Okay,
2309.0s
then we look forward
2310.6s
to your next announcement.
2312.8s
Okay, that
2314.4s
concludes our interview with Tian Yuandong. We also look forward to his next move. I sincerely hope he can find a new role that balances cutting-edge research and engineering applications. I think this is the path that cutting-edge AI engineers are exploring. Good luck to him! Do you think such AI work exists?
2317.0s
We also look forward to his next move.
2319.2s
I sincerely hope he can find
2322.3s
a new role that balances cutting-edge research and engineering applications.
2326.2s
I think this is the path that cutting-edge AI engineers
2328.5s
are exploring.
2330.0s
Good luck to him!
2331.4s
Do you think such AI work exists?
2333.6s
Feel free to leave a comment, share, and like! Your support is the best motivation for Silicon Valley 101 to produce in-depth technology and business content. See you in the next video! Bye!
2336.4s
Your support is
2338.3s
the best motivation for Silicon Valley 101 to produce in-depth technology and business content.
2341.5s
See you in the next video! Bye!