Andrej Karpathy: Software Is Changing (Again)

Drawing on his work at Stanford, OpenAI, and Tesla, Andrej sees a shift underway. Software is changing, again. We’ve entered the era of “Software 3.0,” where natural language becomes the new programming interface and models do the rest.

*https://www.youtube.com/watch?v=LCEmiRjPEtQ
**https://300.ya.ru/v_xrCKEaUB

Timecodes

00:00:00 Introduction

Welcome for Andrej Karpathy, former Director of AI at Tesla.
How software is changing in the era of AI.
Why this is a uniquely interesting moment to enter the industry.

00:01:12 The evolution of software

The Map of GitHub as a tool for visualizing all the software that has been written.
Introduces "Software 2.0": neural networks and, in particular, their weights.
Hugging Face and Model Atlas as the Software 2.0 equivalents of GitHub.

00:02:53 Programmable neural networks

Neural networks go from fixed-function computers to programmable ones with large language models.
Introduces "Software 3.0": prompts written in English that program the LLM.
Example: sentiment classification done three ways, with Python code, a trained neural net, or an LLM prompt.
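
The contrast between Software 1.0 and Software 3.0 on the sentiment task can be sketched in a few lines. This is a minimal illustration, not the code from the slide: the word lists, the few-shot examples, and the function names are all made up for the example. The Software 2.0 variant is left out because it needs a labeled dataset and a training run, and the prompt is only built as a string, since sending it to a model depends on the provider you use.

```python
# Software 1.0: explicit rules written by a programmer.
# The word lists are illustrative assumptions, not taken from the talk.
POSITIVE_WORDS = {"great", "love", "amazing", "good"}
NEGATIVE_WORDS = {"terrible", "hate", "awful", "bad"}


def classify_sentiment_1_0(text: str) -> str:
    """Count hand-picked keywords and compare the totals."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


# Software 3.0: the "program" is an English few-shot prompt.
# The examples are hypothetical; editing them reprograms the model.
FEW_SHOT_EXAMPLES = [
    ("I love this phone, the camera is amazing.", "positive"),
    ("The battery died after a day. Terrible.", "negative"),
]


def build_sentiment_prompt(text: str) -> str:
    """Assemble a few-shot prompt that an LLM would be asked to complete."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for review, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Review: {review}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {text}")
    lines.append("Sentiment:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(classify_sentiment_1_0("What a great movie, I love it!"))
    print(build_sentiment_prompt("The plot was bad and the acting was awful."))
```

Changing the behavior of the first function means editing code; changing the behavior of the second means editing English.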

00:04:27 The Tesla example

How Autopilot evolved at Tesla: C++ code was deleted as the neural networks grew in capability and size.
Functionality originally written in Software 1.0 migrated to Software 2.0.
Why fluency in all three programming paradigms matters for anyone entering the industry.

00:06:05 LLMs as utilities

LLMs compared to utilities: capex to train the models, opex to serve them over metered APIs.
Utility-like demands on LLM APIs: low latency, high uptime, consistent quality.
Switching between different LLMs is easy, for example via OpenRouter.
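
The utility analogy has two concrete pieces: metered access (paying per million tokens) and a transfer switch between sources. A minimal sketch of both follows. The `Provider` class, the provider names, and the prices are hypothetical placeholders, and this is not how OpenRouter or any real router is implemented.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    usd_per_million_tokens: float  # hypothetical price, not a real quote
    is_up: bool = True


def request_cost(provider: Provider, tokens: int) -> float:
    """Metered billing: pay in proportion to the tokens used."""
    return provider.usd_per_million_tokens * tokens / 1_000_000


def pick_provider(providers: list[Provider]) -> Provider:
    """Transfer switch: use the cheapest provider that is currently up."""
    available = [p for p in providers if p.is_up]
    if not available:
        raise RuntimeError("intelligence brownout: no provider is available")
    return min(available, key=lambda p: p.usd_per_million_tokens)


if __name__ == "__main__":
    providers = [
        Provider("provider-a", 2.0),
        Provider("provider-b", 1.0, is_up=False),  # cheapest, but down
        Provider("provider-c", 3.0),
    ]
    chosen = pick_provider(providers)
    print(chosen.name, request_cost(chosen, 500_000))
```

Because the models are software and do not compete for physical space, keeping several providers behind one switch costs little, which is the point made in the talk.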

00:07:40 LLMs as fabs

LLMs resemble fabs because of the large capex they require.
The technology tree is growing rapidly.
Deep tech trees and R&D secrets are centralizing inside the LLM labs.

00:08:32 Analogies between LLMs and operating systems

LLMs resemble operating systems more than simple commodities such as electricity.
They are increasingly complex software ecosystems, with a few competing closed-source providers and an open-source alternative.
An LLM can be seen as a new kind of computer: the model is the CPU equivalent and the context window is the memory.

00:10:37 Running LLM apps

Just as VS Code can be downloaded and run on Windows, Linux, or Mac, an LLM app can run on different models.
Example: Cursor running on GPT, Claude, or the Gemini series.

00:11:00 LLMs centralized in the cloud

Because LLM compute is still expensive, LLMs are centralized in the cloud.
Users interact with them over the network as thin clients.
Mac minis turn out to be a good fit for some LLMs.
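
The talk attributes the Mac mini fit to batch-one inference being memory bound. A common back-of-the-envelope estimate makes this concrete: generating one token requires reading the weights from memory roughly once, so memory bandwidth divided by the size of the weights gives a rough upper bound on tokens per second. The sketch below ignores compute, the KV cache, and other overheads, and the numbers are round placeholders rather than the specs of any particular machine or model.

```python
def max_tokens_per_second(
    params_billions: float,
    bytes_per_param: float,
    bandwidth_gb_per_s: float,
) -> float:
    """Rough ceiling on batch-one decoding speed for a memory-bound model."""
    # 1e9 parameters times bytes per parameter gives gigabytes of weights.
    weights_gb = params_billions * bytes_per_param
    return bandwidth_gb_per_s / weights_gb


if __name__ == "__main__":
    # Hypothetical: an 8B-parameter model quantized to 4 bits (0.5 bytes
    # per parameter) on a machine with 200 GB/s of memory bandwidth.
    print(max_tokens_per_second(8, 0.5, 200))  # 50.0
```

Under this estimate, memory bandwidth rather than raw compute sets the ceiling, which is why hardware with fast unified memory can be attractive for local single-user inference.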

00:11:56 The text interface of LLMs

Talking to an LLM in text is like talking to an operating system through a terminal.
A general-purpose GUI covering all LLM tasks has not been invented yet.

00:12:41 What is unique about LLMs

LLMs flip the usual direction of technology diffusion.
The first users are ordinary people, with governments and corporations lagging behind.

00:14:29 The psychology of LLMs

LLMs are viewed as stochastic simulations of people with encyclopedic knowledge and memory.
They have superpowers but also cognitive deficits, such as hallucinations and poor self-knowledge.

00:16:46 Limitations of LLMs

Context windows act like working memory, which has to be programmed directly.
Security limitations include gullibility and the risk of data leaks.
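
"Programming the working memory" can be pictured as deciding what goes into a fixed token budget on every call, because the model keeps nothing between calls by default. The sketch below keeps a system prompt plus the most recent messages that still fit. The whitespace-based token count is a crude stand-in for a real tokenizer, and the function names are invented for this example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_context(system_prompt: str, history: list[str], budget: int) -> list[str]:
    """Keep the system prompt plus the newest messages that fit the budget."""
    remaining = budget - count_tokens(system_prompt)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the context budget")
    kept: list[str] = []
    for message in reversed(history):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > remaining:
            break  # everything older than this is dropped as well
        kept.append(message)
        remaining -= cost
    return [system_prompt] + list(reversed(kept))


if __name__ == "__main__":
    history = ["a b c d", "e f", "g h i"]
    print(fit_to_context("You are helpful", history, budget=8))
```

Anything that falls outside the budget is simply gone for the model, which is the sense in which the context window behaves like working memory rather than long-term memory.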

00:17:20 Examples from popular culture

The films Memento and 50 First Dates illustrate what life is like when the context window is wiped every day.

00:17:51 Security and cognitive limitations of AI

The cognitive deficits of AI have to be kept in mind when using it.
The goal is to work around the deficits while still benefiting from the superhuman abilities.

00:18:15 Partial autonomy apps

Cursor as an example of a partial autonomy app for coding.
Cursor combines a traditional interface with LLM integration.
LLM apps manage context and orchestrate multiple calls to models.

00:19:52 Graphical interface

An application-specific GUI lets a human audit the system's work and go faster.
Visual changes, such as a red and green diff, are easier to interpret than raw text.
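
The red and green diff mentioned in the talk can be approximated in a terminal with Python's standard `difflib` module. This is only a sketch of the idea, not how Cursor renders diffs: the function names and the ANSI color choice are assumptions made for the example.

```python
import difflib

RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def split_changes(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) lines between two versions of a text."""
    added, removed = [], []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed


def render_colored(before: str, after: str) -> str:
    """Render removals in red and additions in green for quick review."""
    out = []
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            out.append(GREEN + line + RESET)
        elif line.startswith("- "):
            out.append(RED + line + RESET)
        elif line.startswith("  "):
            out.append(line)
        # Lines starting with "? " are ndiff hints and are skipped.
    return "\n".join(out)


if __name__ == "__main__":
    old = "x = 1\ny = 2\n"
    new = "x = 1\ny = 3\nz = 4\n"
    print(render_colored(old, new))
```

A reviewer scanning colored lines only has to decide between accept and reject, which is much faster than reading a prose description of the same change.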

00:20:22 The autonomy slider

In Cursor you choose how much autonomy to give the AI.
Examples: tab completion, changing a selected chunk of code, the entire file, or the entire repo.
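
One way to think about the autonomy slider in code is as an ordered set of levels, each widening what the model is allowed to touch. The sketch below is a hypothetical model of the idea; the class and function names are invented here, and the shortcuts in the comments are the ones mentioned in the talk, not a description of Cursor's internals.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Positions on the slider, from least to most autonomous."""
    TAB_COMPLETION = 0  # the human writes, the model suggests
    EDIT_SELECTION = 1  # change just the selected chunk (Cmd+K in the talk)
    EDIT_FILE = 2       # change the entire file (Cmd+L in the talk)
    AGENT_REPO = 3      # change anything in the repo (Cmd+I in the talk)


def editable_files(level: Autonomy, current_file: str, repo_files: list[str]) -> list[str]:
    """Return the files the model may modify at this autonomy level."""
    if level == Autonomy.AGENT_REPO:
        return list(repo_files)
    return [current_file]


if __name__ == "__main__":
    repo = ["app.py", "utils.py", "README.md"]
    for level in Autonomy:
        print(level.name, editable_files(level, "app.py", repo))
```

Because the levels are ordered, an app could also compare them, for example to require explicit confirmation for anything above `EDIT_FILE`.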

00:21:02 The Perplexity example

Perplexity has features similar to Cursor's.
It offers a choice of autonomy levels: quick search, research, deep research.

00:22:07 Speeding up verification

GUIs help speed up verification.
Visual representations make systems easier to audit.

00:22:57 Keeping AI under control

It is important to keep overactive AI in check.
Working in small chunks helps avoid mistakes.

00:24:17 Best practices for working with LLMs

Concrete prompts raise the probability that verification succeeds.
The AI needs to be "kept on a leash" to prevent mistakes.

00:25:17 Education and AI

Building separate apps for teachers and for students.
Intermediate versions of a course make it possible to check its quality.

00:25:49 The Autopilot analogy

Partial autonomy at Tesla: the Autopilot GUI and the autonomy slider.
The story of a successful self-driving car test in 2013.

00:27:01 Autonomy and agents

After 12 years of work, the autonomy problem is still not fully solved.
Waymo cars look autonomous, but humans are often still in the loop.
Software for agents is complex and calls for caution.

00:27:50 The Iron Man suit analogy

The Iron Man suit is both an augmentation and an agent.
The task right now is to build partial autonomy products with a custom user interface.
It is important to make the loop in which a human verifies what the model generates as fast as possible.

00:28:47 A new kind of programming

A new programming language has appeared, and it is English.
Vibe coding lets anyone program without lengthy training.
Working with software used to take 5 to 10 years of study; that is no longer the case.

00:30:30 A vibe coding example

The author built an app called MenuGen with vibe coding.
The app generates images for the items on a menu from a photo of it.
Making the app real turned out to be harder than expected because of the DevOps work around it.

00:33:04 Interacting with agents

Agents can interact with software infrastructure through text files.
Markdown makes documentation easier for LLMs to understand.
Examples of services that offer their documentation in an LLM-friendly form: Vercel and Stripe.

00:35:54 Adapting documentation for LLMs

Documentation has to be adapted so that LLMs can understand it.
Vercel replaces "click this button" instructions with equivalent curl commands for LLMs.
Adapted documentation opens up new ways of interacting with agents.

00:36:18 Model Context Protocol and tools for LLMs

Anthropic's Model Context Protocol lets agents talk to digital information directly.
Tools such as Gitingest convert the contents of GitHub repositories into an LLM-friendly format.
DeepWiki from Devin analyzes repositories and generates documentation that is also useful for LLMs.
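
The idea behind repository-to-text tools can be illustrated with a short sketch that walks a directory and concatenates its text files into one string an LLM can read. This is not Gitingest's implementation or output format: the suffix list, the `=====` separator, and the function name are assumptions chosen for the example.

```python
from pathlib import Path

# Assumption: only these file types are worth showing to the model.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".toml"}


def flatten_repo(root: Path) -> str:
    """Concatenate the text files under root into one LLM-friendly string."""
    parts = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:
            continue  # skip version-control internals
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            body = path.read_text(encoding="utf-8")
            parts.append(f"===== {rel.as_posix()} =====\n{body}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(flatten_repo(Path(".")))
```

Labeling each chunk with its relative path keeps the file structure visible to the model even though everything arrives as a single block of text.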

00:37:17 The future of information access for LLMs

In the future LLMs will be able to interact with their environment, but today this is difficult and expensive.
Access to information needs to be made easier so that LLMs can work effectively.
Software has to adapt to the new capabilities of LLMs.

00:38:14 Outlook for LLMs in the industry

Now is a good time to bring LLMs into the industry.
LLMs are compared to utilities, fabs, and operating systems.
Infrastructure has to be adapted to working with LLMs as fallible components.

00:38:34 The Iron Man suit analogy

LLMs are compared to the Iron Man suit, which will keep evolving over the next decade.
The slider is expected to move from manual control toward autonomy.
The author is excited to be part of this process.

In this video

Intro
0:01
Please welcome former Director of AI at Tesla, Andrej Karpathy.
0:07
[Music] Hello.
0:14
[Music] Wow, a lot of people here. Hello.
0:22
Um, okay. Yeah. So I’m excited to be here today to talk to you about software in the era of AI. And I’m told that many
0:30
of you are students like bachelors, masters, PhD and so on. And you’re about to enter the industry. And I think it’s
0:36
actually like an extremely unique and very interesting time to enter the industry right now. And I think fundamentally the reason for that is
0:43
that um software is changing uh again. And I say again because I actually gave
0:49
this talk already. Um but the problem is that software keeps changing. So I actually have a lot of material to
0:55
create new talks and I think it’s changing quite fundamentally. I think roughly speaking software has not
1:00
changed much on such a fundamental level for 70 years. And then it’s changed I think about twice quite rapidly in the
1:06
last few years. And so there’s just a huge amount of work to do a huge amount of software to write and rewrite. So
1:12
let’s take a look at maybe the realm of software. So if we kind of think of this as like the map of software this is a
1:17
really cool tool called map of GitHub. Um this is kind of like all the software that’s written. Uh these are
1:23
instructions to the computer for carrying out tasks in the digital space. So if you zoom in here, these are all different kinds of repositories and this
Software evolution: From 1.0 to 3.0
1:30
is all the code that has been written. And a few years ago I kind of observed that um software was kind of changing
1:35
and there was kind of like a new type of software around and I called this software 2.0 at the time and the idea
1:42
here was that software 1.0 is the code you write for the computer. Software 2.0 are basically neural networks and
1:48
in particular the weights of a neural network and you’re not writing this code directly you are most you are more kind
1:55
of like tuning the data sets and then you’re running an optimizer to create to create the parameters of this neural net
2:00
and I think like at the time neural nets were kind of seen as like just a different kind of classifier like a decision tree or something like that and
2:06
so I think it was kind of like um I think this framing was a lot more appropriate and now actually what we
2:12
have is kind of like an equivalent of GitHub in the realm of software 2.0. And I think Hugging Face is basically
2:18
equivalent of GitHub in software 2.0. And there’s also Model Atlas and you can visualize all the code written there. In
2:24
case you’re curious, by the way, the giant circle, the point in the middle, uh these are the parameters of Flux, the
2:30
image generator. And so anytime someone tunes a LoRA on top of a Flux model, you basically create a git commit uh in this
2:37
space and uh you create a different kind of a image generator. So basically what we have is software 1.0 is the computer
2:43
code that programs a computer. Software 2.0 are the weights which program neural
2:48
networks. Uh and here’s an example of AlexNet, an image recognizer neural network. Now so far all of the neural networks
2:55
that we’ve been familiar with until recently were kind of like fixed function computers, image to categories
3:01
or something like that. And I think what’s changed and I think is a quite fundamental change is that neural
3:06
networks became programmable with large language models. And so I I see this as
3:12
quite new, unique. It’s a new kind of a computer and uh so in my mind it’s uh
3:18
worth giving it a new designation of software 3.0. And basically your prompts are now programs that program the LLM.
3:25
And uh remarkably uh these uh prompts are written in English. So it’s kind of a very interesting programming language.
3:33
Um so maybe uh to summarize the difference if you’re doing sentiment classification for example you can
3:39
imagine writing some uh amount of Python to to basically do sentiment classification or you can train a neural
3:46
net or you can prompt a large language model. Uh so here this is a few-shot prompt and you can imagine changing it
3:51
and programming the computer in a slightly different way. So basically we have software 1.0 software 2.0 and I
3:57
think we’re seeing maybe you’ve seen a lot of GitHub code is not just like code anymore. there’s a bunch of like English
4:03
interspersed with code and so I think kind of there’s a growing category of new kind of code. So not only is it a
4:09
new programming paradigm, it’s also remarkable to me that it’s in our native language of English. And so when this
4:14
blew my mind a few uh I guess years ago now I tweeted this and um I think it
4:20
captured the attention of a lot of people and this is my currently pinned tweet uh is that remarkably we’re now programming computers in English. Now,
4:28
when I was at uh Tesla, um we were working on the uh autopilot and uh we
4:34
were trying to get the car to drive and I sort of showed this slide at the time where you can imagine that the inputs to
Programming in English: Rise of Software 3.0
4:41
the car are on the bottom and they’re going through a software stack to produce the steering and acceleration
4:47
and I made the observation at the time that there was a ton of C++ code around in the autopilot which was the software
4:52
1.0 code and then there was some neural nets in there doing image recognition and uh I kind of observed that over time
4:58
as we made the autopilot better basically the neural network grew in capability and size and in addition to
5:05
that all the C++ code was being deleted and kind of like was um and a lot of the
5:12
kind of capabilities and functionality that was originally written in 1.0 was migrated to 2.0. So as an example, a lot
5:19
of the stitching up of information across images from the different cameras and across time was done by a neural
5:24
network and we were able to delete a lot of code and so the software 2.0 stack quite literally ate through the software
5:32
stack of the autopilot. So I thought this was really remarkable at the time and I think we’re seeing the same thing again where uh basically we have a new
5:39
kind of software and it’s eating through the stack. We have three completely different programming paradigms and I
5:44
think if you’re entering the industry it’s a very good idea to be fluent in all of them because they all have slight pros and cons and you may want to
5:50
program some functionality in 1.0 or 2.0 or 3.0. Are you going to train a neural net? Are you going to just prompt an LLM? Should this be a piece of code
5:57
that’s explicit etc. So we all have to make these decisions and actually potentially uh fluidly transition
6:03
between these paradigms. So what I wanted to get into now is first I want
6:09
to in the first part talk about LLMs and how to kind of like think of this new paradigm and the ecosystem and what that
LLMs as utilities, fabs, and operating systems
6:15
looks like. Uh like what are what is this new computer? What does it look like and what does the ecosystem look
6:20
like? Um I was struck by this quote from Andrew Ng actually uh many years ago now
6:25
I think and I think Andrew is going to be speaking right after me. Uh but he said at the time AI is the new electricity and I do think that it um
6:33
kind of captures something very interesting in that LLMs certainly feel like they have properties of utilities
6:38
right now. So um LLM labs like OpenAI, Gemini,
6:44
Anthropic etc. They spend capex to train the LLMs and this is kind of equivalent to building out a grid and then there’s
6:51
opex to serve that intelligence over APIs to all of us and this is done
6:56
through metered access where we pay per million tokens or something like that and we have a lot of demands that are
7:01
very utility- like demands out of this API we demand low latency high uptime consistent quality etc. In electricity,
7:08
you would have a transfer switch. So you can transfer your electricity source from like grid and solar or battery or
7:14
generator. In LLMs, we have maybe OpenRouter and can easily switch between the different types of LLMs that exist.
7:20
Because the LLMs are software, they don’t compete for physical space. So it’s okay to have basically like six electricity
7:26
providers and you can switch between them, right? Because they don’t compete in such a direct way. And I think what’s
7:31
also a little fascinating and we saw this in the last few days actually a lot of the LLMs went down and people were
7:38
kind of like stuck and unable to work. And uh I think it’s kind of fascinating to me that when the state-of-the-art LLMs go down, it’s actually kind of like
7:45
an intelligence brownout in the world. It’s kind of like when the voltage is unreliable in the grid and uh the planet
7:52
just gets dumber the more reliance we have on these models, which already is like really dramatic and I think will
7:58
continue to grow. But LLMs don’t only have properties of utilities. I think it’s also fair to say that they have
8:03
some properties of fabs. And the reason for this is that the capex required for
8:09
building LLM is actually quite large. Uh it’s not just like building some uh power station or something like that,
8:15
right? You’re investing a huge amount of money and I think the tech tree and uh for the technology is growing quite
8:22
rapidly. So we’re in a world where we have sort of deep tech trees, research and development secrets that are
8:28
centralizing inside the LLM labs. Um and but I think the analogy muddies a little
8:34
bit also because as I mentioned this is software and software is a bit less defensible because it is so malleable.
8:40
And so um I think it’s just an interesting kind of thing to think about potentially. There’s many
8:46
analogies you can make like a 4 nanometer process node maybe is something like a cluster with certain max flops. You can think about when
8:53
you’re using Nvidia GPUs and you’re only doing the software and you’re not doing the hardware. That’s kind of like the fabless model. But if
8:59
you’re actually also building your own hardware and you’re training on TPUs if you’re Google, that’s kind of like the Intel model where you own your fab. So I
9:05
think there’s some analogies here that make sense. But actually I think the analogy that makes the most sense perhaps is that in my mind LLM have very
9:12
strong kind of analogies to operating systems. Uh in that this is not just
9:17
electricity or water. It’s not something that comes out of the tap as a commodity. uh this is these are now
9:22
increasingly complex software ecosystems right so uh they’re not just like simple
9:28
commodities like electricity and it’s kind of interesting to me that the ecosystem is shaping in a very similar
9:33
kind of way where you have a few closed source providers like Windows or macOS and then you have an open source
9:39
alternative like Linux and I think for LLMs as well we have a kind
9:45
of a few competing closed source providers and then maybe the Llama ecosystem is currently like maybe a
9:51
close approximation to something that may grow into something like Linux. Again, I think it’s still very early
9:56
because these are just simple LLMs, but we’re starting to see that these are going to get a lot more complicated. It’s not just about the LLM itself. It’s
10:02
about all the tool use and the multimodalities and how all of that works. And so when I sort of had this realization a while back, I tried to
10:09
sketch it out and it kind of seemed to me like LLMs are kind of like a new operating system, right? So the LLM is a
10:15
new kind of a computer. It’s sitting it’s kind of like the CPU equivalent. uh the context windows are kind of like the
10:21
memory and then the LLM is orchestrating memory and compute uh for problem
10:26
solving um using all of these uh capabilities here and so definitely if
10:32
you look at it, it looks very much like an operating system from that perspective. Um, a few more analogies. For example,
10:38
if you want to download an app, say I go to VS Code and I go to download, you can download VS Code and you can run it on
10:46
Windows, Linux or Mac in the same way as you can take an LLM app like Cursor
10:53
and you can run it on GPT or Claude or Gemini series, right? It’s just a drop down. So, it’s kind of like similar in
10:59
that way as well. uh more analogies that I think strike me is that we’re kind of like in this
The new LLM OS and historical computing analogies
11:04
1960sish era where LLM compute is still very expensive for this new kind of a
11:10
computer and that forces the LLMs to be centralized in the cloud and we’re all
11:15
just uh sort of thin clients that interact with it over the network and none of us have full utilization of
11:22
these computers and therefore it makes sense to use time sharing where we’re all just you know a dimension of the
11:28
batch when they’re running the computer in the cloud. And this is very much what computers used to look like at during
11:33
this time. The operating systems were in the cloud. Everything was streamed around and there was batching. And so
11:39
the personal computing revolution hasn’t happened yet because it’s just not economical. It doesn’t make sense. But I think some people are trying. And
11:46
it turns out that Mac minis, for example, are a very good fit for some of the LLMs because it’s all if you’re
11:52
doing batch one inference, this is all super memory bound. So this actually works. And uh I think these are some early
11:58
indications maybe of personal computing. Uh but this hasn’t really happened yet. It’s not clear what this looks like. Maybe some of you get to invent what
12:05
what this is or how it works or uh what this should what this should be. Maybe
12:10
one more analogy that I’ll mention is whenever I talk to ChatGPT or some LLM directly in text, I feel like I’m
12:16
talking to an operating system through the terminal. Like it’s just it’s it’s text. It’s direct access to the
12:22
operating system. And I think a GUI hasn’t yet really been invented in like a general way like should ChatGPT have a
12:29
GUI like different than just text bubbles. Uh certainly some of the apps that we’re going to go into in a bit
12:35
have a GUI but there’s no like GUI across all the tasks if that makes sense. Um there are some ways in which
12:43
LLMs are different from kind of operating systems in some fairly unique way and from early computing. And I
12:49
wrote about uh this one particular property that strikes me as very different uh this time around. It’s that
12:57
LLMs like flip they flip the direction of technology diffusion uh that is usually uh present in technology. So for
13:05
example with electricity, cryptography, computing, flight, internet, GPS, lots of new transformative technologies that
13:10
have not been around. Typically it is the government and corporations that are the first users because it’s new and
13:16
expensive etc. and it only later diffuses to consumer. Uh, but I feel like LLMs are kind of like flipped
13:22
around. So maybe with early computers, it was all about ballistics and military use, but with LLMs, it’s all about how
13:29
do you boil an egg or something like that. This is certainly like a lot of my use. And so it’s really fascinating to me that we have a new magical computer
13:35
and it’s like helping me boil an egg. It’s not helping the government do something really crazy like some
13:40
military ballistics or some special technology. Indeed, corporations and governments are lagging behind the adoption of all of us, of all of these
13:47
technologies. So, it’s just backwards and I think it informs maybe some of the uses of how we want to use this
13:52
technology or like where are some of the first apps and so on. So, in summary so far, LLM labs fab LLMs. I
14:01
think it’s accurate language to use, but LLMs are complicated operating systems.
14:06
They’re circa 1960s in computing and we’re redoing computing all over again. and they’re currently available via time
14:11
sharing and distributed like a utility. What is new and unprecedented is that they’re not in the hands of a few
14:17
governments and corporations. They’re in the hands of all of us because we all have a computer and it’s all just software and ChatGPT was beamed down to
14:24
our computers like billions of people like instantly and overnight and this is insane. Uh and it’s kind of insane to me
14:30
that this is the case and now it is our time to enter the industry and program these computers. This is crazy. So I
14:37
think this is quite remarkable. Before we program LLMs, we have to kind of like spend some time to think about what
Psychology of LLMs: People spirits and cognitive quirks
14:43
these things are. And I especially like to kind of talk about their psychology. So the way I like to think about LLMs is
14:50
that they’re kind of like people spirits. Um they are stochastic simulations of people. Um and the
14:56
simulator in this case happens to be an autoregressive transformer. So transformer is a neural net. Uh it’s and
15:02
it just kind of like is goes on the level of tokens. It goes chunk chunk chunk chunk chunk. And there’s an almost
15:08
equal amount of compute for every single chunk. Um and um this simulator of
15:14
course is is just is basically there’s some weights involved and we fit it to all of text that we have on the internet
15:20
and so on. And you end up with this kind of a simulator and because it is trained on humans, it’s got this emergent
15:26
psychology that is humanlike. So the first thing you’ll notice is of course uh LLM have encyclopedic knowledge and
15:32
memory. uh and they can remember lots of things, a lot more than any single individual human can because they read
15:37
so many things. It’s it actually kind of reminds me of this movie Rain Man, which I actually really recommend people
15:43
watch. It’s an amazing movie. I love this movie. Um and Dustin Hoffman here is an autistic savant who has almost
15:49
perfect memory. So, he can read a he can read like a phone book and remember all of the names and phone numbers. And I
15:55
kind of feel like LLMs are kind of like very similar. They can remember SHA hashes and lots of different kinds of
16:00
things very very easily. So they certainly have superpowers in some respects. But they also have a
16:06
bunch of I would say cognitive deficits. So they hallucinate quite a bit. Um and
16:11
they kind of make up stuff and don’t have a very good uh sort of internal model of self-knowledge, not sufficient
16:17
at least. And this has gotten better but not perfect. They display jagged intelligence. So they’re going to be
16:22
superhuman in some problems solving domains. And then they’re going to make mistakes that basically no human will make. like you know they will insist
16:29
that 9.11 is greater than 9.9 or that there are two Rs in strawberry these are some famous examples but basically there
16:36
are rough edges that you can trip on so that’s kind of I think also kind of unique um they also kind of suffer from
16:43
anterograde amnesia um so uh and I think I’m alluding to the fact that if you have a coworker who joins your
16:49
organization this coworker will over time learn your organization and uh they will understand and gain like a huge
16:55
amount of context on the organization and they go home and they sleep and they consolidate knowledge and they develop
17:01
expertise over time. LLMs don’t natively do this and this is not something that has really been solved in the R&D of
17:06
LLM. I think um and so context windows are really kind of like working memory and you have to sort of program the
17:12
working memory quite directly because they don’t just kind of like get smarter by uh by default and I think a lot of
17:17
people get tripped up by the analogies uh in this way. Uh in popular culture I
17:22
recommend people watch these two movies uh Memento and 50 First Dates. In both of these movies, the protagonists, their
17:27
weights are fixed and their context windows get wiped every single morning and it’s really problematic to go to
17:34
work or have relationships when this happens and this happens to LLMs all the time. I guess one more thing I would
17:39
point to is security kind of related limitations of the use of LLM. So for example, LLMs are quite gullible. Uh
17:46
they are susceptible to prompt injection risks. They might leak your data etc. And so um and there’s many other
17:52
considerations uh security related. So, so basically long story short, you have to load your you have to load your you
18:00
have to simultaneously think through this superhuman thing that has a bunch of cognitive deficits and issues. How do
18:05
we and yet they are extremely like useful and so how do we program them and
18:10
how do we work around their deficits and enjoy their superhuman powers.
18:15
So what I want to switch to now is talk about the opportunities of how do we use these models and what are some of the biggest opportunities. This is not a
Designing LLM apps with partial autonomy
18:22
comprehensive list just some of the things that I thought were interesting for this talk. The first thing I’m kind of excited about is what I would call
18:29
partial autonomy apps. So for example, let’s work with the example of coding. You can certainly go to ChatGPT directly
18:36
and you can start copy pasting code around and copy pasting bug reports and stuff around and getting code and copy
18:42
pasting everything around. Why would you why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app
18:48
dedicated for this. And so I think many of you uh use uh Cursor. I do as well.
18:53
And uh Cursor is kind of like the thing you want instead. You don’t want to just directly go to ChatGPT. And I
18:59
think Cursor is a very good example of an early LLM app that has a bunch of properties that I think are um useful
19:06
across all the LLM apps. So in particular, you will notice that we have a traditional interface that allows a
19:12
human to go in and do all the work manually just as before. But in addition to that, we now have this LLM
19:17
integration that allows us to go in bigger chunks. And so some of the properties of LLM apps that I think are
19:23
shared and useful to point out. Number one, the LLMs basically do a ton of the context management. Um, number two, they
19:31
orchestrate multiple calls to LLMs, right? So in the case of Cursor, there’s under the hood embedding models for all
19:36
your files, the actual chat models, models that apply diffs to the code, and this is all orchestrated for you. A
19:43
really big one that uh I think also maybe not fully appreciated always is application specific uh GUI and the
19:50
importance of it. Um because you don’t just want to talk to the operating system directly in text. Text is very
19:56
hard to read, interpret, understand and also like you don’t want to take some of these actions natively in text. So it’s
20:03
much better to just see a diff as like red and green change and you can see what’s being added and subtracted. It’s
20:08
much easier to just do command Y to accept or command N to reject. I shouldn’t have to type it in text, right? So, a GUI allows a human to
20:15
audit the work of these fallible systems and to go faster. I’m going to come back to this point a little bit uh later as
20:21
well. And the last kind of feature I want to point out is that there’s what I call the autonomy slider. So, for
20:27
example, in Cursor, you can just do tab completion. You’re mostly in charge. You can select a chunk of code and command K
20:33
to change just that chunk of code. You can do command L to change the entire file. Or you can do command I which just
20:40
you know let it rip do whatever you want in the entire repo and that’s the sort of full autonomy agentic version
20:46
and so you are in charge of the autonomy slider and depending on the complexity of the task at hand you can uh tune the
20:53
amount of autonomy that you’re willing to give up uh for that task maybe to show one more example of a fairly
20:58
successful LLM app uh Perplexity um it also has very similar features to what
21:04
I’ve just pointed out in Cursor uh it packages up a lot of the information. It orchestrates multiple LLMs. It’s got a
21:10
GUI that allows you to audit some of its work. So, for example, it will cite sources and you can imagine inspecting
21:17
them. And it’s got an autonomy slider. You can either just do a quick search or you can do research or you can do deep
21:22
research and come back 10 minutes later. So, this is all just varying levels of autonomy that you give up to the tool.
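The autonomy slider described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level names, the `ProposedEdit` fields, and the rules in `required_autonomy` are hypothetical and are not taken from Cursor or Perplexity. The idea it shows is that every AI-proposed change needs some minimum level of autonomy, and the human decides how far the slider goes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Autonomy(IntEnum):
    """Hypothetical slider positions, ordered from least to most autonomous."""
    COMPLETION = 1  # tab completion: the human is mostly in charge
    SELECTION = 2   # change only a selected chunk of code
    FILE = 3        # change an entire file
    REPO = 4        # agent mode: change anything in the repo


@dataclass(frozen=True)
class ProposedEdit:
    """Illustrative summary of a change the model wants to make."""
    files_touched: int
    lines_changed: int
    within_selection: bool


def required_autonomy(edit: ProposedEdit) -> Autonomy:
    """Return the lowest slider position that would permit this edit.

    The rules below are assumptions chosen for the sketch.
    """
    if edit.files_touched > 1:
        return Autonomy.REPO
    if not edit.within_selection:
        return Autonomy.FILE
    if edit.lines_changed > 1:
        return Autonomy.SELECTION
    return Autonomy.COMPLETION


def allowed(edit: ProposedEdit, slider: Autonomy) -> bool:
    """An edit is applied only if the human has slid far enough to allow it."""
    return required_autonomy(edit) <= slider


if __name__ == "__main__":
    big_diff = ProposedEdit(files_touched=5, lines_changed=10_000, within_selection=False)
    print(allowed(big_diff, Autonomy.FILE))  # the slider is too low for a repo-wide change
```

In a real product the slider would presumably gate which tools the model may call rather than inspect a finished diff, but the ordering of levels is the part that carries over.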
21:27
So, I guess my question is I feel like a lot of software will become partially autonomous. I’m trying to think through
21:33
like what does that look like? And for many of you who maintain products and services, how are you going to make your
21:38
products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in
21:45
all the ways that a human could act? And can humans supervise and stay in the loop of this activity? Because again,
21:50
these are fallible systems that aren’t yet perfect. And what does a diff look like in Photoshop or something like
21:56
that? You know, and also a lot of the traditional software right now, it has all these switches and all this kind of
22:01
stuff that’s all designed for humans. All of this has to change and become accessible to LLMs.
22:07
So, one thing I want to stress with a lot of these LLM apps that I’m not sure gets as much attention as it should is
22:14
um we’re now kind of like cooperating with AIs and usually they are doing the generation and we as humans are doing
22:20
the verification. It is in our interest to make this loop go as fast as possible. So, we’re getting a lot of
22:25
work done. There are two major ways that I think uh this can be done. Number one, you can speed up verification a lot. Um,
22:32
and I think GUIs, for example, are extremely important to this because a GUI utilizes your computer vision GPU
22:39
in all of our heads. Reading text is effortful and it’s not fun, but looking at stuff is fun and it’s just a
22:45
kind of like a highway to your brain. So, I think GUIs are very useful for auditing systems and visual
22:51
representations in general. And number two, I would say is we have to keep the AI on the leash. I think a lot of
22:58
people are getting way overexcited with AI agents and uh it’s not useful to me to get a diff of 10,000 lines of code to
23:05
my repo. Like, I’m still the bottleneck, right? Even though those 10,000 lines come out instantly, I have
23:11
to make sure that this thing is not introducing bugs. It’s just like and that it’s doing the correct thing,
23:16
right? And that there’s no security issues and so on. So um I think that um
23:22
yeah basically you we have to sort of like it’s in our interest to make the
23:28
the flow of these two go very very fast and we have to somehow keep the AI on the leash because it gets way too overreactive. It’s uh it’s kind of like
23:35
this. This is how I feel when I do AI assisted coding. If I’m just vibe coding everything is nice and great but if I’m
The importance of human-AI collaboration loops
23:40
actually trying to get work done it’s not so great to have an overreactive uh agent doing all this kind of stuff. So
23:47
this slide is not very good. I’m sorry, but I guess I’m trying to develop like many of you some ways of utilizing these
23:53
agents in my coding workflow and to do AI assisted coding. And in my own work, I’m always scared to get way too big
23:59
diffs. I always go in small incremental chunks. I want to make sure that everything is good. I want to spin this
24:06
loop very very fast and um I sort of work on small chunks of single concrete thing. Uh and so I think many of you
24:13
probably are developing similar ways of working with LLMs. Um, I also saw a number of blog posts
24:19
that try to develop these best practices for working with LLMs. And here’s one that I read recently and I thought was
24:25
quite good. And it kind of discussed some techniques and some of them have to do with how you keep the AI on the leash. And so, as an example, if you are
24:32
prompting, if your prompt is vague, then uh the AI might not do exactly what you wanted and in that case, verification
24:38
will fail. You’re going to ask for something else. If a verification fails, then you’re going to start spinning. So it makes a lot more sense to spend a bit
24:45
more time to be more concrete in your prompts which increases the probability of successful verification and you can
24:50
move forward. And so I think a lot of us are going to end up finding um kind of techniques like this. I think in my own
24:56
work as well I’m currently interested in uh what education looks like in um together with kind of like now that we
25:01
have AI uh and LLMs what does education look like? And I think a a large amount
25:07
of thought for me goes into how we keep AI on the leash. I don’t think it just works to go to chat and be like, «Hey,
25:13
teach me physics.» I don’t think this works because the AI is like gets lost in the woods. And so for me, this is
25:18
actually two separate apps. For example, there’s an app for a teacher that creates courses and then there’s an app
25:24
that takes courses and serves them to students. And in both cases, we now have this intermediate artifact of a course
25:31
that is auditable and we can make sure it’s good. We can make sure it’s consistent. and the AI is kept on the leash with respect to a certain
25:37
syllabus, a certain like um progression of projects and so on. And so this is
25:42
one way of keeping the AI on the leash and I think has a much higher likelihood of working and the AI is not getting lost
25:47
in the woods. One more kind of analogy I wanted to sort of allude to is I’m not I’m no
25:54
stranger to partial autonomy and I kind of worked on this I think for five years at Tesla and this is also a partial
Lessons from Tesla Autopilot & autonomy sliders
26:00
autonomy product and shares a lot of the features like for example right there in the instrument panel is the GUI of the
26:05
autopilot so it’s showing me what the what the neural network sees and so on and we have the autonomy slider where
26:10
over the course of my tenure there we did more and more autonomous tasks for the user and maybe the story that I
26:18
wanted to tell very briefly is uh actually the first time I drove a self-driving vehicle was in 2013 and I
26:25
had a friend who worked at Waymo and uh he offered to give me a drive around Palo Alto. I took this picture using
26:31
Google Glass at the time and many of you are so young that you might not even know what that is. Uh but uh yeah, this
26:37
was like all the rage at the time. And we got into this car and we went for about a 30-minute drive around Palo Alto
26:42
highways uh streets and so on. And this drive was perfect. There was zero interventions and this was 2013 which is
26:49
now 12 years ago. And it kind of struck me because at the time when I had this perfect drive, this perfect demo, I felt
26:56
like, wow, self-driving is imminent because this just worked. This is incredible. Um, but here we are 12 years
27:03
later and we are still working on autonomy. Um, we are still working on driving agents and even now we haven’t
27:09
actually like really solved the problem. Like you may see Waymos going around and they look driverless but you know
27:14
there’s still a lot of teleoperation and a lot of human in the loop of a lot of this driving so we still haven’t even
27:20
like declared success but I think it’s definitely like going to succeed at this point but it just took a long time and
27:26
so I think like software is really tricky I think in the same way
27:31
that driving is tricky and so when I see things like oh 2025 is the year of agents I get very concerned and I kind
27:38
of feel like you know this is the decade of agents and this is going to be quite
27:44
some time. We need humans in the loop. We need to do this carefully. This is software. Let’s be serious here. One
27:51
more kind of analogy that I always think through is the Iron Man suit. Uh I think
The Iron Man analogy: Augmentation vs. agents
27:56
this is I always love Iron Man. I think it’s like so um correct in a bunch of
28:01
ways with respect to technology and how it will play out. And what I love about the Iron Man suit is that it’s both an augmentation and Tony Stark can drive it
28:08
and it’s also an agent. And in some of the movies, the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff. And so
28:15
this is the autonomy slider is we can be we can build augmentations or we can build agents and we kind of want to do a
28:21
bit of both. But at this stage I would say working with fallible LLMs and so on. I would say you know it’s less Iron
28:29
Man robots and more Iron Man suits that you want to build. It’s less like building flashy demos of autonomous
28:35
agents and more building partial autonomy products. And these products have custom GUIs and UI/UX. And we’re
28:41
trying to um and this is done so that the generation verification loop of the human is very very fast. But we are not
28:48
losing sight of the fact that it is in principle possible to automate this work. And there should be an autonomy slider in your product. And you should
28:54
be thinking about how you can slide that autonomy slider and make your product uh sort of um more autonomous over time.
29:01
But this is kind of how I think there’s lots of opportunities in these kinds of products. I want to now switch gears a
Vibe Coding: Everyone is now a programmer
29:06
little bit and talk about one other dimension that I think is very unique. Not only is there a new type of programming language that allows for
29:12
autonomy in software but also as I mentioned it’s programmed in English which is this natural interface and
29:19
suddenly everyone is a programmer because everyone speaks natural language like English. So this is extremely
29:24
bullish and very interesting to me and also completely unprecedented. I would say it it used to be the case that you need to spend five to 10 years studying
29:31
something to be able to do something in software. this is not the case anymore. So, I don’t know if by any chance anyone
29:37
has heard of vibe coding. Uh, this this is the tweet that kind of
29:42
like introduced this, but I’m told that this is now like a major meme. Um, fun story about this is that I’ve been on
29:49
Twitter for like 15 years or something like that at this point and I still have no clue which tweet will become viral
29:56
and which tweet like fizzles and no one cares. And I thought that this tweet was going to be the latter. I don’t know. It
30:01
was just like a shower of thoughts. But this became like a total meme and I really just can’t tell. But I guess like it struck a chord and it gave a name to
30:08
something that everyone was feeling but couldn’t quite say in words. So now there’s a Wikipedia page and everything.
30:17
This is like [Applause]
30:25
yeah this is like a major contribution now or something like that. So, um, so Tom Wolf from HuggingFace shared
30:32
this beautiful video that I really love. Um, these are kids vibe coding.
30:42
And I find that this is such a wholesome video. Like, I love this video. Like, how can you look at this video and feel
30:48
bad about the future? The future is great. I think this will end up being like a
30:53
gateway drug to software development. Um, I’m not a doomer about the future of
30:59
the generation and I think yeah, I love this video. So, I tried vibe coding a
31:04
little bit uh as well because it’s so fun. Uh, so vibe coding is so great when you want to build something super duper
31:10
custom that doesn’t appear to exist and you just want to wing it because it’s a Saturday or something like that. So, I built this uh iOS app and I don’t I
31:18
can’t actually program in Swift, but I was really shocked that I was able to build like a super basic app and I’m not going to explain it. It’s really uh
31:24
dumb, but uh I kind of like this was just like a day of work and this was running on my phone like later that day
31:30
and I was like, «Wow, this is amazing.» I didn’t have to like read through Swift for like five days or something like
31:35
that to like get started. I also vibe coded this app called MenuGen. And this is live. You can try it at
31:41
menugen.app. And I basically had this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the things are. And I
31:48
need pictures. So this doesn’t exist. So I was like, «Hey, I’m going to vibe code it.» So, um, this is what it looks like.
31:55
You go to menugen.app, um, and, uh, you take a picture of
32:01
a menu and then MenuGen generates the images and everyone gets $5 in credits for free when you sign up. And
32:08
therefore, this is a major cost center in my life. So, this is a negative
32:13
revenue app for me right now. I’ve lost a huge amount of money on
32:19
MenuGen. Okay. But the fascinating thing about MenuGen for me is that the code of
32:28
the vibe coding part, the code was actually the easy part of vibe coding MenuGen, and most of it actually was when I
32:35
tried to make it real so that you can actually have authentication and payments and the domain name and a Vercel deployment. This was really hard and all
32:41
of this was not code. All of this DevOps stuff was me in the browser clicking
32:47
stuff and this was an extreme slog and took another week. So it was really fascinating that I had the MenuGen um
32:54
basically demo working on my laptop in a few hours and then it took me a week because I was trying to make it real and
33:01
the reason for this is this was just really annoying. Um, so for example, if you try to add Google login to your web
33:07
page, I know this is very small, but just a huge amount of instructions of this Clerk library telling me how to
33:13
integrate this. And this is crazy. Like it’s telling me go to this URL, click on this dropdown, choose this, go to this,
33:19
and click on that. And it’s like telling me what to do. Like a computer is telling me the actions I should be
33:24
taking. Like you do it. Why am I doing this? What the hell?
33:31
I had to follow all these instructions. This was crazy. So I think the last part of my talk therefore focuses on can we
Building for agents: Future-ready digital infrastructure
33:39
just build for agents? I don’t want to do this work. Can agents do this? Thank you.
33:46
Okay. So roughly speaking, I think there’s a new category of consumer and manipulator of digital information. It
33:53
used to be just humans through GUIs or computers through APIs. And now we have a completely new thing and agents are
34:00
they’re computers but they are humanlike kind of right they’re people spirits there’s people spirits on the internet
34:05
and they need to interact with our software infrastructure like can we build for them it’s a new thing so as an
34:10
example you can have robots.txt on your domain and you can instruct uh or like advise I suppose um uh web crawlers on
34:18
how to behave on your website in the same way you can have maybe an llms.txt file which is just a simple markdown
34:23
that’s telling LLMs what this domain is about and this is very readable to a to an LLM. If it had to instead get the
34:30
HTML of your web page and try to parse it, this is very error-prone and difficult and will screw it up and it’s
34:35
not going to work. So we can just directly speak to the LLM. It’s worth it. Um a huge amount of documentation is
34:41
currently written for people. So you will see things like lists and bold and pictures and this is not directly
34:47
accessible by an LLM. So I see some of the services now are transitioning a lot
34:52
of their docs to be specifically for LLMs. So Vercel and Stripe as an example are early movers here but there
34:59
are a few more that I’ve seen already and they offer their documentation in markdown. Markdown is super easy for LLMs
35:06
to understand. This is great. Um maybe one simple example from uh my
35:12
experience as well. Maybe some of you know 3Blue1Brown. He makes beautiful animation videos on YouTube.
35:19
[Applause] Yeah, I love this library. So that he
35:25
wrote uh Manim and I wanted to make my own and uh there’s extensive
35:30
documentation on how to use Manim and so I didn’t want to actually read through it. So I copy pasted the whole
35:35
thing to an LLM and I described what I wanted and it just worked out of the box like LLM just vibe coded me an animation
35:41
exactly what I wanted and I was like wow this is amazing. So if we can make docs legible to LLMs, it’s going to unlock a
35:48
huge amount of um kind of use and um I think this is wonderful and should should happen more. The other thing I
35:55
wanted to point out is that you do unfortunately have to it’s not just about taking your docs and making them appear in markdown. That’s the easy
36:00
part. We actually have to change the docs because anytime your docs say click this is bad. An LLM will not be able to
36:06
natively take this action right now. So, Vercel, for example, is replacing every occurrence of click with an
36:13
equivalent curl command that your LLM agent could take on your behalf. Um, and so I think this is very interesting. And
36:19
then, of course, there’s the Model Context Protocol from Anthropic. And this is also another way, it’s a protocol of
36:24
speaking directly to agents as this new consumer and manipulator of digital information. So, I’m very bullish on these ideas. The other thing I really
36:31
like is a number of little tools here and there that are helping ingest data
36:36
in very LLM-friendly formats. So for example, when I go to a GitHub repo like my nanoGPT repo, I can’t feed
36:42
this to an LLM and ask questions about it uh because it’s you know this is a human interface on GitHub. So when you
36:48
just change the URL from github to gitingest then uh this will actually concatenate all the files into a single
36:54
giant text and it will create a directory structure etc. And this is ready to be copy pasted into your favorite LLM and you can do stuff. Maybe
37:01
even more dramatic example of this is DeepWiki where it’s not just the raw content of these files. uh this is from
37:08
Devin but also like they have Devin basically do analysis of the GitHub repo and Devin basically builds up a whole
37:14
docs uh pages just for your repo and you can imagine that this is even more
37:19
helpful to copy paste into your LLM. So I love all the little tools that basically where you just change the URL
37:24
and it makes something accessible to an LLM. So this is all well and great and uh I think there should be a lot more of
37:30
it. One more note I wanted to make is that it is absolutely possible that in the future LLMs will be able to this is
37:38
not even future this is today they’ll be able to go around and they’ll be able to click stuff and so on but I still think it’s very worth uh basically meeting LLMs
37:46
halfway and making it easier for them to access all this information uh because this is still
37:51
fairly expensive I would say to use and uh a lot more difficult and so I do think that lots of software there will
37:58
be a long tail where it won’t like adapt because these are not like live player sort of repositories or digital
38:04
infrastructure and we will need these tools. Uh but I think for everyone else I think it’s very worth kind of like
38:09
meeting in some middle point. So I’m bullish on both if that makes sense. So in summary, what an amazing time to
Summary: We’re in the 1960s of LLMs — time to build
38:17
get into the industry. We need to rewrite a ton of code. A ton of code will be written by professionals and by
38:23
coders. These LLMs are kind of like utilities, kind of like fabs, but they’re kind of especially like
38:28
operating systems. But it’s so early. It’s like 1960s of operating systems and
38:34
uh and I think a lot of the analogies cross over. Um and these LLMs are kind of like these fallible uh you know people
38:41
spirits that we have to learn to work with. And in order to do that properly, we need to adjust our infrastructure
38:47
towards it. So when you’re building these LLM apps, I described some of the ways of working effectively with these
38:52
LLMs and some of the tools that make that uh kind of possible and how you can spin this loop very very quickly and
38:59
basically create partial autonomy products and then um yeah, a lot of code has to also be written for the agents
39:04
more directly. But in any case, going back to the Iron Man suit analogy, I think what we’ll see over the next
39:10
decade roughly is we’re going to take the slider from left to right. And
39:15
it’s going to be very interesting to see what that looks like. And I can’t wait to build it with all of
39:21
you. Thank you.
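
Earlier in the talk, tools were described that concatenate all the files of a repository into a single text with a directory structure, ready to paste into an LLM. A minimal Python sketch of that idea follows. It is an assumption-laden illustration and not how any particular tool is implemented: the function name `ingest`, the default suffix filter, and the separator format are all hypothetical.

```python
from pathlib import Path


def ingest(root: Path, suffixes: tuple[str, ...] = (".py", ".md", ".txt")) -> str:
    """Flatten a directory of text files into one LLM-friendly string.

    The output starts with a listing of relative paths, followed by the
    content of each file under a clearly marked header.
    """
    # Collect matching files in a stable order (hypothetical suffix filter).
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )

    parts = ["Directory structure:"]
    parts.extend(p.relative_to(root).as_posix() for p in files)
    parts.append("")

    for p in files:
        rel = p.relative_to(root).as_posix()
        parts.append("=" * 40)
        parts.append(f"FILE: {rel}")
        parts.append("=" * 40)
        # Replace undecodable bytes instead of failing on odd encodings.
        parts.append(p.read_text(encoding="utf-8", errors="replace"))
        parts.append("")

    return "\n".join(parts)


if __name__ == "__main__":
    # Print a flattened view of the current directory.
    print(ingest(Path(".")))
```

The resulting string can be pasted into a chat window as a single block. A real tool would presumably also respect ignore files and size limits, which this sketch leaves out.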
