Video transcript
"The image is of a man standing in his living room, smiling and posing for the camera. He is wearing a brown hooded sweatshirt and making a peace sign with his hand. In the background there are a few skyscrapers visible, suggesting that he might be located in an urban area."

That is crazy. Holy shoot! GPT-4 is believed to feature more than 1.7 trillion parameters, which, if you do the math, means you would need hundreds of gigabytes of VRAM and likely over 100 CPUs to run it yourself. What is this, Egyptian cotton? Because that's a lot of threads. But I want to know what we can do with more humble means, like this Raspberry Pi 5 that sells for just $80.
No doubt GPT-5 and Google's Gemini models are sure to be great, but what's the state of the art when it comes to open-source, free small language models like Orca and Phi? Are they practical for small computers, and can we accelerate their performance with Coral AI Edge TPUs? Now, I've been doing big things on small tech for almost a decade, but this endeavor of deploying local LLM-based chatbots on the Raspberry Pi is undeniably the pinnacle of that journey. The tech is truly impressive, and the implications of these kinds of jailbroken LLMs are definitely worth thinking about. So my objective is to test every major LLM available, including privateGPT, which I'll train on local documents on this external SSD, working our way all the way up to the newly hyped Mistral 7B, and to examine how this model is so fast and capable at such a small size.
Now, if you don't have a Raspberry Pi 5, no worries: you can follow along with most any SBC, mini PC, or even a personal laptop. I'm going to be using the new Raspberry Pi 5 with 8 GB of RAM running the 64-bit OS. I'd also suggest getting some fast storage in place: I'll be downloading dozens of models, each around several gigabytes, and microSD cards are pretty slow, so I'm going to use this fast 256 GB microSD, but you can use an external SSD or even NVMe for even better performance. And I'll add a step-by-step guide in the description below, so if you miss something, don't worry.
Now, I wanted to use LM Studio, but it doesn't appear to run on Arm architecture yet, so that didn't work. But there's a great new tool called Ollama, and it provides similar functionality: it allows you to download, test, and swap major LLMs by running them from the command line.
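If you want to follow along, the basic Ollama workflow looks roughly like this (a minimal sketch, assuming the official Linux/Arm64 install script; the model tags come from the Ollama library):

    curl -fsSL https://ollama.com/install.sh | sh   # install Ollama
    ollama pull llama2    # download a model
    ollama run llama2     # chat with it interactively in the terminal
    ollama list           # show which models are on disk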
So we have our idle Raspberry Pi 5 right here. It is wired into a power meter so we can observe the power draw as we go, and on the right side here we are connected to the Raspberry Pi over SSH: I have a command line on the left, and I'm just tailing the resources using htop on the right. This Raspberry Pi has no internet connection right now; it's 100% private, off-grid, and offline. Normally pinging a site would return ICMP traffic, but because there's no internet, it is unable to do so. So we're going to be running these models completely locally.
So I want to start with a model called LLaVA, which claims to be able to analyze images. Okay, so the first thing we're going to want to do is upload the selfie that I just took to my Raspberry Pi. I'm going to do that using FileZilla; I should be able to just double-click it, and there it goes. So now, if I come over to Warp, on the right I have htop open so we're monitoring the resource utilization, and now we can see that the selfie image is there, so it's on the Raspberry Pi. Okay, so now the model is running and we can pose our queries. I'm going to say "What's in this image?", and when I kick this off we should see the power go up, the fan turn on, and the resources bump up. "The image features a man standing on the edge of a pool in front of high-rise buildings, most likely in Miami or another big city. He appears to be posing for the picture while standing near the pool with his hands rested on his chin. In the background, there are several cars parked and scattered around the area. The scene captures a casual moment with an urban atmosphere." Yeah, it's just crazy how accurate this is, and how it got my city, my posture, and all the ambient objects in the image. It's just super impressive.
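For reference, passing an image to LLaVA through Ollama can be as simple as including the file path in the prompt (a sketch, assuming a recent multimodal-capable Ollama build; the path is wherever you uploaded the photo):

    ollama run llava "What's in this image? /home/pi/selfie.jpg"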
Okay, let's run Llama 2 Uncensored. "Write me a dangerously spicy mayo recipe." "Ingredients: 1 cup mayonnaise, 2 tbsp yellow mustard, 2 tsp hot sauce (your choice), two pinches of cayenne pepper, 1/2 teaspoon chili powder, 1/2 teaspoon garlic powder. Instructions: 1..." So yeah, that's a pretty solid recipe right there, and it even gives you instructions for how to prepare it. So Llama 2 is a great general-purpose model.
Okay, so let's run some of the smaller models, starting with Phi-2, so I'm going to do "ollama run phi". All right, let's ask it a little historical trivia: "Who was the Venezuelan president in 1980?" "The Venezuelan president in 1980 was Carlos Andrés Pérez." Okay, let's do some coding stuff: "Give me the Linux command to delete a folder recursively." "You can use the rmdir command in Linux to delete a folder recursively. Here it is: sudo rmdir path/to/folder. Replace path/to/folder with the path to the folder you want to remove. The sudo command is required because it allows root access to the system for this operation. Consider a hypothetical scenario in which you are an IoT engineer..." Yeah, I'm going to say it's rambling at this point.
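Worth flagging for anyone following along: the model's answer isn't quite right, since rmdir only removes empty directories. The standard recursive delete looks like this:

    rm -rf path/to/folder        # -r recurses into the folder, -f skips confirmation prompts
    sudo rm -rf path/to/folder   # sudo only if the files belong to another user; be careful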
All right, I'm going to give it more of a softball: "Why is the sky blue?" One thing I will say about this model is that it's definitely one of the faster ones. All the models take some time to think initially, but once they get going, I would say they move at a pace comparable to what I can read at, so they're pretty fast. "The sky appears blue due to a phenomenon known as Rayleigh scattering. The Earth's atmosphere scatters sunlight in all directions, but blue light is scattered more than other colors because it travels as shorter, smaller waves. This scattering of blue light is what gives the sky its blue color. Additionally, the sun is positioned directly above us during daylight hours, which further enhances this effect by making the blue wavelengths more prominent." Sounds right.
All right, let's try another small model; let's do Orca, so: "ollama run orca-mini". Okay, a lot of people ask about languages, and as far as I can tell these models are designed to be used in English, but that doesn't mean they can't understand or help you with translations. "Translate this sentence into Spanish: I love creating intelligent computers." "Esto mondo career and tentes computer." Again, the smaller models are decent; they're pretty good, definitely usable, definitely practical, and faster than the other models. But let's now use some of the more capable, better-performing models. So let's go with Llama 2: "ollama run llama2".
All right, let's see how it does with basic facts. "Who was the US President in 1952?" "The US President in 1952 was Dwight David Eisenhower." All right, let's do some code stuff. "Write me a Reddit-like time-decay function in JavaScript." "Here's an example of a simple time-decay function in JavaScript..." I use GPT-4 on a regular basis, and this feels a lot like GPT-4: it gives examples, makes few mistakes, uses modern ES6 syntax, and provides explanations that are not too long, not too concise. This is a really good model here.
All right, let's run Code Llama. "Explain the concept of async/await." "Async/await is a programming construct that allows developers to write asynchronous code that is easier to read and maintain. It was introduced in ECMAScript 2017 and is used to handle the results of async operations." So this is a fantastic answer: a full explanation, callback hell, a code example. This is really solid.
Let's do another one. "Give me a Linux command to print the current time every 10 seconds." "You can use the sleep command in Linux to print the current time every 10 seconds. Here's an example of how to do this: while true; do date +%T; sleep 10; done. This will repeatedly print the current time using the date command and then wait 10 seconds before printing the next time. You can use this in a shell script or directly in your terminal." So Code Llama is excellent for code pairing and reference for any sort of developer.
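Spelled out, the model's one-liner looks like this, plus an equivalent using watch (assuming date and watch are available, as they are on Raspberry Pi OS):

    while true; do
      date +%T    # print the current time, e.g. 14:03:07
      sleep 10    # wait ten seconds before the next print
    done

    watch -n 10 date   # same idea: re-run date every 10 seconds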
But I'm curious: can we run a 13-billion-parameter model? All right, so we're going to try to run the Llama 2 13B model. And it looks like it won't run the 13-billion-parameter Llama 2, which I guess makes sense based on memory requirements. My understanding is that, as a loose estimate, you need about as many gigabytes of RAM as the model has billions of parameters: for instance, a 7-billion-parameter model wants about 7 GB of RAM, and I have 8 GB of RAM, and so on.
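To put rough numbers on that rule of thumb (illustrative only; the actual footprint depends on the quantization, and Ollama typically ships 4-bit quantized models):

    params_b=7   # model size in billions of parameters
    # approximate bytes per parameter at common precisions:
    #   fp16 ~2.0, 8-bit ~1.0, 4-bit ~0.55 (weights plus some overhead)
    awk -v p="$params_b" 'BEGIN {
      printf "fp16: ~%.1f GB\nq8:   ~%.1f GB\nq4:   ~%.1f GB\n",
             p * 2.0, p * 1.0, p * 0.55
    }'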
With these open-source, free models, our interactions are 100% private, many of the models are uncensored, and, additionally, costs are reduced since they can run on more household hardware. This process also begged the question of whether we could accelerate the model inference using an edge TPU like the Coral AI, but it looks like for LLMs the process is memory-bound, and the Coral TPU only has 1 GB of RAM and 8 MB of SRAM, which isn't adequate for running even the smallest LLMs. It also got me wondering if we could run these effectively on a cluster of parallelized Raspberry Pis, but it doesn't seem like there are any open-source projects to enable this.
Here's another fun use case: what if we wanted to train our model on files on an external drive? Well, we can actually do that using privateGPT. Okay, so I did want to show you privateGPT real quick. I don't think I'll have time to go through the entire installation here, but it is in the full blog tutorial in the description if you want to pick that up. The gist is that I have a directory called source_documents, and you can put PDF files, text files, whatever you want in here. I put a free biography of Susan B. Anthony in here, and I trained the model on that. So now, if I back out and run privateGPT, I'm going to do: python privateGPT.py
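The gist of the commands, assuming the classic privateGPT repo layout with its ingest.py and privateGPT.py scripts (the PDF path here is just an example):

    cp /mnt/ssd/susan-b-anthony.pdf source_documents/   # drop your PDFs/TXTs into the folder
    python ingest.py        # build a local index over everything in source_documents/
    python privateGPT.py    # then ask questions at the interactive prompt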
Essentially, privateGPT has indexed the source documents (under the hood it retrieves from embeddings of your files rather than retraining the model's weights), so I can start posing questions about the source documents, and it will answer in a chatbot kind of way. "What was Susan B. Anthony's primary mission?"
Okay, so now you can see it's starting to respond. "Susan B. Anthony was a prominent social activist who focused primarily on women's suffrage..." And it will also cite its references throughout any of the documents that you've supplied. So this is how you talk to your documents.
So let's try another model; let's try Mistral 7B. They claim it's the most capable 7-billion-parameter model. Okay, so we have Mistral running; let's start asking some questions. "Who was the second person to step on the moon?" "The second person to walk on the moon was Buzz Aldrin. He followed Neil Armstrong, who was the first man to walk on the moon during the Apollo 11 mission on July 20th, 1969." All right, let's ask it some Spanish. "'¿Qué tal?' is a Spanish phrase that translates to 'how's it going' or 'what's up' in English. It is often used as a casual greeting or a way of asking someone how they are doing."
"Write me a rhyming poem about semiconductors." Pretty sure my Raspberry Pi is going to blow up on this one.

"In a lab where the brilliance does flow,
creations emerge both new and old.
Silicon Valley's heart beats fast,
where semiconductors hold the past.
Transistors small, with power to gain,
and circuits complex, their role is sustaining.
Voltage is high, amperes in tow,
they dance together at an electric show.
Microchips intricately designed,
with logic and memory they're aligned.
In microprocessors they compute,
solving problems that once seemed moot.
Born from silicon pure and bright,
pushed to the limit with endless might.
Semiconductors, in their endless quest,
unlock the secrets of this technological feast.
As technology marches on apace,
they power our world with grace.
Through phones and computers they've become a staple,
a testament to human ability.
In a realm where science meets art,
semiconductors play their part.
In this ever-evolving digital age,
their importance can't be assuaged."

But still, see, it's really impressive, particularly for a model that's not optimized for literature.
So Mistral 7B wins the award for me; it's my personal favorite, and it's really impressive. And I actually didn't know this, but a very astute viewer on my last video posed an interesting question, noticing that if the model has the broad strokes around most historical questions, does that mean that the model contains the entire world's knowledge? And the answer is actually yes: for reference, the entire corpus of Wikipedia is about 22 GB, so it makes sense that these small models probably contain around 25% of the most significant information. And for me, what's so interesting about this is that, hypothetically, if a catastrophic event occurred and the internet cut out, I have a dozen or so local LLMs that hold all of the world's history about language, science, and practical how-tos, which could be a game changer for the preppers among us. It's like having your own local, private AI in a box. It would even be pretty entertaining: when you get bored, you can just talk to it. The strides in this space have been super compelling, and I think there's a case to be made that in the future LLMs might run primarily on the edge. For more interesting videos, check out this next video. Thanks!