Why Neural Networks can learn (almost) anything

This video explains what neural networks are and how they can learn functions, using the Mandelbrot set as a running example of an arbitrarily complex function that a network can approximate.

It also discusses some of the limitations of neural networks: real networks cannot have an unlimited number of neurons, the training process imposes its own constraints, and they need enough data to learn from. Still, the video concludes that neural networks are a powerful tool for learning a wide variety of functions.

Here are the key points from the video:

Neural networks are function approximators: they learn a mapping from inputs to outputs.
They can approximate very complex functions, such as the function describing the Mandelbrot set.
They have limitations: networks are finite in size, training imposes its own constraints, and they need enough data that describes the target function.
Despite this, they are a powerful tool for learning a wide variety of functions.

Video transcript

Intro (0:00)

You are currently watching an artificial neural network learn. In particular, it's learning the shape of an infinitely complex fractal known as the Mandelbrot set. This is what that set looks like: complexity all the way down. Now, in order to understand how a neural network can learn the Mandelbrot set, and really how it can learn anything at all, we will need to start with a fundamental mathematical concept: what is a function?

Functions (0:28)

Informally, a function is just a system of inputs and outputs: numbers in, numbers out. In this case you input an x and it outputs a y. You can plot all of a function's x and y values on a graph, where it draws out a line. What is important is that if you know the function, you can always calculate the correct output y given any input x.

But say we don't know the function and instead only know some of its x and y values. We know the inputs and outputs, but we don't know the function used to produce them. Is there a way to reverse engineer the function that produced this data? If we could construct such a function, we could use it to calculate a y value given an x value that is not in our original data set. This would work even if there was a little bit of noise in our data, a little randomness: we can still capture the overall pattern of the data and continue producing y values that aren't perfect, but close enough to be useful. What we need is a function approximation, and more generally a function approximator. That is what a neural network is.
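
As a rough illustration of that idea (not something from the video), the sketch below fits a curve to noisy samples of an "unknown" function. Here np.polyfit stands in for the approximator; the point is the same whether the approximator is a polynomial or a neural network.

# A minimal sketch of function approximation from noisy samples.
import numpy as np

def hidden_function(x):           # the "unknown" function that produced the data
    return 0.5 * x**2 - x + 2

rng = np.random.default_rng(0)
x_data = np.linspace(-3, 3, 40)
y_data = hidden_function(x_data) + rng.normal(0, 0.3, size=x_data.shape)  # noisy observations

# Fit an approximation using only the (x, y) pairs, never the formula itself.
coeffs = np.polyfit(x_data, y_data, deg=2)
approx = np.poly1d(coeffs)

x_new = 1.7                        # an input that is not in the data set
print(approx(x_new), hidden_function(x_new))  # close, though not exact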

This is an online tool for visualizing neural networks, and I'll link it in the description below. This particular network takes two inputs, x1 and x2, and produces one output. Technically this function would create a three-dimensional surface, but it's easier to visualize in two dimensions. This image is rendered by passing the x, y coordinate of each pixel into the network, which then produces a value between negative one and one that is used as the pixel value. These points are our data set and are used to train the network. When we begin training, it quickly constructs a shape that accurately distinguishes between blue and orange points, building a decision boundary that separates them. It is approximating the function that describes the data. It's learning, and it is capable of learning the different data sets that we throw at it.
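
The transcript doesn't name the tool, but the rendering idea it describes can be sketched roughly like this: evaluate a small, randomly initialized two-input network at every pixel coordinate and use its output in the range -1 to 1 as the pixel value. The layer sizes below are arbitrary assumptions, not the tool's actual architecture.

# Evaluate a tiny untrained 2-input network at every pixel of a 100x100 grid.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)    # inputs -> 8 hidden neurons
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)    # hidden neurons -> 1 output

def network(xy):                                         # xy has shape (n_pixels, 2)
    hidden = np.maximum(0.0, xy @ W1 + b1)               # ReLU activation
    return np.tanh(hidden @ W2 + b2)                     # output squashed to (-1, 1)

xs, ys = np.meshgrid(np.linspace(-1, 1, 100), np.linspace(-1, 1, 100))
pixels = network(np.column_stack([xs.ravel(), ys.ravel()])).reshape(100, 100)
print(pixels.shape, pixels.min(), pixels.max())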

Neurons (2:32)

So what is this middle section, then? Well, as the name implies, this is the network of neurons. Each one of these nodes is a neuron, which takes in all the inputs from the previous layer of neurons and produces one output, which is then fed to the next layer. Inputs and outputs: sounds like we're dealing with a function. Indeed, a neuron itself is just a function, one that can take any number of inputs and has one output. Each input is multiplied by a weight, and all are added together along with a bias. The weights and bias make up the parameters of this neuron, values that can change as the network learns. To keep it easy to visualize, we'll simplify this down to a two-dimensional function with only one input and one output.
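
In code, the neuron just described is only a couple of lines. This is a minimal sketch, not the video's implementation; there is no activation function yet.

# Multiply each input by a weight, sum everything, and add a bias.
def neuron(inputs, weights, bias):
    return sum(w * x for w, x in zip(weights, inputs)) + bias

# The simplified one-input, one-output version: a plain line y = w*x + b.
def neuron_1d(x, w, b):
    return w * x + b

print(neuron([1.0, 2.0, 3.0], weights=[0.5, -1.0, 2.0], bias=0.1))  # 4.6
print(neuron_1d(2.0, w=3.0, b=-1.0))                                # 5.0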

Now, neurons are our building blocks of the larger network: building blocks that can be stretched and squeezed and shifted around, and that ultimately work with other blocks to construct something larger than themselves. The neuron as we've defined it here works like a building block. It is actually an extremely simple linear function, one which forms a flat line (or a plane, when there's more than one input). With the two parameters, the weight and the bias, we can stretch and squeeze and move our function up and down and left and right. As such, we should be able to combine it with other neurons to form a more complicated function, one built from lots of linear functions.

So let's start with a target function, one we want to approximate. I've hard-coded a bunch of neurons whose parameters were found manually, and if we weight each one and add them up, as would happen in the final neuron of the network, we should get a function that looks like the target function. Well, that didn't work at all. What happened? If we simplify our equation, distributing weights and combining like terms, we end up with a single linear function. It turns out linear functions can only combine to make one linear function. This is a big problem, because we need to make something more complicated than just a line. We need something that is not linear: a non-linearity.
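
A quick numerical check of that collapse, with arbitrary made-up weights rather than the ones from the video: a weighted sum of purely linear neurons is itself just one linear function.

w1, b1 = 2.0, 1.0             # neuron 1: 2x + 1
w2, b2 = -0.5, 3.0            # neuron 2: -0.5x + 3
a1, a2, c = 4.0, 2.0, -1.0    # output neuron: a1*n1 + a2*n2 + c

def combined(x):
    return a1 * (w1 * x + b1) + a2 * (w2 * x + b2) + c

# Distributing and combining like terms gives a single line:
# (a1*w1 + a2*w2) * x + (a1*b1 + a2*b2 + c) = 7x + 9
def collapsed(x):
    return (a1 * w1 + a2 * w2) * x + (a1 * b1 + a2 * b2 + c)

print(combined(2.5), collapsed(2.5))   # identical for every x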

Activation Functions (4:25)

In our case we will be using a ReLU, a rectified linear unit. We use it as our activation function, meaning we simply apply it to our previous naive neuron. This is about as close as you can get to a linear function without actually being one, and we can tune it with the same parameters as before.
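
As a minimal sketch, applying the ReLU to the earlier neuron just clips its line at zero:

def relu(z):
    return max(0.0, z)

def relu_neuron(x, w, b):
    return relu(w * x + b)

print(relu_neuron(2.0, w=1.5, b=-1.0))   # 2.0  (the line is above zero here)
print(relu_neuron(-2.0, w=1.5, b=-1.0))  # 0.0  (clipped: it can't dip below the x-axis)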

However, you may notice that we can't actually lift the function off of the x-axis, which seems like a pretty big limitation. Well, let's give it a shot anyway and see if it performs any better than our previous attempt. We're still trying to approximate the same function, and we're using the same weights and biases as before, but this time we're using a ReLU as our activation function. And just like that, the approximation looks way better. Unlike before, our function cannot simplify down to a flat linear function. If we add the neurons one by one, we can see the simple ReLU functions building on one another, and the inability of one neuron to lift itself off the x-axis doesn't seem to be a problem: many neurons working together overcome the limitations of individual neurons.
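
A small sketch of that effect, with made-up parameters rather than the ones found in the video: a weighted sum of a few ReLU neurons produces a piecewise-linear curve with several bends, something no single linear neuron could do.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def network(x):
    # Three hidden ReLU neurons, each contributing one "kink", summed by an
    # output neuron with its own weights and bias.
    h1 = relu(1.0 * x + 1.0)
    h2 = relu(1.0 * x + 0.0)
    h3 = relu(1.0 * x - 1.0)
    return 0.5 * h1 - 1.5 * h2 + 2.0 * h3 + 0.2

xs = np.linspace(-2, 2, 9)
print(np.round(network(xs), 2))   # a piecewise-linear curve with corners at x = -1, 0, 1

Each hidden neuron contributes one corner to the final curve, which is also why the learned shapes in the visualization look like they are built from sharp linear edges.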

Now, I manually found these weights and biases, but how would you find them automatically? The most common algorithm for this is called backpropagation, and it is in fact what we're watching when we run this program. It essentially tweaks and tunes the parameters of the network bit by bit to improve the approximation. The intricacies of this algorithm are really beyond the scope of this video; I'll link some better explanations in the description.
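
Since the transcript leaves backpropagation out of scope, the toy below only shows the underlying idea of nudging parameters bit by bit to reduce the error, on a single linear neuron with hand-derived gradients. It is not the full backpropagation algorithm.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 50)
y = 3.0 * x - 0.5 + rng.normal(0, 0.05, size=x.shape)   # data from an "unknown" line

w, b, lr = 0.0, 0.0, 0.1          # start with a bad guess and a small learning rate
for step in range(200):
    pred = w * x + b
    error = pred - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    w -= lr * grad_w              # nudge each parameter downhill
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # close to 3.0 and -0.5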

Now we can see how this shape is formed, and why it looks like it's made up of sort of sharp linear edges: it's the nature of the activation function we're using. We can also see why, if we use no activation function at all, the network utterly fails to learn. We need those non-linearities.

So what if we try learning a more complicated data set, like this spiral? Let's give it a go. It seems to be struggling a little bit to capture the pattern. No problem: if we need a more complicated function, we can add more building blocks, more neurons and layers of neurons, and the network should be able to piece together a better approximation, something that really captures the spiral. It seems to be working.
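
A rough sketch of what "more neurons and layers" means in code: the network is still one function from (x1, x2) to a single output, just with more stacked layers. The layer sizes here are arbitrary assumptions, not the configuration used in the video.

import numpy as np

def make_layer(rng, n_in, n_out):
    return rng.normal(size=(n_in, n_out)), np.zeros(n_out)

rng = np.random.default_rng(3)
layers = [make_layer(rng, 2, 16), make_layer(rng, 16, 16), make_layer(rng, 16, 1)]

def forward(xy):
    h = xy
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:       # ReLU on hidden layers only
            h = np.maximum(0.0, h)
    return np.tanh(h)                 # final output squashed to (-1, 1)

print(forward(np.array([[0.3, -0.7]])))  # untrained, so the value is arbitrary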

NNs can learn anything (6:37)

In fact, no matter what the data set is, we can learn it. That is because neural networks can be rigorously proven to be universal function approximators: they can approximate any function to any degree of precision you could ever want. You can always add more neurons. This is essentially the whole point of deep learning, because it means that neural networks can approximate anything that can be expressed as a function, a system of inputs and outputs. This is an extremely general way of thinking about the world. The Mandelbrot set, for instance, can be written as a function and learned all the same. This is just a scaled-up version of the experiment we were just looking at, but with an infinitely complex data set.
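
For concreteness, here is one way the Mandelbrot set can be written as a function of the kind described: a mapping from a point (x, y) to roughly 1 if it is in the set and -1 if it is not, which is exactly the sort of input/output pairing a network can be trained on. The iteration cap is an arbitrary choice, not a value from the video.

def mandelbrot(x, y, max_iter=50):
    c = complex(x, y)
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:        # the point escapes, so it is not in the set
            return -1.0
    return 1.0                # never escaped within the iteration budget

print(mandelbrot(0.0, 0.0))    #  1.0 (inside)
print(mandelbrot(1.0, 1.0))    # -1.0 (outside)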

We don't even really need to know what the Mandelbrot set is; the network learns it for us, and that's kind of the point. If you can express any intelligent behavior, any process, any task as a function, then a network can learn it. For instance, your input could be an image and your output a label as to whether it's a cat or a dog, or your input could be text in English and your output a translation to Spanish. You just need to be able to encode your inputs and outputs as numbers, but computers do this all the time: images, video, text, audio, they can all be represented as numbers, and any processing you may want to do with this data, so long as you can write it as a function, can be emulated with a neural network.
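
A toy illustration of encoding inputs and outputs as numbers (the encodings below are simplistic stand-ins, not anything prescribed by the video):

import numpy as np

# An image is already numbers: a grid of pixel intensities.
image = np.zeros((28, 28), dtype=np.float32)    # a blank 28x28 grayscale image

# A label can be a single number (or a one-hot vector).
labels = {"cat": 0, "dog": 1}
cat_as_number = labels["cat"]

# Text can be mapped to integers, e.g. one id per character.
text = "hello"
text_as_numbers = [ord(ch) for ch in text]      # [104, 101, 108, 108, 111]

print(image.shape, cat_as_number, text_as_numbers)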

It goes deeper than this, though. Under a few more assumptions, neural networks are provably Turing complete, meaning they can solve all of the same kinds of problems that any computer can solve. An implication of this is that any algorithm written in any programming language can be simulated on a neural network, but rather than being manually written by a human, it can be learned automatically with a function approximator.

Neural networks can learn anything.

NNs can't learn anything (8:32)

Okay, that is not true. First off, you can't have an infinite number of neurons; there are practical limitations on network size and on what can be modeled in the real world. I've also ignored the learning process in this video and just assumed that you can find the optimal parameters magically; how you realistically do this introduces its own constraints on what can be learned. Additionally, in order for neural networks to approximate a function, you need the data that actually describes that function. If you don't have enough data, your approximation will be all wrong. It doesn't matter how many neurons you have or how sophisticated your network is: you just have no idea what your actual function should look like. It also doesn't make a lot of sense to use a function approximator when you already know the function. You wouldn't build a huge neural network to, say, learn the Mandelbrot set when you can just write three lines of code to generate it (unless, of course, you want to make a cool background visual for a YouTube video). There are countless other issues that have to be considered.

But they can learn a lot (9:36)

But for all these complications, neural networks have proven themselves to be indispensable for a number of really rather famously difficult problems for computers. Usually these problems require a certain level of intuition and fuzzy logic that computers generally lack, and they are very difficult for us to manually write programs to solve. Fields like computer vision, natural language processing, and other areas of machine learning have been utterly transformed by neural networks. And this is all because of the humble function, a simple yet powerful way to think about the world. By combining simple computations, we can get computers to construct any function we could ever want.

Neural networks can learn almost anything.
