*https://www.youtube.com/watch?v=rmBFaNgg4wk
**https://300.ya.ru/summary
пересказ видео
Начало войны и работа американских ученых
- В 1939 году Европа была в войне, США ждали и наблюдали.
- Американские ученые и инженеры работали над идеями для войны и будущего.
Клод Шеннон и его теория информации
- Клод Шеннон работал над шифрованием речи в Bell Labs.
- Его идея о теории информации изменила мышление о информации.
- Теория Шеннона опиралась на работы предшественников, включая Найквиста и Хартли.
Оптический семафор Клода Чаппе
- В 1794 году французы использовали воздушный шар и оптический семафор для разведки.
- Семафор Чаппе мог передавать информацию мгновенно на большие расстояния.
- Система использовала 92 символа и кодовую книгу для передачи сообщений.
Открытие Ханса Кристиана Эрстеда
- В 1820 году Эрстед обнаружил связь между электричеством и магнетизмом.
- Это открытие заложило основу для новой эры общения.
Телеграфная система Сэмюэла Морзе
- В 1832 году Морзе узнал о возможностях электричества для передачи информации.
- В 1838 году он продемонстрировал первую телеграфную систему в Морристауне.
- Морс и Вейл использовали точки и тире для кодирования букв и цифр.
Конкуренция и усовершенствования
- В Великобритании компания Cook and Wheatstone также разработала телеграфную систему.
- Вейл использовал статистику английского языка для оптимизации кодов.
- В 1844 году Морс и Вейл продемонстрировали телеграф в Вашингтоне.
Автоматизация телеграфной связи
- В 1876 году система Морзе доминировала в мировой телеграфии.
- Эмиль Бордо изобрел Бордосский код для автоматизации телеграфной связи.
- Бордоский код использовал фиксированный 5-битный код для каждой буквы и цифры.
Теория информации
- В 1927 году Ральф Хартли представил доклад о количественной оценке информации.
- Хартли определил коммуникативную проблему как группу физических символов, передающих значение.
- Хартли задумался о количестве последовательностей символов и их влиянии на передачу информации.
Идея Хартли
- Количество информации зависит от общего числа возможных сообщений.
- Хартли предложил использовать логарифм для оценки информации.
Работа Найквиста
- Найквист связал дискретные символы с аналоговыми сигналами.
- Он установил ограничения на пропускную способность канала.
Вклад Шеннона
- Шеннон применил булеву логику к электрическим цепям.
- Он предложил использовать логарифм с базой 2 для измерения информации (бит).
Модель коммуникации Шеннона
- Коммуникационная система состоит из источника, передатчика, канала, приемника и получателя.
- Шеннон разделил информацию и канал, что стало основой для новых открытий.
Пропускная способность канала
- Пропускная способность зависит от количества возможных последовательностей символов.
- Шеннон показал, как рассчитать пропускную способность различных систем.
Источник информации и энтропия
- Источник информации имеет статистическое поведение и генерирует символы с определенной вероятностью.
- Шеннон предложил измерять информацию по мере устранения неопределенности, что привело к концепции информационной энтропии.
Энтропия и информация
- Энтропия измеряет неопределенность и неожиданность выбора символа.
- Чем меньше вероятность выбора символа, тем больше энтропия.
- Энтропия максимальна, когда вероятности всех исходов равны.
Кодирование и энтропия
- Энтропия устанавливает нижнюю границу эффективности кодирования информации.
- Использование кода переменной длины позволяет достичь оптимальной схемы кодирования.
- В среднем количество битов на символ приближается к энтропии источника.
Сжатие информации
- Шеннон доказал, что информацию можно сжать до уровня её энтропии.
- Сжатие ниже энтропии приводит к потере информации.
- Теория Шеннона применима к любым источникам информации, включая сложные, такие как английский язык.
Шум и условная энтропия
- Практические каналы зашумлены, что приводит к неопределённости в передаче информации.
- Шеннон ввёл понятие условной энтропии для учёта шума.
- Условная энтропия представляет недостающую информацию из-за шума и корректируется через канал коррекции.
Влияние шума на передачу информации
- На шумной вечеринке можно сузить поиск слова до восьми возможных вариантов из десяти.
- Шум создает неопределенность, что затрудняет точную передачу сообщений.
Проблемы с зашумленными каналами
- В зашумленных каналах получатель не может быть уверен в переданном сообщении.
- Туманные условия усложняют передачу сигналов по семафору.
Решение Шеннона
- Шеннон заявил, что при правильном кодировании можно достичь максимальной пропускной способности с ошибками, близкими к нулю.
- Он предложил использовать кодовую книгу для сопоставления сообщений с канальными последовательностями.
Снижение скорости передачи
- Шеннон снизил скорость передачи информации от источника к каналу, чтобы уменьшить количество последовательностей.
- Это позволило увеличить количество доступных выходных последовательностей канала и снизить вероятность ошибок.
Случайное распределение сообщений
- Шеннон случайным образом распределил сообщения по последовательностям каналов.
- При наличии помех вероятность правильной интерпретации сообщения остается высокой при достаточной длине блока.
Энтропия и непрерывные сигналы
- Шеннон распространил понятие энтропии на непрерывные сигналы.
- Он предложил способ расчета пропускной способности непрерывных каналов.
Теорема Хартли-Шеннона
- Теорема объединила идеи Найквиста-Хартли и Шеннона.
- Получено уравнение, определяющее пропускную способность канала, ограниченную мощностью и искаженную шумом.
In this video
0:02
1939, Europe has descended into total war. Across the Atlantic, the strategy
0:10
in the United States is to wait and see. But behind the scenes, American
0:16
scientists and engineers were working on ideas that would not only help win the war, but they would change the course of
0:24
history. At Bell Labs, a young engineer fresh out of MIT, Claude Shannon, spent
0:31
his days finding ways to encrypt speech to keep it hidden from the enemy. But at
0:36
night, he was working on his own idea, one that would go far beyond the war
0:43
effort. It was an idea that would revolutionize the very way we think
0:48
about information. But Shannon’s theory had its foundations in the work of earlier pioneers
0:55
including his colleagues at Bell Labs, Nyquist and Hartley and even further in
1:01
the intellectual revolutions of the Renaissance and industrial age.
1:11
[Music]
1:49
It is the summer of 1794. Two mighty armies are about to
1:54
clash near the town of Larus. On one side, the Austro Dutch
2:00
coalition stand for monarchy and the preservation of the aristocracy. On the other side, the
2:07
revolutionary French army see themselves as liberators, spreading democracy and
2:13
liberty. The technology of warfare saw two major innovations that day. The first, the
2:20
invading French army used a balloon for aerial reconnaissance, a first in
2:26
battle. And as the battle raged, the French deployed the second. It would
2:32
normally take about 24 hours for a message to be sent from this distance
2:38
back to Paris, but the French had a trick up their sleeves. This was the
2:43
first military application of a new type of longrange communication system called
2:49
the optical semaphore. The invention was a shape-shifting structure which could
2:55
signal information almost instantaneously across vast distances.
3:01
As the musket smoke cleared, Europe had entered a new era of warfare, one where
3:07
rapid communication shaped the outcome of battle. The key innovation was
3:12
invented by Claude Chapper. His semaphore could be positioned into 92
3:18
distinct positions. Each a separate configuration of the beam and sidearms.
3:25
These would be called symbols. And into each symbol could be encoded one of 92
3:31
possible messages, words, sentences, anything the coder desired. But Chapa’s
3:38
system went one step further. Instead of coding into the 92 primary symbols, he
3:46
coded his message into a set of two consecutive symbols. This now gave
3:52
92* 92 or 8,464 unique
3:59
combinations. This larger set would be known as secondary symbols.
4:05
The system used a code book. The first primary symbol told the receiver the
4:10
page and the second the index on the page. To the French military, important
4:17
commands like attack or retreat would be coded to a particular secondary symbol
4:23
as would every other word or sentence in the code book. Chapa’s system was a
4:29
revelation and soon his semaphore chains had spread across France. Chaper had
4:34
started something that would not be fully resolved for 150
4:46
years. In 1820, a discovery was made that would fundamentally transform
4:52
communication. In his laboratory in Copenhagen, Hans
4:57
Christian observed that an electric current could deflect a magnetic needle,
5:02
revealing the connection between electricity and magnetism. Remarkably,
5:08
Urststead found this effect was repeatable. He used a primitive battery,
5:13
a conductor, and a small compass. When he closed the circuit, the needle
5:19
deflected from its normal north south position. When he turned the current off, the magnetic needle returned to its
5:27
usual position. Earth’s findings would send shock waves around Europe. And this
5:34
breakthrough laid the groundwork for a revolution. Unlike Claude Chaper’s Semaphore, a slow, clunky system of
5:40
visual symbols that could be easily intercepted, electricity offered a new possibility, sending information as
5:47
pulses through hidden conductors, traveling far beyond the reach of the human eye. Urst discovery ushered in a
5:56
new age of communication.
6:20
Samuel Morse had been in Europe since 1829. Now in 1832, it was time to go
6:26
home. On the package ship Sully, Morse entered a discussion with Charles Thomas Jackson, a physician and scientist
6:34
well-versed in the emerging field of electromagnetism. Jackson noted to Morse that electricity travel through wires at
6:41
near instantaneous speeds. Morse, an artist by trade, immediately recognized
6:46
the possibilities. Jackson also remarked that a new device capable of detecting
6:51
electric currents had been invented in Europe. The idea was planted in Morse’s mind, and when he returned to the United
6:58
States, he immediately got to
7:06
work. January the 11th, 1838, a bitterly cold chill filled the
7:13
air in Morristown’s Speedwell Iron Works. It would be here that the first
7:18
public demonstration of Morse’s new telegraph system would be held. By now,
7:24
Morse had help in the form of business partner Alfred Vale. Whilst Morse was an artist and a visionary, Vale was an
7:31
engineer, a machinist, a practical problem solver. Between them, they
7:36
invented a machine which could impress pulses onto electrical conductors. These pulses would travel for tens or hundreds
7:44
of miles. Vale assigned a sequence of dots, single pulses, and dashes, double
7:49
pulses, to encode each letter and number. But Morse and Vale were not the
7:54
only show in town. They had competition. In Great Britain, Cook and Wheatstone
7:59
had their own invention. A telegraph system of five magnetic needles and six
8:04
wires. Pulses were sent in parallel. The operator at one end would press the keys
8:10
and within moments, the needles at the other end would swing into position,
8:16
spelling out the message letter by letter. But in its simplest form, Morse’s system only needed a single
8:21
wire. It was cheaper and easier to manufacture, but it could only send one pulse at a time. But out of this
8:28
limitation, Vale hit on something that would later be fundamental to information theory. Vale studied the
8:35
statistics of the English language. In particular, the relative frequency of each letter. He noted that letters like
8:41
E were far more frequent than others. For example, the letter zed. From this, he assigned shorter codes to more
8:48
frequent letters and longer codes to less frequent letters. Overall, this meant fewer pulses on average were
8:55
required to send messages. Morrison Vale had no mathematical basis for doing this. It was pure intuition, but it
9:03
worked. On May 24th, 1844, Morrison Vale conducted another
9:09
public demonstration, except this time it would rock the foundations of
9:14
America. Professor Morse, seated amidst a hushed gathering of distinguished national
9:21
leaders in the chambers of the United States Supreme Court in Washington, tapped out a message on a device of cogs
9:27
and coiled wires. 40 mi away in Baltimore, Alfred Vale received the
9:33
electric signals and sent the message back. For thousands of years, messages have been limited by the speed at which
9:39
messengers could travel and the distance at which eyes could see signals. Neither Alexander the Great nor Benjamin
9:45
Franklin knew anything faster than a galloping horse. Now instant longrange
9:51
communication had become a reality.
10:07
[Music] By 1876, Morse’s system had dominated world telegraphy. But it was a manual,
10:15
laborintensive enterprise. Thousands of skilled operators were necessary to process the sequence of clicks into
10:22
words and numbers. Emile Bordeaux was a French engineer who saw an opportunity
10:27
to automate telegraphy. He invented an encoding scheme called Bordeaux code.
10:32
Rather than using a variable length code, he assigned each letter or number to a fixed 5-bit code capable of sending
10:41
one of 32 possibilities. Looking back through the lens of information theory,
10:46
we realize that Bordeaux’s code is less efficient because it fails to consider the redundancy in the information
10:53
source. No matter how infrequent, every letter or number is given the same
10:58
five-bit code. But what Bordeaux gave us is a way to automate. His telepinter
11:04
seamlessly impressed the necessary five bits onto the line just with the click of a keyboard. The receiver also
11:11
automatically read the signal back. This was the seeds of the modern digital age.
11:26
The globe is now covered in a patchwork of lines and cables. Morse and
11:32
Bodau had provided near instantaneous global connectivity. But there was something missing. There was as yet no
11:40
theory to quantify information, still no way to put bounds on the limits of
11:45
information capacity, and still no grasp of how noise or bandwidth shaped what a
11:52
wire or wave could carry. The analytical framework of information theory was
11:57
ready to be built, and two men in the late 1920s would lay the foundations.
12:05
In September 1927, Ralph Hartley, an employee of Bell Labs, made his way to
12:11
the International Congress of Telegraphy and Telefan in Bonito Mussolini’s Italy.
12:16
On the shores of Lake Ko, Hartley presented a paper which claimed to quantify information. It would go down
12:23
as one of the seinal works of information theory. Hartley claimed to have an objective measure of information
12:31
removed from what he called the elastic manner in which information had previously been defined. He defined the
12:38
communication problem a group of physical symbols which convey a certain
12:43
meaning as assigned by the designer of the system. This might be the visual semaphors of chaper or the dots and
12:50
dashes of Morse code. But now Hartley considered what happens if we send
12:56
symbols consecutively. The total number of symbol sequences Hartley said becomes
13:02
greater the more symbols we send. Every time we select an additional symbol we
13:08
select one message from an ever growing array of possible messages. For example,
13:15
apples are red. These three words represent one precise specific message
13:23
from an exponentially growing set of possible messages. In the same way, if
13:28
we send consecutive symbols, each new selection grows the total number of
13:33
possible messages exponentially. Hartley’s idea was simple but profound.
13:40
The quantity of information is bound to the total possible messages we select
13:45
from. And this said Hartley can be quantified with mathematical precision.
13:52
Let’s say we have a symbol alphabet of three and two consecutive
13:57
selections. There are 3 to the power of two or nine possible messages. But with
14:04
three selections, there are now 27. Four selections would give 81 possible ways
14:12
to encode a message. Consider Chapper’s Semaphore. He had 92 primary symbols to
14:18
select from. But recall that he used two consecutive symbols. This gives 92* 92
14:24
possible ways to encode a message. Each new consecutive selection in Chapa’s
14:30
system gave a factor of 92 increase. In Bordeaux’s teleprinter, there are 32
14:36
ways to encode a message. But if n selections are made, the number of possibilities increases by 32 to the
14:43
power of n. Each new selection broadens the total number of possible sequences
14:48
we can use to assign messages to. And when we select one message from a larger set, the information content, said
14:55
Hartley, is greater. But Hartley immediately hit a problem. Imagine
15:00
Chapper decided to use a code book with 10 consecutive messages instead of two.
15:07
Now there would be 92 to the power of 10 possible ways to encode a message. This
15:13
is greater than the number of stars in the observable universe. If this is our
15:19
measure of information, it scales up very, very, very quickly. In his attempt
15:25
to quantify information, Hartley had observed a monster that grows exponentially out of control. For this
15:32
reason, Hartley made a suggestion. Rather than use the raw number of possible sequences, it is far more
15:39
sensible to quantify information as its logarithm. This means as the number of selections goes up, information
15:46
increases linearly, not
15:52
[Music]
16:02
exponentially. Hartley talked about sending messages comprised of sequences
16:07
of symbols. But to send a symbol, we need a signal. Something which
16:14
represents the symbol we’re trying to send. Nyquist was the first to make the connection between discrete symbols and
16:21
analog signals. He looked at the pulses of a telegraph system, sequences of ones
16:27
and zeros. But he went further. He decomposed these pulses into their
16:33
sinosoidal components. And he realized something fundamental. As a signal
16:39
travels through a channel, higher frequencies get stripped away. The extent to which this happens depends on
16:46
the bandwidth of a channel. The problem is that your signal becomes distorted.
16:52
Nyquist realized that there is a connection between how many signaling events or symbols can be transmitted per
16:59
second and the bandwidth of the channel. And this places a fundamental limit on
17:04
how many signals can be sent per
17:11
second. In 1937, a 21-year-old from a small town in Michigan submitted his
17:18
master’s dissertation at MIT. Many now regard this thesis as the birth of
17:24
digital electronics and the greatest master’s thesis in all of engineering
17:30
history. Claude Shannon had showed how the mathematical algebra of bool, the logic
17:36
of ones and zeros, true and false, could be applied to electrical circuits, and
17:42
it was a blueprint for digital logic, the foundation of computers and all
17:48
modern digital systems. Shannon went on to receive his PhD degree in theoretical genetics, and
17:55
later took up a position at Bell Labs. It was here that he would spend his days
18:00
contributing to wartime cryptography, but at night he was working on a side
18:08
[Music]
18:15
project. In 1948, Claude Shannon proposed a set of ideas that would
18:21
revolutionize the way we think about information. But his starting point was indeed the observation of Hartley. A
18:28
measure of information is the number of messages in the set we select from all choices being equally likely. And
18:35
Shannon also agreed with Hartley that the logarithm was the most natural way to measure information. He proposed that
18:42
the base 2 is the most natural measurement of information. This is what we know as a bit. Then Shannon defined
18:49
the problem. A communication system is made up of an information source which
18:55
produces a sequence of messages. A transmitter which operates on the messages to produce the signal like the
19:02
pulses of a telegraph. The medium or the channel which is used to transmit the signal. A receiver which performs the
19:09
inverse operation of the transmitter to reconstruct the original sequence of messages. And finally the destination
19:16
the intended recipient of the message. Hartley had provided the first real
19:22
quantitative measure of information. He said that the amount of information transfer is the log of the total number
19:30
of available sequences from which that message is selected using the base 2. This means chaper semaphore has a rate
19:37
of 6 1/2 bits per symbol and Bordo’s teleprinter can send five bits per
19:42
symbol. But Shannon realized that the channel by itself does not convey information. A position of chapa
19:48
semaphore or a particular sequence of ones and zeros in Bordeaux’s teleprinter does not in itself convey a message. It
19:56
simply provides a way to transport information from the information source. This might seem
20:02
counterintuitive. Look at how Chapper and Bordeaux allocated their messages to channel signals. Every message was
20:09
allocated to every channel signal. But Shannon showed something remarkable. The information source itself has a rate,
20:16
something innate, an essence of the source rooted in its own behavior. By separating the information
20:24
source and the channel, Shannon had made the first step towards a number of
20:29
groundbreaking [Music]
20:34
discoveries. As a preamble, Shannon showed how the capacity of an arbitrary
20:39
channel can be calculated. Unlike the information source, the channel capacity is simply a function of how many
20:46
sequences can be constructed from the available symbols. And we’ve already seen that the capacity of the Bordeaux
20:52
teleprinter is 5 bits per symbol. So over time, the capacity is 5 n bits per
20:59
second, where n is the transmitted symbols per second. But Shannon also
21:04
showed how the capacity of Morse’s telegraph system can be calculated. This
21:09
one is different because we’re dealing with symbols of different lengths and restrictions on the kind of sequences we
21:16
can have. Nonetheless, Shannon showed that the capacity is the limit as t
21:21
tends to infinity of the log of the allowable sequences divided by time. But
21:27
the impact of Shannon’s next section would be monumental.
21:34
He turned his attention to the information source and he asked a question nobody had asked before. At
21:41
what rate is the source generating information? This is separate to asking
21:46
what capacity a channel has. What Shannon saw that Hartley did not is an information source that is not a random
21:53
generator of symbols but a statistical process. Each new symbol generated is
21:59
not independent of the previous symbols. When we count sequences, some are more
22:04
likely than others. Some sequences are so unlikely that they might as well never occur at all. This is most easily
22:12
observed in human language. There are, of course, a huge number of ways we can arrange words, but only a small subset
22:19
of these possible sequences are commonly used. Hartley’s approach was to count
22:24
all possible sequences. But Shannon introduced a new perspective, one that considered the likelihood of different
22:31
sequences occurring. He then developed a model, a mathematical machine capable of
22:36
producing symbols and generating information. He modeled an information source as a markoff process where the
22:44
source is represented as a complex web of interconnected states that transition
22:49
between each other randomly but according to set probabilistic rules. If
22:54
this was a language, these rules would be dictated by the normal structure of that language. But any information
23:00
source has this kind of statistical behavior. It contains patterns, repetition, and redundancy. Out of
23:07
Shannon’s machine came symbols. But how can we measure how much information is being generated? Shannon produced an
23:14
idea that would become a fundamental pillar of not just information theory but data compression, cryptography,
23:21
machine learning, artificial intelligence, economics and neuroscience. His idea was simple.
23:28
Information is best measured as the uncertainty resolved, not just the
23:33
number of possible choices.
23:40
[Music] Imagine you flip a fair coin. Hartley
23:47
had said that the information is the log of the possible choices, which here is
23:52
one bit of information per flip. You can keep flipping the coin and each new
23:57
event will give one extra bit of information. Now consider if we make the
24:02
coin biased, so there’s an 80% chance of heads. According to Hartley, there are
24:08
still two possible outcomes, and the information is still one bit per flip of
24:14
the coin. But is this correct? Shannon realized that the measure of choice gets
24:20
smaller as the probabilities of the outcomes become more skewed. In reality,
24:25
information is the log of a smaller number of effective choices. Shannon
24:31
called this new measure of information entropy.
24:45
To calculate entropy, Shannon calculated the surprise of each new selection from
24:51
an information source. This quantifies how surprised we are that this symbol has been chosen or in other words, how
24:59
much uncertainty that selection has resolved. For example, if we have four
25:04
options and if we select the symbol with a 50% probability, this is one bit of
25:10
surprise. Even though this is within a symbol space of four, selecting the 50% option provides the same amount of
25:18
information as flipping a coin. But the 12 1/2% symbol provides three bits of
25:24
surprise because it’s more unlikely. So more uncertainty is being resolved. For
25:29
the flip of a fair coin, the surprise is always one bit. But what happens if the
25:35
coin is biased? Of course, the more likely option is chosen most often, but
25:40
the surprise is relatively low. But every so often, the unlikely event happens. The surprise is greater. But
25:47
what Shannon showed is entropy is the average value of this surprise and it
25:52
maximizes when the probabilities are equal. Entropy is the weighted sum of
25:57
the surprise values of each possible outcome weighted by how likely each
26:03
outcome is. Consider again an event with two possible outcomes with probability P
26:09
and Q. If this was an information source, its entropy would be related to how balanced these probabilities are. If
26:16
one outcome is certain, entropy is zero. No information is gained if we learn the outcome. But if the probabilities are
26:22
balanced, this conveys the maximum amount of information and the entropy is
26:28
at its highest. However, if the probabilities are skewed, the entropy
26:33
inevitably drops. But why does this matter? We still need to encode every possible symbol into the channel, right?
26:40
Is entropy just a philosophical and academic measure of what we perceive as
26:45
information? What Shannon did next is prove that entropy places a fundamental lower bound on how efficiently we can
26:52
represent information and therefore make best use of the
27:02
channel. Suppose I have an information source which is represented here as four
27:08
colors. So, there’s a 50% chance I’ll select red. 25% chance I’ll select blue
27:14
and a 12% chance I’ll select white or black. I’m going to put these inside a
27:19
bag and I’ll select randomly. So, we need to assign a code to each one of these symbols to convey this over a
27:27
channel. So, the question is which code should we assign to each of the colors to get the most efficient coding scheme?
27:34
and it’s going to be a random selection. When we think about coding
27:40
these symbols, we might follow the advice of Bordeaux and assign a fixed code. In this case, we require two bits
27:48
to describe all four possible outcomes. Of course, the average number of bits per symbol will always be two bits. But
27:55
can we do better? Let’s try a variable length code. And the trick is to assign
28:01
a code length which matches the surprise of each symbol. So when red is selected, there’s a 50% chance. So there’s one bit
28:09
of surprise. With blue, a 25% chance, two bits of surprise. Matching code
28:14
length to surprise gives us the optimum coding scheme. This is the direction of
28:19
course that Morse and Veil were heading all those years ago. The key is assigning shorter code words to more
28:26
frequent symbols. In this experiment, we achieved an average of 1.78 bits per
28:32
symbol, which is getting close to the entropy of the source, 1.75 bits. But if we carried on the experiment, we would
28:38
actually converge on this value. The job of the coder is to reshape the output of the source into a bitstream whose
28:45
structure is statistically close to the most efficient possible use of the channel, which is when symbols are
28:51
encoded near their surprise level. Shannon said that doing this will allow you to maximize the number of symbols
28:58
which can be conveyed and make full use of the channel capacity. In the previous example, if our channel capacity was
29:05
let’s say 14 bits per second, we could send seven symbols per second with the
29:11
bad encoding scheme. But if we use the good encoding scheme, we could send eight symbols per second. In general,
29:18
Shannon said we can compress an information source down to its entropy. But if we go lower, we are almost
29:24
guaranteed to lose information. But we need to be careful because not every information source is so easy to encode.
29:32
The example we just looked at is a very special case. It works out neatly because the symbols are independent and
29:38
follow a simple probability distribution. But what about something more complex like the English language?
29:44
In this case, we can get close to the true entropy if we encode over long sequences of text. That’s because the
29:51
statistical structure of the language like grammar, word choice, and context only really show up when we look at
29:57
larger chunks. Letters aren’t independent, and some combinations are far more likely than others. So, how did
30:04
Shannon prove his theory was correct in all cases, that you can compress information down to its entropy in the
30:11
general case? How did Shannon prove that any source can be compressed down to its
30:18
entropy? To demonstrate, I’ll start with the simple information source we used previously. Remember, there’s a 50%
30:24
chance we’ll pick a red, 25% chance we’ll pick a blue, and 12% chance we’ll pick a black or a white. I’m going to
30:32
pick at random again and try to generate long sequences. But because I don’t have
30:39
the patience to do it for real, I’ve written a piece of code to simulate this information source. And I’m going to
30:45
simulate lots of these information sources. simultaneously. So each will generate its own long sequence. Shannon
30:54
noticed that something interesting happens if we make the number of selections very large. Can you see that
31:00
the percentage of red, blue, black, and white begins to converge to its expected
31:06
value, 50% red, 25% blue, and so on. This happens for all of the information
31:14
sources we simulated. Even though the sequences produced by our sources are all
31:20
different, they belong to the same typical set. And as the number of selections approaches infinity, this
31:27
probable set is all we are left with. A small set from an astronomically large
31:34
possible number of sequences. And the number of bits required to specify this
31:39
probable set. It is the entropy per symbol of the information
31:45
source. Hartley had taught us that information is the log of the number of
31:51
choices. Shannon had taken the sequence to infinity and showed the true number of choices is specified by the entropy.
32:00
And Shannon said the same result applies for any source. So even if we simulate a
32:07
complex source like a Markov chain, it will also eventually boil down to its
32:13
own set of typical sequences and it too can be specified by its entropy. In the
32:20
limit as we approach infinity, Shannon had proved that the information source
32:25
boils down to its entropy, the essential essence of the source. any fewer than h
32:31
bit bits per symbol and we cannot guarantee that every probable sequence
32:36
can be encoded. Now Shannon’s proof relies on taking a number of sequences which
32:43
approaches infinity but it shows the limit of
32:48
compression. You cannot compress any better than the entropy.
32:54
Shannon had introduced a new method of quantifying information entropy and he
33:01
showed that any information source can be compressed down to its entropy. This
33:08
allows communication with the minimum number of bits per symbol and the most
33:14
efficient use of the channel capacity. But there’s a problem. Every
33:20
practical channel is noisy and it means that there can never be a guarantee that
33:26
the bits you receive are the bits that were sent. What Shannon did next would be the
33:32
most audacious part of his paper and its implications would reverberate decades
33:39
into the
33:44
future. Imagine I flip a fair coin.
33:50
heads or tails. When I tell you the result, you gain one bit of information.
33:55
But now, let’s say I lie 10% of the time. So, if I flip a heads, I’ll tell
34:01
you its tails. And if I flip a tails, I’ll tell you it’s a heads, but only 10%
34:07
of the time. How much information am I now conveying? You might be tempted to
34:13
say, well, if 90% of the time you’re telling the truth, maybe I’m still
34:18
getting.9 bits of information, 9 bits out of 10. But that’s not right because
34:25
you don’t know when I’m lying. That uncertainty is actually baked into every
34:31
message. How did Shannon account for this and adjust the rate accordingly?
34:37
Consider a communication system with an observer capable of seeing both what is
34:42
sent and what is received. The observer notes the errors and then communicates
34:48
them over its own channel, a correction channel. The easiest way to do this is
34:53
to send a stream of bits with one indicating the position of an error and zero otherwise. But these bits could
35:01
occur anywhere. All we know is that the frequency of ones for a long message
35:06
will be 10% or whatever the bit error rate happens to be. Shannon modeled this
35:12
as a communication channel in its own right. And he realized that the capacity
35:17
of the correction channel coded down to it entropy is the missing information at
35:24
the receiver due to the additional uncertainty of the noise. He called this
35:30
conditional entropy. It is the average ambiguity of the received signal. In our
35:36
example of the coin, the entropy of the information source starts as one bit per
35:42
flip, but the uncertainty due to the noise is47 bits per flip. Therefore, the
35:50
rate of information is actually 0.53 bits per flip. Shannon’s
35:57
hypothetical correction channel shows how the conditional entropy represents the missing information due to the
36:05
noise. As an example, imagine being at a loud party and your friend says a word
36:13
and let’s say that word is selected from 2 to the^ of 10 possible words. So 10
36:19
bits of information is contained within that message. But because of the noise,
36:24
you can only narrow it down to one of eight possible words. That means there’s
36:29
still three bits of uncertainty. But the information you gained is the narrowing
36:35
of 2 to the 10 possibilities down to 2 to the 3 possibilities. So seven bits of
36:41
information has been transmitted. When noise is present,
36:46
there is an inherent uncertainty at the receiver and this means there could be
36:51
multiple possible messages for each received sequence. This is not practical
36:57
for communication over noisy channels because the receiver can never be sure
37:03
about what the transmitted message actually was. If we think back to Chapa
37:08
Semaphore, imagine the problems they might face if the weather is foggy. it
37:14
becomes difficult to see what symbol is actually being sent. But you might be able to reasonably narrow it down to a
37:21
limited set of possibilities. In such a case, you might be tempted to limit the
37:26
rate of transfer by restricting the number of symbols that can be sent, effectively separating the symbols
37:33
according to the ambiguity. One might consider the capacity of this channel to be the maximum possible rate given the
37:40
uncertainty of the fog. But the system needs to be designed properly to reach this capacity while still avoiding
37:47
errors. Shannon was now ready to make his most audacious claim yet. He said it
37:54
was possible with the right coding to transmit at the capacity of the channel
38:00
with close to zero errors. Imagine a source X producing information at the
38:07
rate of entropy H of X. Let’s assume this source has reached the channel
38:13
capacity. Over time, this source produces around 2 ^ of h * t probable
38:19
sequences. As Shannon showed, if a source keeps generating sequences, eventually you end up with just a set of
38:27
typical sequences and the number of them grows exponentially at a rate defined by
38:34
the entropy. If we now allocate these sequences to the channel, the number of possible channel output sequences also
38:41
grows roughly at the exponential rate. The trick, as we’ve seen, is to assign
38:48
messages to channel inputs using a code book mapping each message to a channel
38:54
sequence. But here’s the problem. In a noisy channel, we’re guaranteed to have more than one plausible input that could
39:02
produce a given output. that is more than one reasonable cause for what the
39:07
receiver sees. So Shannon tried something different. He reduced the rate
39:12
of the source to R where R is now less than the channel capacity. Now because
39:19
the source is generating information more slowly, it produces fewer sequences
39:24
compared to the number of available channel output sequences. It’s
39:30
interesting to see what happens when we simulate the rate of growth of sequences of R and the channel sequences. Even
39:36
with a small reduction, given enough time, the ratio becomes vanishingly
39:41
small, which means there are now a huge number of channel sequences available to encode messages. And this means the
39:48
channel has more than enough room to spread the messages out to keep them far
39:54
apart. Shannon then assigned messages at random to channel sequences and he
40:01
realized even though the channel is noisy, the chance that a received sequence could be reasonably interpreted
40:09
as more than one message drops to close to zero given a long enough block
40:15
length. [Music]
40:21
Shannon’s final trick was to extend the concept of entropy to continuous signals
40:27
and in doing so he provided a way to calculate the capacity of continuous
40:33
channels. This culminated in the Hartley Shannon theorem a unifying expression
40:39
that brought together the insights of Nyquist Hartley and Shannon himself. The
40:45
result was a single equation that defines the capacity of a channel constrained by power and corrupted by
40:52
noise. The ultimate speed limit for reliable communication
41:01
[Music]

