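There are multiple ways to compress. A simple first way is gap encoding: for a sorted list of numbers, store the differences (gaps) between consecutive numbers instead of the numbers themselves.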
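
To make gap encoding concrete, here is a minimal sketch. It assumes the input is a strictly increasing list of positive integers (for example document ids); the function names gap_encode and gap_decode are made up for illustration and are not from the lecture.

```python
def gap_encode(ids):
    """Replace each number of a strictly increasing list by its gap to the previous one."""
    gaps = []
    previous = 0
    for current in ids:
        gaps.append(current - previous)  # the first gap is the first number itself
        previous = current
    return gaps


def gap_decode(gaps):
    """Invert gap_encode by computing running sums of the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


if __name__ == "__main__":
    ids = [3, 17, 21, 24, 34]
    gaps = gap_encode(ids)
    print(gaps)
    print(gap_decode(gaps) == ids)
```

The gaps are usually much smaller than the original numbers, which only pays off once small numbers get short codes.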

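Gap encoding has a problem with separating the individual numbers: if the gaps are written in plain binary one after another, it is unclear where one code ends and the next begins. So Peter Elias in 1975 designed the Elias-Gamma code: write ⌊log2 x⌋ zeros, then x in binary. It’s prefix-free because the number of initial zeros tells us exactly how many bits of the code come afterwards. To save more space, he came up with Elias-Delta too.

Solomon Golomb invented another common approach: use an integer parameter M, called the modulus, and write x as q · M + r, where q = x div M and r = x mod M, then encode q and r separately.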
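
The two codes can be sketched as below, with bit strings represented as Python strings of '0' and '1' for readability. The Elias-Gamma part follows the description above. The Golomb part is a simplified variant under my own assumptions (q in unary as q zeros plus a terminating 1, r in a fixed number of bits), which is not necessarily the exact variant from the lecture. All function names are hypothetical.

```python
def elias_gamma_encode(x):
    """Elias-Gamma code of an integer x >= 1: floor(log2 x) zeros, then x in binary."""
    if x < 1:
        raise ValueError("Elias-Gamma is only defined for integers >= 1")
    binary = format(x, "b")  # always starts with '1'
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode a concatenation of Elias-Gamma codes (assumed well-formed)."""
    values = []
    pos = 0
    while pos < len(bits):
        zeros = 0
        while bits[pos] == "0":  # the zero count gives the length of what follows
            zeros += 1
            pos += 1
        values.append(int(bits[pos:pos + zeros + 1], 2))
        pos += zeros + 1
    return values


def golomb_encode(x, M):
    """Simplified Golomb-style code for x >= 0 (assumed variant, see lead-in)."""
    if x < 0 or M < 1:
        raise ValueError("need x >= 0 and M >= 1")
    q, r = divmod(x, M)              # x = q * M + r
    width = (M - 1).bit_length()     # bits needed to write any r < M
    remainder = format(r, "b").zfill(width) if width else ""
    return "0" * q + "1" + remainder


if __name__ == "__main__":
    gaps = [3, 14, 4, 3, 10]
    stream = "".join(elias_gamma_encode(g) for g in gaps)
    print(stream)
    print(elias_gamma_decode(stream) == gaps)
    print(golomb_encode(10, 4))
```

Decoding the concatenated stream without any separators works only because the code is prefix-free.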

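Lastly, there is a way called Variable-Byte (VB), an idea that Unicode's UTF-8 encoding also applies. The idea is to use whole bytes, with one bit of each byte indicating whether the code continues in the next byte.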
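
A minimal sketch of Variable-Byte encoding, under assumptions of my own: each byte carries 7 data bits, the lowest 7 bits come first, and a set high bit means "another byte follows". Other conventions (high bit marks the last byte, most significant bits first) exist too, so treat this as one possible layout. The names vb_encode and vb_decode are hypothetical.

```python
def vb_encode(x):
    """Encode a non-negative integer using whole bytes with 7 data bits each."""
    if x < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while x >= 128:
        out.append((x & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        x >>= 7
    out.append(x)                      # last byte, continuation bit clear
    return bytes(out)


def vb_decode(data):
    """Decode a concatenation of VB codes back into a list of integers."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:                # more bytes belong to this number
            shift += 7
        else:                          # this byte ends the number
            values.append(current)
            current = 0
            shift = 0
    return values


if __name__ == "__main__":
    numbers = [5, 300, 128, 0]
    encoded = b"".join(vb_encode(n) for n in numbers)
    print(encoded.hex())
    print(vb_decode(encoded) == numbers)
```

Compared to the bit-level codes this wastes some space on small numbers, but it avoids bit fiddling across byte boundaries.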

Now the Entropy theory!

So if we have m symbols that are equally likely, each has probability 1/m, and applying the entropy formula we get an entropy of log2 m.

Claude Shannon in 1948 came up with the source coding theorem: no prefix-free code can have an expected code length below the entropy, and there is always a code that is almost as good, with an expected code length less than one bit above the entropy.
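
To check these statements numerically, here is a small sketch. It assumes "the formula" is the standard Shannon entropy H = -sum of p_i * log2 p_i over all symbols i; the function names and the example distributions are made up for illustration.

```python
import math


def entropy(probabilities):
    """Shannon entropy in bits; terms with probability 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def expected_code_length(probabilities, code_lengths):
    """Expected number of bits per symbol for a code with the given lengths."""
    return sum(p * length for p, length in zip(probabilities, code_lengths))


if __name__ == "__main__":
    # m equally likely symbols: the entropy should be log2 m.
    m = 8
    print(entropy([1 / m] * m), math.log2(m))

    # Three symbols with the prefix-free code 0, 10, 11 (lengths 1, 2, 2).
    probabilities = [0.5, 0.25, 0.25]
    print(entropy(probabilities))
    print(expected_code_length(probabilities, [1, 2, 2]))
```

In the second example the expected code length equals the entropy, so that code cannot be improved; for probabilities that are not powers of 1/2 the best code ends up strictly above the entropy, but by less than one bit.
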
Here is an exercise demoed by the professor:
