To have unique, variable-length encodings, you want to organize the items you're representing in binary into a binary tree. Every internal node of a binary tree has two children, and each child is either a leaf (in this case, one of the values to be encoded) or another internal node. The path through the tree - left being '0', right being '1' - defines the bit string that encodes each value.
Fixed-length encodings like the ones people are talking about can be thought of as binary trees where every leaf is the same number of steps down - with bytes, for example, the path from the root to any leaf is always 8 steps. This gives fixed-width outputs which are easy to read and manipulate. But variable-width encoding is also possible, and sometimes desirable. What if you don't have 256 different values to encode? What if you only have, say, 150? That's too many to encode in 7 bits, which caps out at 128 different items, but it still leaves a lot of unused or invalid values in an 8-bit encoding.
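To make those numbers concrete, here's a quick check (in Python, which is just my choice of language for the examples here):

import math

# 7 bits give 2**7 = 128 patterns - not enough for 150 values.
# 8 bits give 2**8 = 256 patterns, leaving 256 - 150 = 106 of them unused.
print(2**7, 2**8, math.ceil(math.log2(150)))   # 128 256 8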
Here's an example binary tree, which just encodes the letters 'A' through 'D'.
{
{ 'A', 'B' },
{ 'C', 'D' },
}
This ends up being a fixed-width tree; we'll say the first element in each node array is left (0), and the second element is right (1).
From the root, left gives you the node {'A','B'}. Left again, just 'A'. So left, left, or "00", is 'A'. Right, right, or "11", would be 'D'.
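If you want to play with this, here's a minimal sketch in Python - nested tuples standing in for the node arrays above, and build_codes is just a name I made up - that walks the tree and records the left/right path to each leaf:

tree = (('A', 'B'),
        ('C', 'D'))

def build_codes(node, path=''):
    # A leaf: the path taken to reach it is its code.
    if isinstance(node, str):
        return {node: path}
    # An internal node: recurse left with '0', right with '1'.
    codes = {}
    codes.update(build_codes(node[0], path + '0'))
    codes.update(build_codes(node[1], path + '1'))
    return codes

print(build_codes(tree))   # {'A': '00', 'B': '01', 'C': '10', 'D': '11'}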
But suppose you added 'E' and 'F'. We could expand it to a 3-bit fixed-width encoding, keeping these as 000 - 011 and adding 100 = 'E' and 101 = 'F'. Then 110 and 111 would be meaningless, wasted patterns in our output encoding.
Instead, with binary trees, we can do this:
{
{
{ 'A', 'B' },
{ 'C', 'D' },
},
{ 'E', 'F' },
}
The entire previous tree just became the left branch of a new root, with the right branch holding the two new values. So now, left takes us to the old tree, and 000 (left, left, left) takes us from root to 'A', while 011 (left, right, right) takes us to 'D'. But starting at root, right takes us to {'E','F'}, and from there left and right give us 'E' and 'F', so 'E' is "10" and 'F' is "11". It's not fixed-length anymore, two of our six symbols take up only 2 bits instead of 3, and we have no patterns that end up meaning nothing - the only invalid cases would come at the end of the entire file. If you had an extra "0" at the very end, that doesn't decode to a character; a single "left" instruction just gives you the subtree containing 'A', 'B', 'C', and 'D'.
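Running the same kind of walk over the new tree (the same hypothetical build_codes sketch as before, repeated here so the snippet stands alone) shows the mixed code lengths:

tree = ((('A', 'B'),
         ('C', 'D')),    # the old tree, now the left branch
        ('E', 'F'))      # the new right branch

def build_codes(node, path=''):
    if isinstance(node, str):
        return {node: path}
    return {**build_codes(node[0], path + '0'),
            **build_codes(node[1], path + '1')}

print(build_codes(tree))
# {'A': '000', 'B': '001', 'C': '010', 'D': '011', 'E': '10', 'F': '11'}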
Huffman coding is a compression algorithm for building these binary trees from frequency counts - the most frequent values get the shortest branches in the tree, and therefore the shortest encodings, while the least frequent get the longest. But any method gives a valid encoding, so long as it builds a valid binary tree.
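For a flavor of how that works, here's a minimal sketch of the greedy idea behind Huffman's construction - not a production coder, and huffman_tree is my own name for it - which repeatedly merges the two least-frequent nodes until one tree remains:

import heapq

def huffman_tree(counts):
    # Heap entries are (count, tiebreaker, node); nodes are leaves (strings)
    # or (left, right) pairs shaped like the trees above.
    heap = [(c, i, sym) for i, (sym, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)   # the two rarest nodes...
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, i, (n1, n2)))  # ...merge and push back
        i += 1
    return heap[0][2]

print(huffman_tree({'A': 50, 'B': 20, 'C': 20, 'D': 10}))
# ('A', ('C', ('D', 'B'))) - 'A', the most frequent, hangs directly off
# the root, so its code is a single bit.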
The big limitation of variable-length encoding is that you must always read data encoded this way from the beginning. You can't jump in in the middle and start reading, because you'll have no way of knowing which bits end one value and begin the next. 'E' is encoded as "10", but not every "10" in the output is an 'E'; 'C' was encoded as "010", which has "10" right there in it! Fixed-width encodings remove this limitation, which is a big part of why they're favored by programmers (even when, as in the case of UTF-8 (variable-width) vs ASCII (fixed-width), there are a lot of very good reasons to use the variable-width one!)
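You can see this with a little decoder sketch (again my own hypothetical code, using the six-letter tree from above): it walks bit by bit from the root, emitting a symbol and resetting whenever it hits a leaf, which only works if you start at a symbol boundary.

tree = ((('A', 'B'), ('C', 'D')), ('E', 'F'))

def decode(bits, tree):
    out, node = [], tree
    for bit in bits:
        node = node[0] if bit == '0' else node[1]  # take the left or right branch
        if isinstance(node, str):   # reached a leaf: one symbol decoded
            out.append(node)
            node = tree             # restart at the root for the next symbol
    return ''.join(out)

print(decode('00001010', tree))   # 'ACE' (000 + 010 + 10), read from the start
print(decode('0001010', tree))    # drop the first bit and it reads as 'AEE' instead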