This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Bytes and Binary: The Basics

Started by Pharap, 06 March 2013 - 03:51 PM
Pharap #1
Posted 06 March 2013 - 04:51 PM
I was answering a question when I suddenly realised there are no tutorials about bytes and binary, which is frankly horrifying, so I'm making one now.

Firstly, binary.
In all modern countries, the dominant counting system is known as the decimal system (or base 10), which means that when counting, there are 10 designated symbols used before another column must be added to further counting ability.
Essentially there are the units: 0-9, of which the highest number is 9. When you try to add the lowest non-zero unit (aka 1) to the highest unit (9), you run out of symbols, thus another column must be added in order to continue counting, hence 10 is born.
This "adding 1" process can continue until the units yet again reach their maximum (19), after which another mark must be made to the second column, resulting in 20. This of course can continue until you run out of symbols to represent that column (99) which means yet another column must be added (resulting in 100).
Technically speaking, the number 1 is really more like 01 or 001 (or to a greater extent 00000000001), meaning every number you write is really shorthand for one with an infinitely long line of leading 0s that nobody can be bothered to write. The main reason they aren't written (aside from the fact nobody has that kind of time) is that writing them would be redundant. Our brains all know there is nothing before that 1. When someone sees a 1, they don't think about the infinite amount of nothing before that 1, because really it's the 1 that matters most.
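If you want to see those columns concretely, here's a quick Lua sketch (4072 is just an arbitrary example number) that pulls a decimal number apart one column at a time, starting from the units:

-- Break 4072 into its decimal columns: 2x1, 7x10, 0x100, 4x1000.
local n = 4072
local column = 1
while n > 0 do
  local digit = n % 10          -- the symbol in the current column
  print(digit .. " x " .. column)
  n = (n - digit) / 10          -- move everything one column to the right
  column = column * 10
end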

"So, what has this got to do with binary?".
My answer is: "everything".
Quite simply, binary is an alternate counting system.
Whereas decimal is base 10 (ie it has 10 symbols to choose from), binary is base 2, giving only two symbols: 1 and 0.
Aside from this one major difference, it is pretty much exactly the same as counting in base 10. A fact I shall now prove:

0 + 1 in base 10 is 1,
0 + 1 in base 2 is 1.
1 + 1 in base 10 is 2,
1 + 1 in base 2 … oh dear, we've run out of symbols, time to add another column:
1 + 1 in base 2 is 10 (exactly the same value, just written differently),
2 + 1 in base 10 is 3,
10 + 1 in base 2 is 11 (yet again, exactly the same value, but using fewer symbols and more columns)

So, that's binary in a nutshell. Not exactly easy to use, but all the same principles as base 10.
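To see that in action, here's a little Lua sketch that counts from 0 to 7 and prints each number in both bases (toBin is a throwaway helper written for this demo, not a built-in function):

-- Count from 0 to 7, showing each number in base 10 and base 2.
local function toBin(n)
  if n == 0 then return "0" end
  local s = ""
  while n > 0 do
    s = (n % 2) .. s            -- prepend the current units column
    n = math.floor(n / 2)       -- shift everything right one binary column
  end
  return s
end

for i = 0, 7 do
  print(i .. " in base 2 is " .. toBin(i))
end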
In fact, here's another good principle to compare, this again applies to all bases. For this, we will be using exponentiation (aka powers of), denoted by this symbol: ^
eg 4^2 = 16 (4 to the power of 2, ie 4*4, is 16)

Exponentiation can be used to find out the possible limits of a set of columns depending on the base being used.
For example, if I am using base 10 (decimal) and I want to know what my number limit is if I have 4 columns, I can do 10 ^ 4 (10*10*10*10).
The result of course, is 10000.
"But, 4 digits would only go up to 9999".
This is correct: 4 digits only go up to 9999 as a maximum number, but that is counting only the non-zero numbers. If you include zero, you technically have 10000 numbers from which to choose, since we are counting 0000 as a possible value.
In decimal, as 0 is designated as 'nothing' or 'worthless', the result would be 0 if there were no possibilities. However, since there are 9999 non-zero numbers plus the 1 zero, the result is in fact 10000.

This same rule applies to base 2 (binary).
If I have 4 columns in which to make a base 2 number, that's the same as 2 ^ 4 (2*2*2*2), which is 16.
If you count, including 0, there are 16 possible combinations of 1s and 0s you can get from 4 columns:
0000,0001,0010,0011,
0100,0101,0110,0111,
1000,1001,1010,1011,
1100,1101,1110,1111

There we go, just proved it. By using exponentiation, you can figure out the number of results without having to write all possible results down.
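And if you'd rather not take my word for it, here's a quick Lua sketch that generates all the combinations and counts them (toBin4 is just a helper made up for this demo):

-- Generate every 4-column binary pattern and count them.
local function toBin4(n)
  local s = ""
  for col = 3, 0, -1 do
    local b = math.floor(n / 2^col) % 2   -- the digit in this column
    s = s .. b
  end
  return s
end

local count = 0
for i = 0, 2^4 - 1 do
  print(toBin4(i))
  count = count + 1
end
print(count .. " combinations, and 2^4 = " .. 2^4)   -- both are 16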
"So what has this got to do with bytes and computers?"
Throughout its entire system, a computer stores data as bytes. Bytes themselves are made up of smaller units called bits.
A bit itself is a tangible thing capable of having two states: set or clear. Generally (at least on your computer's hard disk) bits are stored as the magnetic polarity of a tiny region of the disk: one polarity for set and the opposite for clear. The byte itself (on modern computers) is made up of 8 bits.
"So what good is this and how does it store data"
SpoilerOk, so a bit has two states: set and clear. Those are a bit much to say and write, right? So lets make it a bit easier. Set is 1 and clear is 0. So now a bit has two states: 1 and 0. Let's try to represent a byte (8 bits)

01101001

Oh look, binary.
Yep - by treating set as 1 and clear as 0, you can represent a byte as a binary number.
This of course means we have a way of storing numbers on a computer. I wonder how high 1 byte goes?
2 ^ 8 = 256 (256 combinations, so a single byte counts from 0 up to 255)

Oh, that high? How about 2 Bytes?
2 ^ 16 = 65536 (0 up to 65535)
That's a lot of numbers!
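You can check those limits in Lua itself. Note that the highest storable value is always one less than the number of combinations, because 0 uses up one of them:

print(2 ^ 8)                            --> 256
print(tonumber("11111111", 2))          --> 255, the highest 1-byte value
print(2 ^ 16)                           --> 65536
print(tonumber("1111111111111111", 2))  --> 65535, the highest 2-byte value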

All sarcasm aside, I hope that last little section made the first walls of text worthwhile.
Knowing the basics of bytes and binary is very useful if you're going to go more in depth with programming.
Not only does it mean that when people say 'binary is just 1s and 0s' you can get annoyed with them because you appreciate the complexity of it, but it also opens up a pathway to wider applications of this knowledge:
Cryptography,
data compression,
boolean logic,
the CC colour system,
compiling code scripts,
bit shifting and bit-level operations.

Of course, not all of these are quite as easy, but if nothing else it opens a world of possibilities for you to broaden your horizons.

Any criticism or corrections are accepted, please be kind; I wrote this at 4am and frankly I wanted it finished so the knowledge was up on the tutorials board.

I hope to follow this up with tutorials on bit-shifting and binary operations as well as how bytes are used to store text data, a bit of boolean logic, a bit of the CC colour system, the uses of base 8 and base 16 in terms of programming, easy conversion between base 2, base 10 and base 16 and possibly a bit of floating point arithmetic or some uses of the binary read/write filemodes.

Thanks for reading.

PS - I think I should probably go to bed soon, I have college tomorrow.
Dlcruz129 #2
Posted 06 March 2013 - 05:02 PM
Wow. Very… thorough.
Pharap #3
Posted 06 March 2013 - 05:07 PM
Wow. Very… thorough.
Tbh I wanted to cover boolean logic while I was at it, but I decided against it lol
And you would not believe how useful this stuff is when it comes to writing things for real computers.
If you can understand binary, binary file formats like png aren't that far out of reach.
Dlcruz129 #4
Posted 06 March 2013 - 05:09 PM
Wow. Very… thorough.
Tbh I wanted to cover boolean logic while I was at it, but I decided against it lol
And you would not believe how useful this stuff is when it comes to writing things for real computers.
If you can understand binary, binary file formats like png aren't that far out of reach.

I liked the tutorial. ;)
Very useful for people new to binary.
theoriginalbit #5
Posted 06 March 2013 - 07:03 PM
It's good that someone decided to do a tutorial on this. There are quite a few key areas you missed, and a few more you didn't touch on, in terms of conversion between the bases and such, but I'm sure you will expand on this over time :)
Pharap #6
Posted 07 March 2013 - 12:47 PM
It's good that someone decided to do a tutorial on this. There are quite a few key areas you missed, and a few more you didn't touch on, in terms of conversion between the bases and such, but I'm sure you will expand on this over time :)
Conversion isn't really such a big deal.
Lua's tonumber handles binary to decimal conversion for you.
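For example (toBin here is a quick helper of my own, since the reverse direction isn't built in):

-- Base 2 string to number: built into Lua.
print(tonumber("1010", 2))   --> 10

-- Number to base 2 string: not built in, but a short loop covers it.
local function toBin(n)
  if n == 0 then return "0" end
  local s = ""
  while n > 0 do
    s = (n % 2) .. s
    n = math.floor(n / 2)
  end
  return s
end
print(toBin(10))   --> 1010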
Main reason I overlooked that is because frankly I've never needed to convert between base 2 and base 10 in code, I always do it in my head because I'm used to working with binary.
I also didn't mention base 8 or base 16, but I can cover them later if anyone thinks I should.
The important bit was getting the tutorial out there so we actually have somewhere to point beginners, especially since most online stuff regarding this is a tad cryptic or in bits and pieces.
If people find my tutorial useful, sure I'll make the rest, if not, someone else can do it.
What matters is what people can connect to and what makes sense to beginners.

I'm going to stop there before I ramble this into a page long reply lol
shiphorns #7
Posted 07 March 2013 - 03:22 PM
Just FYI, in digital logic design 'set' and 'clear' are normally used as verbs indicating state change (from logic low to high or high to low respectively), not used to denote static values. You'll see these terms on the inputs of simple logic circuits to indicate the effect a logic HIGH signal on that input would have on the state of the circuit's output. Likewise, in computer science terms you will hear of bits being set or cleared, but again these are action verbs. If someone says a bit is set, what they are saying is that it has been set, which normally implies to a value of 1, but not necessarily. I know this is really nuanced, but I thought it worth mentioning in case someone reading this looks at other digital logic primers and is wondering about the terminology.

If you want to keep the tutorial to binary math, then simply using 1 and 0 is fine. If you want to introduce digital logic terminology, then a little discussion of high/low signals and on/off states is probably a prerequisite to the concepts of set, clear, reset, hold, clock, etc…

Something else worth noting is that computers don't store all of their numbers in straight binary. The system you've outlined is how unsigned integers are actually stored a lot of the time, but if you think about negative numbers, floating point numbers, etc., you venture out of the land of the simple binary representation used in this tutorial, and into territory where each more significant bit in a byte isn't simply the next power of 2, but rather different groups of bits are used to represent different parts of numbers. You don't need to expand your tutorial to cover IEEE 754 format, or 2's complement signed integers, but people should simply know that these things exist, and that you most often can't just count bits and translate a computer-stored number to an integer from 0 to 2^N-1.
SuicidalSTDz #8
Posted 07 March 2013 - 03:36 PM
I feel as if someone should also detail the difference between ciphers and encryption, since these are both commonly confused, as well as hashing ;) Very nice tutorial btw. Learn something new about binary every day.
theoriginalbit #9
Posted 07 March 2013 - 03:58 PM
Main reason I overlooked that is because frankly I've never needed to convert between base 2 and base 10 in code, I always do it in my head because I'm used to working with binary.
That's my point… You convert in your head. Not everyone can do that, or knows how to do it. Which is what I think you should add: how to do it.

I also didn't mention base 8 or base 16, but I can cover them later if anyone thinks I should.
I think you should cover hex (base 16); it's a good programming tool. Makes number representation nicer. And I personally use hex for bit masking.
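For example, something like this (a rough sketch assuming CC's bit API is available; bit.band is a bitwise AND and bit.brshift shifts bits right):

-- Mask off the low byte of a 16-bit value with a hex mask.
-- 0xFF is 11111111 in binary: "keep only the low 8 bits".
local value = 0xABCD
print(bit.band(value, 0xFF))    --> 205 (0xCD), the low byte
print(bit.brshift(value, 8))    --> 171 (0xAB), the high byte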
shiphorns #10
Posted 07 March 2013 - 05:47 PM
I think you should cover hex (base 16); it's a good programming tool. Makes number representation nicer. And I personally use hex for bit masking.

I wholeheartedly agree. In real-world desktop computer programming, no one ever looks at binary-stored information as 1s and 0s, only as hexadecimal. Binary file editors typically show you hexadecimal and its ASCII representation, and debuggers typically give you the option of decimal or hexadecimal for numeric formats. It's still important to know what 0-F are in binary representation, but no one wants to look at a huge stream of 1s and 0s.
Pharap #11
Posted 08 March 2013 - 01:29 AM
Just FYI, in digital logic design 'set' and 'clear' are normally used as verbs indicating state change (from logic low to high or high to low respectively), not used to denote static values. You'll see these terms on the inputs of simple logic circuits to indicate the effect a logic HIGH signal on that input would have on the state of the circuit's output. Likewise, in computer science terms you will hear of bits being set or cleared, but again these are action verbs. If someone says a bit is set, what they are saying is that it has been set, which normally implies to a value of 1, but not necessarily. I know this is really nuanced, but I thought it worth mentioning in case someone reading this looks at other digital logic primers and is wondering about the terminology.

If you want to keep the tutorial to binary math, then simply using 1 and 0 is fine. If you want to introduce digital logic terminology, then a little discussion of high/low signals and on/off states is probably a prerequisite to the concepts of set, clear, reset, hold, clock, etc…

Something else worth noting is that computers don't store all of their numbers in straight binary. The system you've outlined is how unsigned integers are actually stored a lot of the time, but if you think about negative numbers, floating point numbers, etc., you venture out of the land of the simple binary representation used in this tutorial, and into territory where each more significant bit in a byte isn't simply the next power of 2, but rather different groups of bits are used to represent different parts of numbers. You don't need to expand your tutorial to cover IEEE 754 format, or 2's complement signed integers, but people should simply know that these things exist, and that you most often can't just count bits and translate a computer-stored number to an integer from 0 to 2^N-1.

I can change it to 1 and 0 if I really must, but it breaks the flow of the tutorial; I'd rather change it to on/off or positive/negative, since I don't want to mention the relation to binary until after covering what bytes are.
I'm not going to go into digital logic for several reasons:
1-This is a programmer-orientated tutorial written with Lua users in mind. I'm not writing it for RedPower circuitry.
2-This is supposed to be an introductory tutorial, I don't want to go scaring people off by going into more depth than is needed for understanding how binary and bytes relate to programming.
3-Boolean Logic in relation to binary is not covered as a part of this tutorial, I clearly state at the bottom I intend to cover that in a later tutorial if this tutorial is found useful.

This tutorial is not about binary math itself; it is merely about the basic relationship between bytes and binary. If you feel there should be a tutorial about in-depth binary mathematics, you're free to go and write one. My concern, however, lies with the less experienced programmers who wish to learn why binary is related to programming and how memory is managed on a computer at a very simple level.

No they don't store all numbers in straight binary, but as before I don't want to go scaring people off with all the little details. When teaching beginners you have to simplify and generalise to keep interest and get the main points across. If you want to write your own tutorial about how computers represent different types of numbers, go ahead, but I don't think it's something that needs covering in a beginner's tutorial like this.


Main reason I overlooked that is because frankly I've never needed to convert between base 2 and base 10 in code, I always do it in my head because I'm used to working with binary.
That's my point… You convert in your head. Not everyone can do that, or knows how to do it. Which is what I think you should add: how to do it.

I also didn't mention base 8 or base 16, but I can cover them later if anyone thinks I should.
I think you should cover hex (base 16); it's a good programming tool. Makes number representation nicer. And I personally use hex for bit masking.

That's fair enough. I can appreciate that working binary out mentally is an acquired skill, so I may write a tutorial covering some different conversion methods.
Personally I find binary easier to work with than hex, but I will almost certainly cover it if there is enough interest because I know a lot of people prefer hex or use it more often. Hex is quite easy to treat as binary anyway, since a single character (0-F) essentially represents a nibble (half a byte, gotta love those crazy scientists lol).
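For example, a nibble and its hex digit are literally the same value:

print(tonumber("F", 16))         --> 15
print(tonumber("1111", 2))       --> 15, the same nibble
-- A byte is therefore always exactly two hex digits:
print(tonumber("2A", 16))        --> 42
print(tonumber("00101010", 2))   --> 42 again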


I think you should cover hex (base 16); it's a good programming tool. Makes number representation nicer. And I personally use hex for bit masking.

I wholeheartedly agree. In real-world desktop computer programming, no one ever looks at binary-stored information as 1s and 0s, only as hexadecimal. Binary file editors typically show you hexadecimal and its ASCII representation, and debuggers typically give you the option of decimal or hexadecimal for numeric formats. It's still important to know what 0-F are in binary representation, but no one wants to look at a huge stream of 1s and 0s.

I'll have you know I look at binary for numbers less than 2 bytes, and generally convert hex to binary more than hex to decimal. If given the option I'd also rather use a decimal view for binary file editing. Not everyone is against streams of 1s and 0s. I do however agree that hexadecimal is widely used and useful for its nibble-representing capability, and I did say I'd consider covering base 16 if this tutorial is found useful.
shiphorns #12
Posted 08 March 2013 - 03:34 AM
Something else worth noting is that computers don't store all of their numbers in straight binary. The system you've outlined is how unsigned integers are actually stored a lot of the time, but if you think about negative numbers, floating point numbers, etc., you venture out of the land of the simple binary representation used in this tutorial, and into territory where each more significant bit in a byte isn't simply the next power of 2, but rather different groups of bits are used to represent different parts of numbers. You don't need to expand your tutorial to cover IEEE 754 format, or 2's complement signed integers, but people should simply know that these things exist, and that you most often can't just count bits and translate a computer-stored number to an integer from 0 to 2^N-1.

No they don't store all numbers in straight binary, but as before I don't want to go scaring people off with all the little details. When teaching beginners you have to simplify and generalise to keep interest and get the main points across. If you want to write your own tutorial about how computers represent different types of numbers, go ahead, but I don't think it's something that needs covering in a beginner's tutorial like this.

I wasn't suggesting you cover the different binary representations, merely mention them. When trying to make sense of binary that a programmer might come across in a real application, it's pretty essential to know about the sign bit and 2's complement formats. For example, if someone is tracing out the value of a 32-bit ARGB pixel (or seeing it in a debugger) and sees that their white pixel has a decimal value of -1, they're going to be scratching their head wondering why it's not 2^32-1 (of course, seeing it as 0xFFFFFFFF will make sense, which is why they should know hexadecimal ;-)
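As a rough sketch of what's going on (Lua only has doubles, so this just mimics the arithmetic a debugger does when it reinterprets the bits):

-- Reinterpret an unsigned 32-bit pattern as a two's complement signed value.
local raw = tonumber("FFFFFFFF", 16)   -- 4294967295, i.e. 2^32 - 1
local signed = raw
if raw >= 2^31 then                    -- top bit set means negative
  signed = raw - 2^32
end
print(raw, signed)                     --> 4294967295    -1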
Pharap #13
Posted 08 March 2013 - 06:30 AM
No they don't store all numbers in straight binary, but as before I don't want to go scaring people off with all the little details. When teaching beginners you have to simplify and generalise to keep interest and get the main points across. If you want to write your own tutorial about how computers represent different types of numbers, go ahead, but I don't think it's something that needs covering in a beginner's tutorial like this.

I wasn't suggesting you cover the different binary representations, merely mention them. When trying to make sense of binary that a programmer might come across in a real application, it's pretty essential to know about the sign bit and 2's complement formats. For example, if someone is tracing out the value of a 32-bit ARGB pixel (or seeing it in a debugger) and sees that their white pixel has a decimal value of -1, they're going to be scratching their head wondering why it's not 2^32-1 (of course, seeing it as 0xFFFFFFFF will make sense, which is why they should know hexadecimal ;-)

If colours in ARGB format are showing up as signed values, I'd worry about the library being used. ARGB values shouldn't be signed; they are almost always unsigned (or in less common cases are float values between 0 and 1). Frankly, if someone is moving on to compiled-language programming (ie the C or .NET family), they'll almost certainly cover signed and unsigned numbers in the datatype tutorial for the language they are moving on to. For the ComputerCraft computers, there's unlikely to be a situation where it's going to come up, so at the moment I'm aiming for the average Lua programmer who doesn't really need to know that much.
shiphorns #14
Posted 08 March 2013 - 08:01 AM
If colours in ARGB format are showing up as signed values, I'd worry about the library being used. ARGB values shouldn't be signed; they are almost always unsigned (or in less common cases are float values between 0 and 1). Frankly, if someone is moving on to compiled-language programming (ie the C or .NET family), they'll almost certainly cover signed and unsigned numbers in the datatype tutorial for the language they are moving on to. For the ComputerCraft computers, there's unlikely to be a situation where it's going to come up, so at the moment I'm aiming for the average Lua programmer who doesn't really need to know that much.

Fair enough. That was probably a bad example on my part, because it's one that we see mostly in Java (since Java has no unsigned types). But you're right, this forum is really about Lua. But then again, who programming in CC Lua is looking at binary?
Pharap #15
Posted 08 March 2013 - 08:58 AM
If colours in ARGB format are showing up as signed values, I'd worry about the library being used. ARGB values shouldn't be signed; they are almost always unsigned (or in less common cases are float values between 0 and 1). Frankly, if someone is moving on to compiled-language programming (ie the C or .NET family), they'll almost certainly cover signed and unsigned numbers in the datatype tutorial for the language they are moving on to. For the ComputerCraft computers, there's unlikely to be a situation where it's going to come up, so at the moment I'm aiming for the average Lua programmer who doesn't really need to know that much.

Fair enough. That was probably a bad example on my part, because it's one that we see mostly in Java (since Java has no unsigned types). But you're right, this forum is really about Lua. But then again, who programming in CC Lua is looking at binary?

Another reason for me to hate Java; cannon fodder is always welcome lol
It's also a starting point for anyone who wants to try and make their own basic encryption system or file condenser, or maybe their own file format
(which I have done for the tower defence game I made in C#, and which many Lua game developers would do well to do, given the advantages of binary-based files as opposed to text-based ones).
If nothing else it's good for people to play around with, and it stops people from thinking that substituting characters into strings like "0101" counts as turning something into binary, which tbh is understandable, because few courses in school or college teach it.