This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Minimising File Sizes

Started by cdel, 20 March 2015 - 09:08 AM
cdel #1
Posted 20 March 2015 - 10:08 AM
I've noticed a lot of posts on various topics describing how to compress maps for games and so on. I've tried to compress files myself, but I have no idea what I'm doing. Anyone care to explain? :)
Bomb Bloke #2
Posted 20 March 2015 - 10:29 AM
There are lots of different tactics, but a good simple starting point when thinking about compression is RLE - run-length encoding.

The idea is that if your data contains runs of repeating values, then instead of writing them out as-is, you output a special marker to indicate that the next value is repeated, then the value itself, then a number indicating how many times it repeats.

The values you output can be in whatever format you like, so long as you're able to read them back later.
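
As a rough sketch of that marker / value / count layout applied to a plain string: the function names here are made up for illustration, and it assumes the byte "\255" never appears in the input so it is free to act as the marker. Counts are stored in a single byte, so long runs are split into chunks of at most 255.

```lua
-- Assumption: "\255" never appears in the input, so it can serve as the marker.
local MARKER = "\255"

-- Hypothetical helper: run-length encode a string.
local function rleEncode(text)
  local out, i = {}, 1
  while i <= #text do
    local ch = text:sub(i, i)
    -- Count how long the run of ch is, capped at 255 so the count fits in one byte.
    local run = 1
    while run < 255 and text:sub(i + run, i + run) == ch do
      run = run + 1
    end
    if run > 3 then
      -- Marker, then the value, then the count as a single byte.
      out[#out + 1] = MARKER .. ch .. string.char(run)
    else
      -- An encoded run costs 3 bytes, so runs of 3 or fewer are written as-is.
      out[#out + 1] = ch:rep(run)
    end
    i = i + run
  end
  return table.concat(out)
end

-- Hypothetical helper: reverse rleEncode.
local function rleDecode(data)
  local out, i = {}, 1
  while i <= #data do
    local ch = data:sub(i, i)
    if ch == MARKER then
      local value = data:sub(i + 1, i + 1)
      local count = data:byte(i + 2)
      out[#out + 1] = value:rep(count)
      i = i + 3
    else
      out[#out + 1] = ch
      i = i + 1
    end
  end
  return table.concat(out)
end
```

If the marker byte could turn up in real data, the format would need some way to escape it, which this sketch leaves out.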

A very simple example, using a table filled with values (though you don't even need to use a table!):

{ "block", "dirt", "air", "air", "air", "dirt", "stone", "stone" }

Here we've got a run of three "air"s and a run of two "stone"s. So to condense that, we might do something like:

{ "block", "dirt", {"air", 3}, "dirt", {"stone", 2} }

You can hopefully see how trivial it would be to write the code which would turn the original table into this, and the code to turn this back into the original form.
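
For reference, one possible way to write that pair of functions in Lua. This is only a sketch: the names compress and decompress are made up here, and it assumes the values themselves are never tables (otherwise a {value, count} pair couldn't be told apart from real data).

```lua
-- Hypothetical helper: collapse runs of equal values into {value, count} pairs.
-- Assumption: the values are strings or numbers, never tables.
local function compress(list)
  local out = {}
  local i = 1
  while i <= #list do
    local value, count = list[i], 1
    -- Extend the run while the following entries match the current value.
    while list[i + count] == value do
      count = count + 1
    end
    if count > 1 then
      out[#out + 1] = { value, count }
    else
      out[#out + 1] = value
    end
    i = i + count
  end
  return out
end

-- Hypothetical helper: expand {value, count} pairs back into repeated values.
local function decompress(packed)
  local out = {}
  for _, entry in ipairs(packed) do
    if type(entry) == "table" then
      for _ = 1, entry[2] do
        out[#out + 1] = entry[1]
      end
    else
      out[#out + 1] = entry
    end
  end
  return out
end
```

Calling compress on the first table above gives the condensed table, and decompress turns that back into the original.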
cdel #3
Posted 20 March 2015 - 10:32 AM
Thank you :)
Lupus590 #4
Posted 20 March 2015 - 11:14 AM
Another, more advanced technique uses a dictionary of symbols that stand in for repeated text.

Example:
This is a string of words to be used as a compression example. This example uses a dictionary of symbols which represents repeated text. A symbol can represent parts of words or even two or more, "to be or not to be".
can be compressed to:
Dict:
1 This
2 use
3 symbol
4 words
5 to be
6 rep
7 example
8 res
Text:
\1 is a string of \4 to be \2d as a comp\8sion \7. \1 \7 \2s a dictionary of \3s which \6\8ents \6eated text. A \3 can \6\8ent parts of \4 or even two or more, "\5 or not \5".
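
Turning that back into the original text could look something like the sketch below. The function name expand is made up for illustration, and it assumes every symbol is a backslash followed by exactly one digit, as in the example, so it only copes with dictionary entries 1 to 9.

```lua
-- The dictionary from the example, indexed by symbol number.
local dict = { "This", "use", "symbol", "words", "to be", "rep", "example", "res" }

-- Hypothetical helper: replace each \N symbol with entry N of the dictionary.
-- Assumption: a symbol is a backslash followed by a single digit.
local function expand(text, dictionary)
  -- The pattern matches a literal backslash and captures the digit after it.
  -- If a digit has no dictionary entry, gsub leaves that symbol untouched.
  return (text:gsub("\\(%d)", function(digit)
    return dictionary[tonumber(digit)]
  end))
end
```

Going the other way (picking which pieces of text are worth putting in the dictionary) is the harder half, and isn't attempted here.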

I think that's enough of me doing this manually for an example. Obviously, try not to use symbol/escape characters that can be printed in CC, as people may want to use those in their own text. You'll also want to remove or shrink the headings, as that's 5 characters lost, and removing the line breaks from the dictionary may help as well (I know that MS Windows uses two characters to mark the end of a line).

I would recommend that you get Bomb Bloke's method working first.
There are other methods too: http://en.wikipedia....ata_compression
Edited on 20 March 2015 - 10:17 AM