Compression Program - Compress And Decompress Files!

eniallator #1

56 posts

Posted 01 June 2016 - 09:37 AM

I have recently made a program that compresses entire files. I tested it on a 24 KB file and the program brought it down to just 12 KB. Compression and decompression both work 100% as well (of course, if you run into any bugs please report them here so I can fix them).

Here is the pastebin for the program if you are interested!

How to use:

To use the program you will need to provide program arguments:

The first argument can be either com or decom: com to compress, decom to decompress.
The second argument is the input file that will be either compressed or decompressed (as specified in the first argument)
The third argument is the output file of the compression/decompression (again, as specified in the first argument)
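A minimal sketch of how those three arguments could be checked. The function name parseArgs and the error messages are made up for illustration and are not taken from the actual program:

```lua
-- Hypothetical helper: validates the three arguments described above.
-- Returns an options table, or nil plus an error message.
local function parseArgs(args)
  local mode, input, output = args[1], args[2], args[3]
  if mode ~= "com" and mode ~= "decom" then
    return nil, "first argument must be com or decom"
  end
  if not input or not output then
    return nil, "expected: <com|decom> <input file> <output file>"
  end
  return { mode = mode, input = input, output = output }
end

local opts = parseArgs({ "com", "startup", "startup.cmp" })
print(opts.mode, opts.input, opts.output)
```

Inside a ComputerCraft program the arguments would normally be collected with `{ ... }` and passed to a function like this one.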

Explanation:

The compression works like this: first, it scans the file and builds a table of all the words it finds (in the order they first appear). After that, it builds tables representing the entire file, but instead of storing the words themselves it stores each word's index in the word table. Once that is done, it builds the new file string, converting each index into a character through string.char() instead of writing the index as a number. It then writes the new compressed contents to a file of your choosing.
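Here is a rough sketch of the word table idea, not the actual program. It assumes words are simply runs of non-space characters, and it ignores the special characters and the 254 limit described further down. The name buildWordTable is made up for the example:

```lua
-- Works on both Lua 5.1 (global unpack) and later versions (table.unpack).
local unpack = table.unpack or unpack

-- Build a table of distinct words in order of first appearance,
-- plus a list holding the index of every word in the text.
local function buildWordTable(text)
  local words, lookup, indexes = {}, {}, {}
  for word in text:gmatch("%S+") do
    local i = lookup[word]
    if not i then
      words[#words + 1] = word
      i = #words
      lookup[word] = i
    end
    indexes[#indexes + 1] = i
  end
  return words, indexes
end

local words, indexes = buildWordTable("the cat saw the other cat")
-- words   = { "the", "cat", "saw", "other" }
-- indexes = { 1, 2, 3, 1, 4, 2 }

-- Each index becomes one character, so six words take six bytes.
local body = string.char(unpack(indexes))
print(#words, #body)
```

Repeated words are where the saving comes from: each repeat costs one byte instead of the whole word.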

The decompression works by reversing the compression: it first reads the word table from the first line of the compressed file, and then treats the rest of the file as the file body. It converts the bytes back into the index integers that the word table expects. Finally, it appends the corresponding words to the file contents string and writes that to the file you have chosen.
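The reverse direction can be sketched like this. The layout of the first line is an assumption (words separated by single spaces), the special characters are ignored, and the words are rejoined with plain spaces just to keep the example short:

```lua
-- Hypothetical decompress: the first line is the word table, the rest is
-- the body, where every byte is an index into that table.
local function decompress(data)
  local newline = data:find("\n", 1, true)
  local header = data:sub(1, newline - 1)
  local body = data:sub(newline + 1)

  local words = {}
  for word in header:gmatch("%S+") do
    words[#words + 1] = word
  end

  local out = {}
  for i = 1, #body do
    local index = body:byte(i)      -- byte back to index integer
    out[#out + 1] = words[index]    -- index back to word
  end
  return table.concat(out, " ")
end

local compressed = "the cat saw other\n" .. string.char(1, 2, 3, 1, 4, 2)
print(decompress(compressed))  --> the cat saw the other cat
```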

Special characters:

I also have special characters for the bytes:

I have used 0 for when a word's index is higher than 254. (There can only be 256 characters, and the limit is 254 because I'm already using 0 and there's another character that's an exception, which I will give more information on below.) The way the 0s work is this: say I have a number like 510. Because 254 goes into it twice, there will be two 0 characters followed by a 2, because 2 is the remainder after dividing 510 by 254.
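One way to read that scheme, sketched with plain numbers rather than real bytes. This is an interpretation, not the program's code: it emits a 0 for every 254 taken off the index while the index is still higher than 254, so the final value always lands in 1..254 and never looks like another 0. How these values map onto actual bytes around the other special characters is not covered here.

```lua
local BASE = 254  -- largest value stored directly, per the post

-- Split an index into escape markers (0) followed by a value in 1..BASE.
local function encodeIndex(index)
  assert(index >= 1, "index must be positive")
  local symbols = {}
  while index > BASE do
    symbols[#symbols + 1] = 0
    index = index - BASE
  end
  symbols[#symbols + 1] = index
  return symbols
end

-- Every 0 is worth BASE; the last symbol is added as-is.
local function decodeIndex(symbols)
  local index = 0
  for _, s in ipairs(symbols) do
    if s == 0 then
      index = index + BASE
    else
      index = index + s
    end
  end
  return index
end

print(table.concat(encodeIndex(510), " "))  --> 0 0 2
print(decodeIndex({ 0, 0, 2 }))             --> 510
```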

I have used 1 to indicate a line break: the next line starts where that character is.

I have also got two space characters. Character 3 just indicates a single space, but character 2 is for multiple spaces. For example, programs contain a lot of white space, and the way I have made that more efficient is by writing character 2 followed by the number of spaces in that block.
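A small sketch of the two space markers. To keep it short, the rest of the text is left as literal characters instead of word indexes, and the function names are made up. It also does not deal with run lengths that collide with other special values (a run of 10 spaces would produce a newline byte, for instance), which the real program would need to handle somehow:

```lua
local SPACE_RUN, SINGLE_SPACE = 2, 3

-- Replace each run of spaces with a marker: 3 for one space,
-- or 2 followed by a byte holding the length of the run.
local function encodeSpaces(line)
  return (line:gsub(" +", function(run)
    assert(#run <= 255, "run too long for one count byte")
    if #run == 1 then
      return string.char(SINGLE_SPACE)
    end
    return string.char(SPACE_RUN, #run)
  end))
end

-- Walk the bytes; the byte right after a 2 is always a count.
local function decodeSpaces(encoded)
  local out, i = {}, 1
  while i <= #encoded do
    local b = encoded:byte(i)
    if b == SINGLE_SPACE then
      out[#out + 1] = " "
      i = i + 1
    elseif b == SPACE_RUN then
      out[#out + 1] = string.rep(" ", encoded:byte(i + 1))
      i = i + 2
    else
      out[#out + 1] = encoded:sub(i, i)
      i = i + 1
    end
  end
  return table.concat(out)
end

local indented = "        return x"   -- 8 leading spaces, 16 characters
local packed = encodeSpaces(indented)
print(#indented, #packed)             --> 16	10
```

The saving only shows up on long runs: a run of two spaces still costs two bytes, while eight spaces of indentation shrink to two.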

I'm having to avoid character 13 because when Lua converts it back into its index, character 10 and character 13 are the same. For this reason I'm just completely avoiding using character 13 altogether.

Edited on 01 June 2016 - 07:39 AM

Emma #2

218 posts

Location tmpim

Posted 01 June 2016 - 04:28 PM

Nice! Great job. Will check it out.

Not sure if this is intended, but after compression there is a large comment at the top containing much of the uncompressed file.
Reproduce: Compress a copy of the compress program :P

Edited on 01 June 2016 - 02:33 PM

eniallator #3

56 posts

Posted 01 June 2016 - 07:31 PM

incinirate, on 01 June 2016 - 04:28 PM said:
Nice! Great job. Will check it out.

Not sure if this is intended, but after compression there is a large comment at the top containing much of the uncompressed file.
Reproduce: Compress a copy of the compress program :P

It's intended, that's the word index of the entire file. I'm not sure how to compress it otherwise :P but it still does compress ;)

Edited on 01 June 2016 - 05:32 PM

Emma #4

218 posts

Location tmpim

Posted 02 June 2016 - 12:02 AM

eniallator, on 01 June 2016 - 07:31 PM said:
incinirate, on 01 June 2016 - 04:28 PM said:
Nice! Great job. Will check it out.

Not sure if this is intended, but after compression there is a large comment at the top containing much of the uncompressed file.
Reproduce: Compress a copy of the compress program :P

It's intended, that's the word index of the entire file. I'm not sure how to compress it otherwise :P but it still does compress ;)

Ah, neat.

Gorzoid #5

44 posts

Posted 02 June 2016 - 04:59 PM

eniallator, on 01 June 2016 - 09:37 AM said:
<snip>
I'm having to avoid character 13 because when Lua converts it back into its index, character 10 and character 13 are the same. For this reason I'm just completely avoiding using character 13 altogether.

I'm guessing this is probably because you're using "w" instead of "wb". The fs API needs to be in "wb" mode to accept ASCII chars that are not raw text.
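To illustrate the binary-mode idea, here is a sketch using the standard Lua io library rather than ComputerCraft's fs API, whose file handles work differently and are not shown. It writes a few awkward bytes, including 10 and 13, in "wb" mode and reads them back in "rb" mode:

```lua
-- Write data in binary mode, then read the whole file back in binary mode.
local function roundTrip(path, data)
  local out = assert(io.open(path, "wb"))
  out:write(data)
  out:close()

  local inp = assert(io.open(path, "rb"))
  local back = inp:read("*a")
  inp:close()
  return back
end

local path = os.tmpname()
local data = string.char(10, 13, 0, 200)
local back = roundTrip(path, data)
os.remove(path)

-- In binary mode every byte should come back unchanged,
-- so 10 and 13 stay distinct.
print(back == data, back:byte(1), back:byte(2))
```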

eniallator #6

56 posts

Posted 02 June 2016 - 08:03 PM

Gorzoid, on 02 June 2016 - 04:59 PM said:
eniallator, on 01 June 2016 - 09:37 AM said:
<snip>
I'm having to avoid character 13 because when Lua converts it back into its index, character 10 and character 13 are the same. For this reason I'm just completely avoiding using character 13 altogether.
I'm guessing this is probably because you're using "w" instead of "wb". The fs API needs to be in "wb" mode to accept ASCII chars that are not raw text.

I decided to use "w" because of the way that I'm handling the word list. Without it I would probably need another special character to mark the end of the word list :P