Zippy - Archive format with (de)compressor CLI & API

Shefla #1

30 posts

Posted 31 May 2015 - 08:37 PM

Hello guys,

Here's my take on an archive format for ComputerCraft.
This little script allows you to compress/decompress archives from CLI or from your scripts using the embedded API.

Features:

Compress file(s) and folder(s) in a single archive
Preserve folder hierarchy
Compatible with common image formats
Reduce file size a little bit

Install: 23QEETmW


pastebin get 23QEETmW zippy

Usage (CLI):

Archive creation:

zippy compress /source/folder /my/archive

-- example with short syntax
zippy -c /source/folder /my/archive

Archive extraction:

zippy extract /my/archive /dest/folder

-- example with short syntax
zippy -x /my/archive /dest/folder

Usage (API):

zippy.compress ( path, dest )

os.loadAPI('zippy')
zippy.compress('/source/folder', '/my/archive')

zippy.extract ( path, dest )

os.loadAPI('zippy')
zippy.extract('/my/archive', '/dest/folder')

Pyuu #2

224 posts

Posted 31 May 2015 - 08:59 PM

I don't think deleting certain characters from a file can be called "compression".
But the dictionary concept is a start; though I'm not very familiar with compression techniques.

Anyways, nice lightweight packaging program.

Shefla #3

30 posts

Posted 31 May 2015 - 09:17 PM

There should not be any characters deleted in the files after decompression, otherwise it's a bug.
If it's the case, please let me know how to reproduce it.

Bomb Bloke #4

7083 posts

Location Tasmania (AU)

Posted 01 June 2015 - 12:40 AM

Just glancing at the paste doesn't make it obvious that there's a bunch of non-ASCII symbols in those "empty"-looking strings. Without trying it, it does look to me like the removed characters should be replaced later. You could call it an example of basic RLE compression.

Shefla #5

30 posts

Posted 01 June 2015 - 11:23 AM

Yeah, the Pastebin syntax highlighter has trouble displaying the special chars (they are ASCII, btw).
You can click on the "raw" link to see where they are, or better: use a decent text editor.

Didn't know about RLE compression; it seems I reinvented this wheel while looking for a way to save a few bytes.
Thanks for teaching me something. :D

To explain things a bit:
The compression code looks through the file contents for specific repeating characters (space, TAB, NL, CR).
If a character is repeated more than two times, the run is replaced with a control char followed by the number of repetitions.
I agree this does not result in a high compression ratio, but it's something.
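
As a rough illustration of the scheme described above (not the actual Zippy code), here is a minimal Lua sketch. The marker bytes (DC1 to DC4), the count offset and the function names are assumptions made for this example; the real archive format may encode runs differently. The count byte is shifted by 30 so it can never collide with a whitespace character or a marker, and the sketch assumes the input does not already contain the marker bytes.

```lua
-- Hypothetical marker bytes, one per compressible character.
local MARKERS = {
  [" "]  = string.char(17), -- DC1
  ["\t"] = string.char(18), -- DC2
  ["\n"] = string.char(19), -- DC3
  ["\r"] = string.char(20), -- DC4
}
local COUNT_OFFSET = 30            -- count bytes land in 33..255
local MAX_RUN = 255 - COUNT_OFFSET -- longest run one count byte can hold

-- Reverse lookup used by the decoder.
local CHAR_FOR = {}
for char, marker in pairs(MARKERS) do CHAR_FOR[marker] = char end

local function packWhitespace(text)
  for char, marker in pairs(MARKERS) do
    text = text:gsub(char .. "+", function(run)
      local n, out = #run, {}
      -- Only runs of more than two characters are worth replacing.
      while n > 2 do
        local chunk = math.min(n, MAX_RUN)
        out[#out + 1] = marker .. string.char(chunk + COUNT_OFFSET)
        n = n - chunk
      end
      out[#out + 1] = char:rep(n) -- 0 to 2 leftover characters, kept as-is
      return table.concat(out)
    end)
  end
  return text
end

local function unpackWhitespace(packed)
  -- Each marker is followed by exactly one count byte.
  return (packed:gsub("([\17-\20])(.)", function(marker, count)
    return CHAR_FOR[marker]:rep(count:byte() - COUNT_OFFSET)
  end))
end
```

With this layout a run of eight spaces shrinks to two bytes, while runs of one or two characters are left untouched.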

Edited on 01 June 2015 - 03:00 PM

Shefla #6

30 posts

Posted 01 June 2015 - 02:05 PM

Updated to version 0.4-beta with a few fixes regarding relative path handling.

biggest yikes #7

673 posts

Posted 01 June 2015 - 10:21 PM

Tried it with an NFP file, compressed it fine but decompressing it resulted in a 4 MB file when the original file was only 600 bytes. Spaces in the original file seemed to be elongated by a few hundred thousand times.
After compressing the "zippy" file and extracting it, for some reason some lines of the code were saved to folders..

For my machine, the extracting seems to be a little broken, but it does succeed in decreasing the file size a little.

Shefla #8

30 posts

Posted 02 June 2015 - 12:51 AM

Thanks for reporting; update v0.5 should fix the extraction craziness.
Previous archives are not compatible with this update. Sorry, it's an early beta.

About the folder names containing some code: I suspect you compressed the zippy script with itself. :D
Anyway, I have an idea for how to fix it, hopefully coming with the next update, which adds a nested archives feature.

Edited on 01 June 2015 - 10:52 PM

biggest yikes #9

673 posts

Posted 02 June 2015 - 01:29 AM

Shefla, on 02 June 2015 - 12:51 AM said:
About the folder names containing some code: I suspect you compressed the zippy script with itself. :D

Atenefyr, on 01 June 2015 - 10:21 PM said:
After compressing the "zippy" file and extracting it

I like how the start of any file compressed with zippy is

maybe it would be possible to insert the zippy version when archiving in case the format changes and you need to know which version of Zippy to decompress it with.
Kind of like

maybe?

Edited on 01 June 2015 - 11:33 PM

Bomb Bloke #10

7083 posts

Location Tasmania (AU)

Posted 02 June 2015 - 01:31 AM

Shefla, on 02 June 2015 - 12:51 AM said:
Anyway I have an idea about how to fix it

Does it involve making sure the script doesn't compress itself? Consider what happens if "/" is set as the path to compress - you might also consider setting up exclusions for ROM and disk drive folders.

Shefla #11

30 posts

Posted 02 June 2015 - 10:41 AM

Atenefyr, on 02 June 2015 - 01:29 AM said:
I like how the start of any file compressed with zippy is

maybe it would be possible to insert the zippy version when archiving in case the format changes and you need to know which version of Zippy to decompress it with.
Kind of like

maybe?

Oh man, that's nearly exactly what I had in mind for the header.
I would like to keep the length at 8 bytes, so I'm thinking of dropping ETB in favor of a version number.
Something like this: SOHzippy1DLE
Rest assured, I will implement this feature before writing the archive format specification.
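
To make the proposed 8-byte layout concrete, here is a small Lua sketch that writes and recognises a header of the form SOHzippy1DLE. It only assumes what the post states (SOH, the word "zippy", a single version digit, DLE); the function names `makeHeader` and `readVersion` are hypothetical and not part of the Zippy API.

```lua
local SOH, DLE = string.char(1), string.char(16)

-- Build the 8-byte header for a single-digit format version.
local function makeHeader(version)
  assert(version >= 0 and version <= 9, "version must fit in one digit")
  return SOH .. "zippy" .. string.format("%d", version) .. DLE
end

-- Return the format version, or nil if the data is not a Zippy archive.
local function readVersion(data)
  local digit = data:match("^" .. SOH .. "zippy(%d)" .. DLE)
  return tonumber(digit)
end
```

A reader could then refuse (or convert) archives whose version it does not understand, instead of producing garbage on extraction.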

Bomb Bloke, on 02 June 2015 - 01:31 AM said:
Does it involve making sure the script doesn't compress itself? Consider what happens if "/" is set as the path to compress - you might also consider setting up exclusions for ROM and disk drive folders.

If you look at the source now you can see the header is stored in a string.
That's what is making zippy do weird things when compressing itself.
Easy fix is to store the header's bytes in a table and construct the header at runtime using a loop.
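
A minimal sketch of that fix, assuming the SOHzippy1DLE layout mentioned earlier in this post (the byte values and the name `buildHeader` are illustrative only). Because the header never appears as a literal string in the source, the script's own file no longer contains the byte sequence that the archive parser looks for.

```lua
-- SOH, "z", "i", "p", "p", "y", "1", DLE as plain numbers.
local HEADER_BYTES = { 1, 122, 105, 112, 112, 121, 49, 16 }

-- Assemble the header string at runtime.
local function buildHeader()
  local chars = {}
  for i, byte in ipairs(HEADER_BYTES) do
    chars[i] = string.char(byte)
  end
  return table.concat(chars)
end
```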

There's a bunch of assertions making sure files are readable/writable when they should be.
I don't think it is necessary to add exceptions for special folders.
Or maybe give me an example of when it's needed.

Thanks for your interest, it helps a lot. :)

Edited on 02 June 2015 - 10:39 AM

Pyuu #12

224 posts

Posted 02 June 2015 - 12:23 PM

Bomb Bloke, on 01 June 2015 - 12:40 AM said:
Just glancing at the paste doesn't make it obvious that there's a bunch of non-ASCII symbols in those "empty"-looking strings. Without trying it, it does look to me like the removed characters should be replaced later. You could call it an example of basic RLE compression.

Shefla, on 01 June 2015 - 11:23 AM said:
Yeah, the Pastebin syntax highlighter has trouble displaying the special chars (they are ASCII, btw).
You can click on the "raw" link to see where they are, or better: use a decent text editor.

Didn't know about RLE compression; it seems I reinvented this wheel while looking for a way to save a few bytes.
Thanks for teaching me something. :D

To explain things a bit:
The compression code looks through the file contents for specific repeating characters (space, TAB, NL, CR).
If a character is repeated more than two times, the run is replaced with a control char followed by the number of repetitions.
I agree this does not result in a high compression ratio, but it's something.

Ahh I see. Thanks for clearing that up. CC might have some trouble with special characters if you start adding more to the dictionary and moving towards 127 or 128.

Shefla #13

30 posts

Posted 02 June 2015 - 12:43 PM

You're welcome.

I'm using this ASCII table to find out which special chars I can use.
I'm leaving out [RS] and [US] because they are already used in the NFT image format.

Bomb Bloke #14

7083 posts

Location Tasmania (AU)

Posted 02 June 2015 - 01:23 PM

If you're planning on capturing more repeated characters, then my recommendation is "do all of them". Ditch your dictionary and repeated usage of gsub, and instead go through the string once with sub, comparing each character to the one you read before it.
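
One way this suggestion could look in Lua is sketched below. It is a look-ahead variant of the same idea (each character is compared with the ones that follow it, using `sub`), and it handles every byte value rather than a fixed dictionary. The escape byte (DLE), the three-byte run layout and the function names are assumptions for the sake of the example, not Zippy's actual format.

```lua
local ESC = string.char(16) -- hypothetical escape byte (DLE)

-- Runs are written as ESC, count byte (1..255), character.
local function rleEncode(text)
  local out = {}
  local i, n = 1, #text
  while i <= n do
    local char = text:sub(i, i)
    local run = 1
    while run < 255 and text:sub(i + run, i + run) == char do
      run = run + 1
    end
    if run > 3 or char == ESC then
      -- A run costs 3 bytes, so it only pays off past 3 repeats.
      -- Literal ESC bytes are always escaped to keep decoding unambiguous.
      out[#out + 1] = ESC .. string.char(run) .. char
    else
      out[#out + 1] = char:rep(run)
    end
    i = i + run
  end
  return table.concat(out)
end

local function rleDecode(packed)
  local out = {}
  local i, n = 1, #packed
  while i <= n do
    local char = packed:sub(i, i)
    if char == ESC then
      out[#out + 1] = packed:sub(i + 2, i + 2):rep(packed:byte(i + 1))
      i = i + 3
    else
      out[#out + 1] = char
      i = i + 1
    end
  end
  return table.concat(out)
end
```

The string is walked once, so there is no need for one `gsub` pass per dictionary entry.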

Shefla, on 02 June 2015 - 10:41 AM said:
Or maybe give me an example of when it's needed.

:blink: I don't think I can top the example I already gave you. :huh:

biggest yikes #15

673 posts

Posted 02 June 2015 - 09:24 PM

Shefla, on 02 June 2015 - 10:41 AM said:
Oh man, that's nearly exactly what I had in mind for the header.
I would like to keep the length at 8 bytes, so I'm thinking of dropping ETB in favor of a version number.
Something like this: SOHzippy1DLE

how about

Shefla #16

30 posts

Posted 03 June 2015 - 05:43 PM

Bomb Bloke, on 02 June 2015 - 01:23 PM said:
:blink: I don't think I can top the example I already gave you. :huh:

Maybe I misunderstood what you meant (non-native English speaker here).
My point is: zippy lets you do what you want but if you try something silly (writing to /rom for example) it will fail with an error message explaining what's wrong.

Atenefyr, on 02 June 2015 - 09:24 PM said:
how about

Yes, why not. Maybe having a special char at the end of the header makes it simpler to parse… or not?

I'm currently working on the next version, which will feature a proper token dictionary.
At this time its compression ratio is between 10 & 30%, not too bad. ^_^

Creator #17

2679 posts

Location You will never find me, muhahahahahaha

Posted 03 June 2015 - 06:07 PM

What if the text is not in english? Or if it ain't a text at all?

Shefla #18

30 posts

Posted 03 June 2015 - 06:23 PM

Obviously if there's no text the dict will be empty.
Currently I use gmatch with pattern %w+ to tokenize file contents.
I don't think there's anything special to do to handle non-English chars.

Creator #19

2679 posts

Location You will never find me, muhahahahahaha

Posted 03 June 2015 - 06:50 PM

I see. How big will the dictionary be?

Shefla #20

30 posts

Posted 03 June 2015 - 09:43 PM

There's no fixed dictionary size.
Each token is evaluated based on its string length and number of occurrences.
If it's possible to gain some space for this token then it is added to the dict.
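
The selection rule described here could be sketched in Lua as follows. The `%w+` tokenizer comes from the thread; the cost model (`REF_COST` bytes per reference, `ENTRY_OVERHEAD` bytes per dictionary entry) and the function name are assumptions for illustration, since the actual numbers used by Zippy are not given.

```lua
local REF_COST = 2       -- assumed bytes needed to reference a dictionary entry
local ENTRY_OVERHEAD = 1 -- assumed separator byte stored with each entry

-- Return the tokens worth storing in the dictionary, best savings first.
local function buildDictionary(text)
  local counts = {}
  for token in text:gmatch("%w+") do
    counts[token] = (counts[token] or 0) + 1
  end

  local dict = {}
  for token, count in pairs(counts) do
    -- Bytes saved by the references, minus the cost of storing the token once.
    local saved = count * (#token - REF_COST) - (#token + ENTRY_OVERHEAD)
    if saved > 0 then
      dict[#dict + 1] = { token = token, count = count, saved = saved }
    end
  end

  table.sort(dict, function(a, b)
    if a.saved ~= b.saved then return a.saved > b.saved end
    return a.token < b.token
  end)
  return dict
end
```

Under this model a short word that appears once never makes it into the dictionary, while a long identifier repeated a few times does, which is why the dictionary size depends on the content rather than on the language of the text.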

Wait until tomorrow and you'll see the updated code. :)

Edited on 03 June 2015 - 07:44 PM

Creator #21

2679 posts

Location You will never find me, muhahahahahaha

Posted 03 June 2015 - 10:13 PM

Oh, so that is how it works. It has nothing to do with languages.

Bomb Bloke #22

7083 posts

Location Tasmania (AU)

Posted 04 June 2015 - 12:35 AM

Shefla, on 03 June 2015 - 05:43 PM said:
Maybe I misunderstood what you meant (non-native English speaker here).
My point is: zippy lets you do what you want but if you try something silly (writing to /rom for example) it will fail with an error message explaining what's wrong.

Well, let's say you want to archive everything you've put on your computer. You do:

zippy compress / archive

In addition to getting your files, zippy will also get:

1) Itself
2) ROM
3) Any disks that're present

Obviously there's no point in getting the first two - compressing these is a waste of disk space. The third is more a matter of taste, but I'd suggest you exclude that too.

Shefla #23

30 posts

Posted 04 June 2015 - 01:33 AM

If you run a command like that, its purpose is clear and you get what you ask for.
For example, under Linux nothing forbids you from doing something crazy like this:


sudo rm -rf /

I agree adding exceptions would make the whole computer backup case easier.
But zippy aims to be an archiver and not a backup tool.

Anyway, thanks for the explanation; it helps me figure out how the zippy API would be used by such a backup tool.

biggest yikes #24

673 posts

Posted 04 June 2015 - 01:55 AM

Shefla, on 04 June 2015 - 01:33 AM said:
If you run a command like that, its purpose is clear and you get what you ask for.
For example, under Linux nothing forbids you from doing something crazy like this:
sudo rm -rf /
I agree adding exceptions would make the whole computer backup case easier.
But zippy aims to be an archiver and not a backup tool.

Anyway, thanks for the explanation; it helps me figure out how the zippy API would be used by such a backup tool.

His point is that if you compress the whole drive, the rom folder will be saved too. That is pointless, since the rom folder comes with every computer, and therefore it shouldn't be archived, to save space in the package.

Edited on 03 June 2015 - 11:56 PM

Bomb Bloke #25

7083 posts

Location Tasmania (AU)

Posted 04 June 2015 - 02:00 AM

Seems he understands that now, but has decided he doesn't mind.

biggest yikes #26

673 posts

Posted 04 June 2015 - 02:03 AM

Bomb Bloke, on 04 June 2015 - 02:00 AM said:
Seems he understands that now, but has decided he doesn't mind.

In that case I'm the one misinterpreting the text.

Shefla #27

30 posts

Posted 04 June 2015 - 11:29 AM

Ok I changed my mind, it seems that making a backup of the entire computer would be a common use case.
I will probably end up adding a CLI option "backup" to handle this task.

Why not add exceptions to the (de)compression routines instead?
Because hardcoding filesystem assumptions into the compress/extract functions would break compatibility with other environments.
For example, an OS running a virtual filesystem may have a writable /rom folder.

Lupus590 #28

2427 posts

Location UK

Posted 04 June 2015 - 11:49 AM

Have it ask when it's about to compress the /rom folder? I can't imagine someone leaving (AFKing) a CC computer to do stuff, the CC file system is not huge.

Shefla #29

30 posts

Posted 04 June 2015 - 02:41 PM

Updated pastebin with zippy version 0.6-rc1
This is nearly a full rewrite of the (de)compression code and hopefully the last breaking change to the archive format.

I would like to reach a stable version 1 of the archive format before adding new features.
That way anyone could start using it without fear of data corruption or loss.
Any help troubleshooting this thing would be much appreciated.

Planned features:

Nested archives
Auto-extracting archives
Interactive system backup maker

Creator #30

2679 posts

Location You will never find me, muhahahahahaha

Posted 04 June 2015 - 03:19 PM

Or have it do something like -except:rom;bla/ble/bleh;other/folder
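
A hedged sketch of how such an option might be parsed and applied, in plain Lua. The `-except:` syntax is taken from the suggestion above; the function names `parseExcept` and `isExcluded` are hypothetical, and zippy itself has no such option at the time of this post.

```lua
-- Strip leading and trailing slashes so "/rom/" and "rom" compare equal.
local function normalize(path)
  return (path:gsub("^/+", ""):gsub("/+$", ""))
end

-- Turn "-except:a;b/c" into { "a", "b/c" }; returns nil for other arguments.
local function parseExcept(arg)
  local list = arg:match("^%-except:(.*)$")
  if not list then return nil end
  local paths = {}
  for path in list:gmatch("[^;]+") do
    paths[#paths + 1] = normalize(path)
  end
  return paths
end

-- True when path is one of the excluded entries or lies inside one of them.
local function isExcluded(path, paths)
  local clean = normalize(path)
  for _, excluded in ipairs(paths) do
    if clean == excluded or clean:sub(1, #excluded + 1) == excluded .. "/" then
      return true
    end
  end
  return false
end
```

The compress routine would then skip any file or folder for which `isExcluded` returns true, leaving the decision about /rom or disk folders to the caller rather than hardcoding it.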

biggest yikes #31

673 posts

Posted 04 June 2015 - 05:23 PM

I made a program that turns a zippy archive into a self-extractor, are you proud of me? :P
http://pastebin.com/rV88JWtn made http://pastebin.com/p0iE03YF

Shefla #32

30 posts

Posted 04 June 2015 - 05:47 PM

Well, at least it's a start. ^_^
Hopefully it's not required to embed zippy entirely to make an auto-extracting archive.

Also please respect the code license.

The MIT License is a free software license originating at the Massachusetts Institute of Technology (MIT). It is a permissive free software license, meaning that it permits reuse within proprietary software provided all copies of the licensed software include a copy of the MIT License terms and the copyright notice.

MKlegoman357 #33

1140 posts

Location Kaunas, Lithuania

Posted 04 June 2015 - 06:14 PM

Shefla, on 04 June 2015 - 02:41 PM said:
Planned features:
*Nested archives

I'd suggest you just add full binary support, rather than support for specific binary files.

Edited on 04 June 2015 - 04:14 PM

biggest yikes #34

673 posts

Posted 04 June 2015 - 06:18 PM

Shefla, on 04 June 2015 - 05:47 PM said:
Well, at least it's a start. ^_^
Hopefully it's not required to embed zippy entirely to make an auto-extracting archive.

Probably not, I coded that in about 10 minutes :P
You could import it from the web if you wanted, but of course you'd have to save the file before extracting it, which I aimed not to do.

Shefla, on 04 June 2015 - 05:47 PM said:
Also please respect the code license.

Done

EDIT: Updated ZSF to v1.1, now gets API from the web (also loads the "zippy" file if given client-side)
EDIT 2: Updated ZSF to v1.2, chance of nested multi-line strings breaking the self-extractor is now very low

Edited on 04 June 2015 - 06:48 PM

Shefla #35

30 posts

Posted 04 June 2015 - 07:28 PM

MKlegoman357, on 04 June 2015 - 06:14 PM said:
I'd suggest you just add full binary support, rather than support for specific binary files.

You're absolutely right! Not sure if it's for v1 or v2 though.

Atenefyr, on 04 June 2015 - 06:18 PM said:
Shefla, on 04 June 2015 - 05:47 PM said:
Also please respect the code license.
Done

Thanks.

I see you use the zsf file extension for auto-extracting archives, which I'm fine with.
What about the "simple" zippy archive file extension… zyp?
Or is it even needed, since file recognition is based on a specific header string?

biggest yikes #36

673 posts

Posted 04 June 2015 - 08:22 PM

Shefla, on 04 June 2015 - 07:28 PM said:
I see you use the zsf file extension for auto-extracting archives, which I'm fine with.

"Zippy Self (extracting) File"

Shefla, on 04 June 2015 - 07:28 PM said:
What about the "simple" zippy archive file extension… zyp?

Chances are ZSF will be different from ZYP, and either way it isn't implemented yet, so right now I'm content with ZSF.

Shefla, on 04 June 2015 - 07:28 PM said:
Or is it even needed, since file recognition is based on a specific header string?

Why is ".png" needed? All png files start with "‰PNG".
Why is ".jpg" needed? All jpeg files start with

It's preferable just so you can tell you have to use zippy to extract it without opening the file, and it's easier for operating systems to handle because they don't have to open the file to tell if it's a zippy file; they can just glance at its name.

Same goes for .zsf files; all .zsf files start with

where v1.2 is replaced with the ZSF version used to compile it, but it's still easier to glance at the file's name rather than reading the first 3 lines of the program.

Edited on 04 June 2015 - 06:51 PM