This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Binary data handling bug

Started by columna1, 17 December 2013 - 04:59 PM
columna1 #1
Posted 17 December 2013 - 05:59 PM
I'm not sure what is happening, but whenever I get binary data and load it into a string it ends up corrupted, which is very annoying. I can do the same thing in other Lua implementations, like love2d or my custom Lua implementation in C++, but in ComputerCraft something re-formats it in a really odd way. Here is an example:


For both of these examples I used the exact same code to create a string that I dumped into a file. Of course the file handling was different, but the string was corrupted before I dumped it to the file. This causes a lot of problems for me (for example when I try to read audio files, when I try to do a binary transfer over 2 wires, and when downloading binary files from the web), so if anyone has any info on this I would like to know…

Here is the code I used:

local str = ""
for i = 1, 255 do
  str = str .. string.char(i)
end
-- I also tried "wb" and that worked, but it serves no useful purpose if I can't
-- take the data and read/write it like this, i.e. if I have to write a byte at
-- a time using its decimal representation... >.>
local file = fs.open("test", "w")
file.write(str)
file.close()
Edited on 17 December 2013 - 05:07 PM
oeed #2
Posted 17 December 2013 - 07:38 PM
Well, one of the reasons why it's not working as you may have intended is because ComputerCraft replaces unknown characters with '?'. However, I haven't done enough work with binary to know what the corruption problem is. Can you post your entire code?
Bomb Bloke #3
Posted 17 December 2013 - 07:42 PM
I ran into something similar when I started learning Java - tried using some function intended for writing text in order to write anything, and it stripped out all the characters it didn't consider to be standard ASCII. Or something. You'll notice you're only getting access to the first 128 symbols.

So if you want to write binary data, tell Lua you want to write binary data. If you want to write a whole bunch of data with a single function call, then write a function which breaks up your dataset into individual bytes, and call that!
Edited on 17 December 2013 - 06:57 PM
Lyqyd #4
Posted 17 December 2013 - 07:57 PM
Character bytes over 127 don't work in text-mode. Use binary mode instead.
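
To illustrate the kind of loss being described, here is a minimal Java sketch. It assumes (this is not confirmed anywhere in the thread) that a text-mode handle behaves like an ASCII-only text encoder. The class and method names are made up for the example and are not part of ComputerCraft or LuaJ.

```java
import java.nio.charset.StandardCharsets;

final class TextModeDemo {
    // Builds a String holding one char per code 1..255, like the Lua loop in post #1.
    static String allByteValues() {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 255; i++) {
            sb.append((char) i);
        }
        return sb.toString();
    }

    // Assumption: a text-mode write acts like an ASCII-only encoder.
    // String.getBytes replaces every unmappable char with the charset's
    // replacement byte, which is '?' for US-ASCII.
    static byte[] writeAsAsciiText(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] out = writeAsAsciiText(allByteValues());
        System.out.println("code 127 is written as " + (out[126] & 0xFF));
        System.out.println("code 128 is written as " + (out[127] & 0xFF));
    }
}
```

Under that assumption everything from code 128 upwards collapses to the same '?' byte, so the original values cannot be recovered from the file afterwards.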
columna1 #5
Posted 17 December 2013 - 08:06 PM
Well, I was trying to avoid that for a couple of good reasons.
First: there are a few libraries written in pure Lua that read binary files, and some of them depend heavily on this functionality being there (for example, one I wanted to use to encode files into base-64 so I could transfer binary files to/from my web server with ComputerCraft, but the files kept getting corrupted when being encoded/decoded). Some of these libraries I have made myself.
Second: is there a way to read the responses from http.get() objects as a binary file, reading a byte at a time? And if so, is that even worth doing, as it would take forever to loop over a few-kilobyte picture, let alone a few-megabyte one?
Third: using a method like the one you describe is going to take up significantly more memory and processing power (Lua numbers are by default 4-byte floating point numbers, but I believe they can even be set to be 8-byte floating point numbers), meaning you suddenly explode a 1 KB file into 4-8 KB in memory.

I guess my point is that I am used to this functionality and I want to use it with CC too, so if there is a way to do these things I would like to know (or have it fixed).
Edited on 17 December 2013 - 07:10 PM
Bomb Bloke #6
Posted 18 December 2013 - 02:52 AM
Sorry, you can't get binary handles from the HTTP API. But since you mention it, working around this exact sort of issue is the only point to base64 I'm aware of, so if you and your web server are happy to use that then you should be ok. :)

Using binary handles to read/write local files also shouldn't be a great issue in terms of efficiency (putting aside for now that you're wanting to run this through a Lua VM running inside a Java VM which is also running the rest of a Minecraft server…). LuaJ is more likely than not using buffered readers/writers, meaning that even if you ask it to read a single byte, it's actually reading a bunch of them into server RAM in one go, then serving further reads from there until that cache runs out. Likewise for writes - the drive access won't occur until you've queued up a sufficient amount of data to warrant it (or until you've closed the file handle).

And even if you don't like the data type that hands you back, you don't need to leave it in that format. Load the bytes into a table as they come in, for example, then table.concat() the result when you've got them all (if you really want it as a single string).
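
A rough Java sketch of the same idea, for what it's worth: read through a buffered stream one byte at a time, collect the bytes, and only build the final array at the end. Whether LuaJ or ComputerCraft do anything like this internally is a guess. ByteAtATime and readAll are names invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ByteAtATime {
    // Reads single bytes; the BufferedInputStream fetches from the underlying
    // source in larger chunks, so each read() is usually served from memory.
    static byte[] readAll(InputStream raw) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try (InputStream in = new BufferedInputStream(raw)) {
            int b;
            while ((b = in.read()) != -1) {
                collected.write(b); // collect now, convert once at the end
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = new byte[256];
        for (int i = 0; i < source.length; i++) {
            source[i] = (byte) i;
        }
        byte[] copy = readAll(new ByteArrayInputStream(source));
        System.out.println("identical: " + Arrays.equals(source, copy));
    }
}
```

The collecting step plays the role of the table plus table.concat() in Lua: the per-byte values are only an intermediate form.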
Edited on 18 December 2013 - 01:55 AM
columna1 #7
Posted 18 December 2013 - 12:20 PM
Just to clear a thing or two up on the efficiency problem: even if the binary file handles use buffered readers/writers, you still end up copying the data into the program's memory, so strings can still use up to 8 times less memory. But that is kind of beside the point now. When I used different base64 encoders/decoders the data kept getting corrupted, so large portions of the file were random garbage that I could neither use nor salvage. If it worked it would all be golden and my web server would be more than happy, so if you have a base64 encoder/decoder for Lua that you know works 100% with CC, I would be very grateful for it.
So I have to adapt my libraries and make them slower to work with CC, which is kind of sad, but oh well, what can you do…
Edit: I just tested some base-64 encoders/decoders with love2d and it seems that it isn't ComputerCraft doing the corrupting, but man that is annoying… maybe it's a mismatch between the PHP base-64 and the Lua one.
Edited on 18 December 2013 - 11:39 AM
Lyqyd #8
Posted 18 December 2013 - 12:32 PM
You might take a look at the packFile and unpackAndSaveFile functions in nsh. I've successfully transmitted png and jpg image files between computers in-game, so I know the base64 implementation is viable. If you still have problems with getting the files from a web server, there may be other issues to investigate there.
columna1 #9
Posted 18 December 2013 - 12:51 PM
Thanks. I didn't try the library you sent, but I did some tests with other libraries and this is the result:

When encoded with Lua, Lua can decode it just fine, but with PHP you get random garbage.
When encoded with PHP, you can decode it just fine with PHP, but in CC it seems you get double the amount of newline chars… but it kind of works…

So the problem I was having was a mismatch between base-64 in Lua and PHP. Hmm… how would I fix that?

Also, I might try the same test with the library you recommended and see what happens.
Edited on 18 December 2013 - 11:52 AM
Bomb Bloke #10
Posted 18 December 2013 - 05:13 PM
I'm kinda getting the impression that you're thinking Lua has its own base-64 converter built in. Rather, it only "knows" about as much as the "libraries" you're running tell it, so if the implementations you've seen thus far are incompatible, that's got nothing to do with Lua and everything to do with the code you're running.

Note that "incompatible" is not the same thing as "broken" - there's more than one implementation of base-64 out there, after all.
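
As a concrete illustration of "incompatible but not broken", here is a small sketch using the base-64 variants that ship with Java's standard library (java.util.Base64). It says nothing about what the Lua or PHP libraries in question actually do. Base64Variants and its methods are names made up for the example.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64Variants {
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Same scheme, but '-' and '_' replace '+' and '/'.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // Same alphabet as standard, but wrapped with CRLF every 76 characters.
    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // The strict decoder rejects anything outside the standard alphabet.
    static boolean strictDecoderAccepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // The MIME decoder skips line breaks and other non-alphabet characters.
    static byte[] lenientDecode(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] tiny = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(tiny) + " vs " + urlSafe(tiny));

        byte[] longer = new byte[60];
        String wrapped = mime(longer);
        System.out.println("strict accepts wrapped text: " + strictDecoderAccepts(wrapped));
        System.out.println("lenient round trip ok: " + Arrays.equals(longer, lenientDecode(wrapped)));
    }
}
```

All three encoders are correct on their own terms, yet output from one can be rejected or misread by a decoder written for another, which is the sort of mismatch worth checking for between two independent implementations.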
columna1 #11
Posted 18 December 2013 - 08:16 PM
I realize that. What I meant was that originally I thought something in CC was screwing with it (the thing this post is about), but I realized that wasn't the case. I have tried different implementations, although they seemed to do it in much the same way, but thanks for that.

Also, if I thought that base-64 was built into Lua then I would not be using Lua code to convert it, because I would not know that I needed a library or a custom function to do so…

And one more thing: I'm pretty sure I did not say "the base-64 Lua implementation is broken". Rather, I said things along the lines of "it did not work", "there may be a mismatch" and "kept being corrupted", meaning that I knew it was something along the lines of a different format.
Edited on 18 December 2013 - 07:24 PM
ElvishJerricco #12
Posted 20 February 2015 - 01:49 AM
I think this would be the topic to use to bring up new information on the LuaJ binary string bug. Jarle212 and I have found something new. The issue boils down to LuaJ trying to be smart about UTF8.

To fix, avoid all uses of LuaString.tojstring(), and LuaString.valueOf(String), instead using a custom encoder and decoder.


public static LuaString encodeLuaString(String str) {
	char[] c = str.toCharArray();
	byte[] b = new byte[c.length];
	for (int i = 0; i < c.length; ++i) {
		b[i] = (byte)c[i];
	}
	return LuaString.valueOf(b);
}

public static String decodeLuaString(LuaString str) {
	byte[] b = str.m_bytes;
	char[] c = new char[b.length];
	for (int i = 0; i < b.length; ++i) {
		c[i] = (char)b[i];
	}
	return new String(c);
}
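
One detail worth knowing when casting between byte and char like this, sketched without any LuaJ dependency: Java bytes are signed, so a plain (char) cast of a byte above 127 sign-extends to 0xFFxx, while masking with 0xFF keeps the value in 0..255. ByteCharCasts is a name invented for the example and is not part of the proposed fix.

```java
final class ByteCharCasts {
    // Plain cast: the byte is widened to int with sign extension first.
    static char signExtended(byte b) {
        return (char) b;
    }

    // Masked cast: keeps only the low 8 bits, giving a char in 0..255.
    static char masked(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(Integer.toHexString(signExtended(b)));
        System.out.println(Integer.toHexString(masked(b)));
        // Casting back to byte drops the high bits again in both cases.
        System.out.println((byte) signExtended(b) == b);
    }
}
```

Because the narrowing cast back to byte discards the high bits, a byte to char to byte round trip is preserved either way. The difference only matters to code that inspects the intermediate Java String and expects chars in the 0..255 range.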
SquidDev #13
Posted 02 April 2015 - 12:05 PM
Am I allowed to bump this to be fixed in 1.74? I'm pretty sure you only need to replace the valueOf and tojstring methods in the LuaString class itself to use these functions instead.
ElvishJerricco #14
Posted 02 April 2015 - 09:35 PM
It's definitely an extremely easy fix. But Dan may want to avoid modifying LuaJ itself. But even then, it's still relatively easy to just avoid using the built in String↔LuaString methods, instead using String↔byte[]↔LuaString.
Edited on 02 April 2015 - 07:35 PM
SquidDev #15
Posted 02 April 2015 - 09:42 PM
I'm going with the fact that Dan's added the abandon method to LuaThreads, so it should be fine. Please Dan (insert puppy eyes image here). Pretty please?
MindenCucc #16
Posted 16 June 2015 - 12:36 PM
This isn't caused by Lua or LuaJ. This is a Java bug. I described it here.
SquidDev #17
Posted 16 June 2015 - 12:47 PM
From memory there are certainly some issues caused by LuaJ. I can't test right now but I'm pretty sure this will fail:

local msg = string.char(223, 129, 243)
os.queueEvent("something", msg) -- Basically anything above 127
local _, result = coroutine.yield()

assert(result == msg)

However, the fix proposed by ElvishJerricco doesn't work as tested here. The real issue lies in the fact that strings are being used to represent binary data, and text ~= bytes.
Edited on 16 June 2015 - 10:59 AM
Bomb Bloke #18
Posted 16 June 2015 - 12:57 PM
MindenCucc said: "This isn't caused by Lua or LuaJ. This is a Java bug. I described it here."

Er, no. Not exactly. BufferedReaders are built to work with text. If a project calls them inappropriately, then sure, there's a problem, but that doesn't mean the problem's with Java.
Edited on 16 June 2015 - 10:57 AM
MindenCucc #19
Posted 16 June 2015 - 02:25 PM
Lol, this is really unexpected.

File f1 = new File("bintest1");
File f2 = new File("bintest2");
if(!f1.exists()) f1.createNewFile();
if(!f2.exists()) f2.createNewFile();

StringBuilder sb = new StringBuilder("");
BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(f1.getAbsoluteFile()));
DataOutputStream dos = new DataOutputStream(bos);
String teste = "\255\255\255";
char[] ia = teste.toCharArray();
for(int i = 0; i != ia.length; i++)
{
    sb.append(String.valueOf(ia[i]));
    bos.write((int)ia[i]);
}
bos.flush();

dos.writeBytes(sb.toString());
dos.flush();

dos.close();
bos.close();

produces this result:
http://puu.sh/iqXcH/51abcee2ad.png

The funny thing about the upper snippet is that LuaJ uses toCharArray() too :P
Bomb Bloke #20
Posted 16 June 2015 - 04:38 PM
Does it help if I point out that 0xAD = o255?
MindenCucc #21
Posted 16 June 2015 - 04:55 PM
But why is it in octal? It should be 0xFF, not 0xAD
ardera #22
Posted 16 June 2015 - 09:50 PM
I think it's not uncommon to specify character codes as octal numbers. In many ASCII tables, the octal value is given too. Why is that? I don't know.

I think this bug is actually 2 bugs. The first one is the one ElvishJerricco already described: LuaJ uses UTF-8, which corrupts bytes larger than 127. This means that the byte after a byte > 127 is consumed and doesn't appear in the Java String. I had issues with that before; bytecode dumping is problematic because of this. (See Here)
The second one is the one MindenCucc explained, although I honestly don't think that this second bug causes the most problems.
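
A minimal sketch of the first of these two bugs, using only the Java standard library (not LuaJ itself, so how closely it mirrors LuaJ's own conversion is an assumption). It pushes the three bytes from SquidDev's test through a String and back, once as UTF-8 and once as ISO-8859-1. RoundTrip and its methods are names made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class RoundTrip {
    // bytes -> String -> bytes, using the same charset both ways.
    static byte[] roundTrip(byte[] data, Charset cs) {
        return new String(data, cs).getBytes(cs);
    }

    static boolean survives(byte[] data, Charset cs) {
        return Arrays.equals(data, roundTrip(data, cs));
    }

    public static void main(String[] args) {
        // The bytes from string.char(223, 129, 243) in SquidDev's post.
        byte[] msg = {(byte) 223, (byte) 129, (byte) 243};
        System.out.println("UTF-8 round trip intact:      " + survives(msg, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip intact: " + survives(msg, StandardCharsets.ISO_8859_1));
        System.out.println("chars after UTF-8 decoding:   "
                + new String(msg, StandardCharsets.UTF_8).length());
    }
}
```

ISO-8859-1 maps every byte to exactly one char, so arbitrary binary data survives the trip. UTF-8 decoding treats bytes above 127 as parts of multi-byte sequences, so neighbouring bytes get merged or replaced.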
Edited on 17 June 2015 - 03:19 AM
Bomb Bloke #23
Posted 17 June 2015 - 03:06 AM
MindenCucc said: "But why is it in octal? It should be 0xFF, not 0xAD"

If you wanted 0xFF, you would specify \377, or \u00FF. Java uses octal escapes because C does it too.

https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10.6
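
For anyone who wants to check this quickly, here is a small sketch comparing the escapes. OctalEscapes and codeOf are names invented for the example.

```java
final class OctalEscapes {
    // Returns the numeric code of the first char in the string.
    static int codeOf(String s) {
        return s.charAt(0);
    }

    public static void main(String[] args) {
        // "\255" is one char: octal 255 = 2*64 + 5*8 + 5 = 173 = 0xAD.
        System.out.println(Integer.toHexString(codeOf("\255")));
        // "\377" is octal 377 = 3*64 + 7*8 + 7 = 255 = 0xFF.
        System.out.println(Integer.toHexString(codeOf("\377")));
        // The Unicode escape gives the same char as "\377".
        System.out.println(Integer.toHexString(codeOf("\u00FF")));
    }
}
```

So the test string "\255\255\255" in the snippet above holds three chars of value 0xAD, which matches the bytes seen in the output file.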