Posted 21 May 2013 - 03:16 AM
I know http.get() returns a handle as if it were a file opened in read-only text mode, but it's still just bytes. It seems that sometimes an extra byte is thrown in, though. It's reproducible, but I haven't seen a pattern or reason why.
I have searched several times for an answer since January but have not been able to find one.
Here is the actual program I'm working on; save it as 'getNBS' to reproduce the problem:
local args = {...}
-- Expect exactly two arguments, both ending in ".nbs": the URL and the output file name.
if #args == 2
    and string.find(args[1], ".nbs", -4, true) ~= nil
    and string.find(args[2], ".nbs", -4, true) ~= nil then
  local url = args[1]
  local fn = args[2]
  local fh = fs.open('songs/'..fn, 'wb')  -- binary write handle
  local dt = http.get(url).readAll()
  local ret = ""
  if dt ~= nil and #dt > 0 then
    local x = 1
    while x <= #dt do  -- <= so the last byte is not skipped
      local byt = string.byte(dt, x)
      ret = ret .. string.format("%02X ", byt)  -- two hex digits plus a space
      if #ret == 48 then  -- 16 bytes per printed row
        print(ret)
        ret = ""
      end
      fh.write(byt)
      x = x + 1
    end
  end
  print(ret)  -- flush the final partial row
  fh.close()
end
Then this call should download a binary file from a server and save it into a new file. It will also print the hex to the screen.
getNBS http://geckocodes.org/James0x57/zeldaSecret.nbs test.nbs
If you use your browser to download the same file, you can open it in a hex editor to compare. At offset 0x27 the byte is 0xE8, but when you read it through the HTTP API, that single byte becomes the two bytes 0xC3 0xA8.
The rest of the data is intact. Since the file doesn't end in 0x0A or 0x0D (EOL characters), when you do readAll(), it should be an exact duplicate of the data. (…right?)
Interestingly, the same byte is affected in the same way when you read the handle returned from http.get() with repeated readLine() calls.
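For anyone who wants to find a mismatch like this without a hex editor, here is a minimal sketch in plain Lua. The helper name firstDifference is made up for illustration and is not part of ComputerCraft; it only assumes you already have both versions of the data as Lua strings.

```lua
-- Hypothetical helper (not a ComputerCraft API): returns the 0-based offset
-- of the first byte that differs between two strings, or nil if they match.
local function firstDifference(a, b)
  local n = math.min(#a, #b)
  for i = 1, n do
    if a:byte(i) ~= b:byte(i) then
      return i - 1
    end
  end
  if #a ~= #b then
    return n  -- one string is a prefix of the other
  end
  return nil
end

-- Tiny example: a lone 0xE8 byte versus the two bytes 0xC3 0xA8.
print(firstDifference("ab\232c", "ab\195\168c"))  --> 2
```

With the real files, the first argument would be a known-good copy and the second the data returned by readAll().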
It might potentially be an issue with string.byte() too…
Any thoughts or insight into this?
End goal of this topic is to determine if this behavior is predictable so I can address and correct it on the fly OR, if it's actually a bug, to shine some light on it.
Thank you for your time!
(and thank you very much for ComputerCraft!)
EDIT:
Whoa, just figured it out. No idea why it came to me but I thought of an accented e right after I posted it, then checked:
Unicode code point:
è = U+00E8
UTF-8 encoding (hex):
è = C3 A8
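So that's the issue: the byte is being treated as a Unicode code point and then encoded as UTF-8. Odd though, because it automatically assumes the 00 on the left when it reads the 0xE8 byte by itself to complete the 16-bit code point, and string.byte() then sees the UTF-8 bytes instead of the original one. If that's right, any byte from 0x80 to 0xFF should come back as two bytes.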
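To make the arithmetic concrete, here is a rough sketch in plain Lua of the two-byte UTF-8 encoding and one possible way to reverse it. Both function names are hypothetical. The reversal assumes that every byte from 0x80 to 0xFF was turned into exactly one two-byte sequence (lead byte 0xC2 or 0xC3) and that nothing else was altered; I have not confirmed that this is what ComputerCraft does in every case.

```lua
-- Encode a code point below 0x800 as UTF-8 (one or two bytes).
local function encodeUtf8(cp)
  if cp < 0x80 then
    return string.char(cp)
  end
  -- Two-byte form: 110xxxxx 10xxxxxx
  return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
end

-- ASSUMPTION: each original byte 0x80..0xFF arrived as a two-byte sequence
-- with lead byte 0xC2 or 0xC3. Collapse each such pair back into one byte.
local function undoUtf8(s)
  return (s:gsub("([\194\195])([\128-\191])", function(lead, cont)
    return string.char((lead:byte() - 0xC0) * 0x40 + (cont:byte() - 0x80))
  end))
end

-- 0xE8 encodes to the pair seen at offset 0x27.
print(string.format("%02X %02X", encodeUtf8(0xE8):byte(1, 2)))  --> C3 A8
```

In the getNBS script, the idea would be to pass the result of readAll() through undoUtf8 before writing the bytes out. Decimal escapes are used instead of hex ones so the sketch also runs on Lua 5.1.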
I'll dig into a solution after work tomorrow. I would still like a thread, if possible, to share whatever I find. Hopefully it'll help someone else out.
Edited on 21 May 2013 - 12:54 PM