Computercraft processing bytes weirdly - ComputerCraft Forums (archive)

olie304 #1

16 posts

Posted 04 January 2018 - 09:28 PM

Lately I have been having issues with the way ComputerCraft reads and processes bytes in a file. I am on an older version of CC (1.7.10), so I do not have access to the byte reading modes. What I think happens is that it skips some bytes and then merges them with others for some reason. With Mimic or repl.it the same program works just fine, but running it in-game gives strange results. Here is some data for comparison:

- Output from the computer in-game: (Byte Val, Hex Val, Char Size)
- Output from Mimic: (Byte Val, Hex Val, Char Size)
- Actual values:
  - Hexadecimal (in UTF-8 encoding)
  - Characters (in UTF-8 encoding)
  - Characters (in browser, probably also UTF-8 for you)

! ! ÿª #

- Code: https://pastebin.com/pzWiQPfD (UTF-8 decoding snippets taken from https://github.com/Stepets/utf8.lua)

Is this an encoding issue or coding issue? I am not too knowledgeable with encodings so the file on my computer could be using the wrong encoding thus causing the issue. Thanks.

Edited on 04 January 2018 - 08:31 PM

SquidDev #2

1426 posts

Location Does anyone put something serious here?

Posted 04 January 2018 - 10:01 PM

olie304, on 04 January 2018 - 09:28 PM said:
Is this an encoding issue or coding issue? I am not too knowledgeable with encodings so the file on my computer could be using the wrong encoding thus causing the issue. Thanks.

It's an encoding issue, but sadly there isn't anything you can do about it. This is a bug in the Lua runtime that ComputerCraft uses, which was only fixed in later versions (ComputerCraft 1.76 to be precise).

olie304, on 04 January 2018 - 09:28 PM said:
Lately I have been having issues with the way ComputerCraft reads and processes bytes in a file. I am on an older version of CC (1.7.10), so I do not have access to the byte reading modes.

Binary mode still exists on older versions of ComputerCraft - it's just the latest version adds some new features. All methods documented on the wiki should work fine.


local handle = fs.open("my_file", "rb")
local buffer = {}
for byte in handle.read do table.insert(buffer, string.char(byte)) end
handle.close()

--# contents should be the raw bytes of the file
local contents = table.concat(buffer)

Bomb Bloke #3

7083 posts

Location Tasmania (AU)

Posted 05 January 2018 - 06:14 AM

Or you can serialise everything in one go, by dumping the numeric values straight into the table and then unpacking the whole thing to the string.char() function all at once.

Sorta worth noting that binary mode is missing from web-handles prior to CC1.80.

olie304 #4

16 posts

Posted 13 January 2018 - 07:00 AM

Bomb Bloke, on 05 January 2018 - 06:14 AM said:

Can you go more in depth on this? How do things like your GIF API or BBpack compression work when it comes to binary data?

Bomb Bloke #5

7083 posts

Location Tasmania (AU)

Posted 13 January 2018 - 03:17 PM

I simply open my target file in binary mode and start pulling in byte values, converting them to other data types whenever need be. Exactly how I perform such conversions depends on the types the file needs to make use of (GIFs for eg use quite a few data types) - a bit of bit shifting for ints and longs, some bit.band()'ing for bit fields, string.char() when rebuilding strings… I could go on at length, but I suspect it'd be faster for you to elaborate on what it is you're wanting to achieve so I know what's actually relevant to you.

olie304 #6

16 posts

Posted 14 January 2018 - 07:23 PM

Bomb Bloke, on 13 January 2018 - 03:17 PM said:

This is actually a continuation of my project that displays an image on the Open Peripherals terminal glasses. I put this issue in a new topic because it is a completely different approach from what I was doing before, and it brought up a specific problem I could not find the answer to. I managed to make a file format that puts the number of files as a byte first, then each pixel or block of pixels has the (x,y), length of block, hex color, and then opacity. I am doing it this way because I could not figure out how to parse an actual PNG file in Lua. The way I am doing it takes a looooonnggg time to process an image. Is there something I can read up on that says how certain binary formats are set up and organized?

Edited on 14 January 2018 - 06:27 PM

Bomb Bloke #7

7083 posts

Location Tasmania (AU)

Posted 15 January 2018 - 01:50 AM

Each common file format usually has one or more specification documents provided by its developer that tell other developers how to work with it. For example, my GIF-based code is written off of these three documents.

Here's a PNG spec file, but I have to warn you - it's a bit more complex than GIF. If you intend to write a PNG decoder from scratch, starting from a level where you have trouble finding the documentation, you should expect to spend months on the project. I'm not saying "don't do it" - on the contrary, the skills you'd gain would broaden your horizons to no end - but there's a good chance you'd've lost all interest in your original goal by the time you were done.

Lua's a great language for it, though. I wish I had tables back when I was starting out as a programmer… on the other hand, I suppose they would've spoilt me.

I assume you want to decode PNGs specifically because of their alpha support - if you don't care about that attribute, then my advice would be "just use my GIF decoder". Another option is to cheat and use external software to convert PNG to something simpler - I could whip up a converter in Java, if you like (that language has encoding / decoding support for most common image formats built right in).

Edited on 15 January 2018 - 12:53 AM

olie304 #8

16 posts

Posted 18 January 2018 - 06:32 PM

Bomb Bloke, on 15 January 2018 - 01:50 AM said:

I might end up using single-frame GIFs and your decoder because of how fast it is. I already have a PNG decoder written in Java if you want to take a look at it. The problem I have is reading its output, and reading it fast, on CC's side. I offset everything by a value of 32 because for some reason any bytes before 32 automatically have a value of 32.


import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;

public class ImageToGlasses {

public static String pointToByte(int codePoint, int offset) {
  String bytes = "";
  byte[] b = {
   (byte)(codePoint + offset)
  };
  bytes = new String(b, StandardCharsets.ISO_8859_1);
  return bytes;
}

public static String hexToByte(String hexString) {
  int val = Integer.parseInt(hexString, 16);
  String bytes = "";
  byte[] b = {
   (byte)((val >>> 24) & 0xff),
   (byte)((val >>> 16) & 0xff),
   (byte)((val >>> 8) & 0xff),
   (byte)((val >>> 0) & 0xff)
  };
  bytes = new String(b, StandardCharsets.ISO_8859_1);
  return bytes;
}

public static void main(String args[]) throws IOException {

  //Output file
  String outputName = "out";

  //Input PNG or JPG
  String fileName = "image.png";

  //Both arguments are needed: input image first, output file second
  if (args.length == 2) {
   fileName = args[0];
   outputName = args[1];
  } else {
   System.out.println("Ex: java -jar ImageToGlasses.jar image.png outputFile");
  }

  ArrayList < String > bytes = new ArrayList < String > ();

  File file = new File(fileName);
  BufferedImage image = ImageIO.read(file);
  Files.deleteIfExists(Paths.get(outputName));

  BufferedWriter writer = Files.newBufferedWriter(Paths.get(outputName), StandardCharsets.UTF_8);
  writer.write("!"); //Default file header, defines the number of volumes that need to be read. E.x. "!" = (Code Point)33 - 32(Offset) = 1 Volume

  //Scan Left to right; Up to Down
  for (int y = 0; y < image.getHeight(); y++) {
   for (int x = 0; x < image.getWidth(); x++) {

    int clr = image.getRGB(x, y);
    int red = (clr & 0x00ff0000) >> 16;
    int green = (clr & 0x0000ff00) >> 8;
    int blue = clr & 0x000000ff;
    int alpha = (clr & 0xff000000) >>> 24;

    String hex = String.format("%02X%02X%02X", red, green, blue);

    //Length of this block. Assumed to be 1 (one pixel per block),
    //since the posted code never defines boxLeng
    int boxLeng = 1;

    //Add a box to the array
    if (alpha > 0 && !hex.equals("000000")) {
     bytes.add(pointToByte(x, 32));
     bytes.add(pointToByte(y, 32));
     bytes.add(pointToByte(boxLeng, 32));
     bytes.add(hexToByte(hex));
     bytes.add(pointToByte(alpha, 32));
    }
   }
  }

  //Write everything once, after the whole image has been scanned
  for (int i = 0; i < bytes.size(); i++) {
   writer.write(bytes.get(i));
  }

  writer.close();
}
}

Here is the entire Lua code that I use to decode the new format

Edited on 18 January 2018 - 06:00 PM

Bomb Bloke #9

7083 posts

Location Tasmania (AU)

Posted 19 January 2018 - 11:13 AM

Bytes and characters aren't always interchangeable - single byte values can't always represent all characters (though multiple bytes can), and you can't use characters to represent all bytes (refer to the gaps in the ISO/IEC_8859-1 code page, for eg - notice how there aren't any glyphs before char 32?). If you were able to use something like good ol' code page 437 this wouldn't be an issue, but there's actually no need to be bringing strings or characters into this in the first place. The content you're working with isn't text, see, and those just aren't the right data types for the job.
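To make the mismatch concrete, here is a minimal Java sketch of what a UTF-8 writer does to single-byte values that were first turned into one-character strings, the way pointToByte does in the code posted above. The class and method names (Utf8Expansion, throughUtf8) are made up for illustration; only the standard charset classes are used.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Expansion {
    // Turn one raw byte value into a one-character string (ISO-8859-1),
    // then encode that string as UTF-8, as a UTF-8 BufferedWriter would.
    static byte[] throughUtf8(int value) {
        byte[] raw = { (byte) value };
        String asText = new String(raw, StandardCharsets.ISO_8859_1);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Print how many bytes each input value occupies after encoding.
        for (int value : new int[] { 33, 127, 128, 255 }) {
            byte[] encoded = throughUtf8(value);
            StringBuilder hex = new StringBuilder();
            for (byte b : encoded) {
                hex.append(String.format("%02X ", b & 0xff));
            }
            System.out.println(value + " -> " + encoded.length
                    + " byte(s): " + hex.toString().trim());
        }
    }
}
```

Values up to 127 come out as a single byte, while anything from 128 upward becomes two bytes. To a reader that expects one byte per value, that would look a lot like bytes being skipped or merged.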

So ditch the BufferedWriter and pass a FileOutputStream through a BufferedOutputStream instead, while likewise ditching all the UTF-8-handling stuff on the Lua side. Then you can simply pass your numeric values around without any special handling, since most of 'em fit within the 0-255 range that single bytes can represent.

Though it may be that you'll need to split your image dimensions and co-ords over multiple bytes (if you want to allow for images larger than 255px). For eg:

output.write(image.getWidth() & 255);  // Where "output" is a BufferedOutputStream
output.write((image.getWidth()>>8) & 255);

Hey presto, 16bit little-endian integer representation, good for values ranging from 0-65,535. Reading it back in Lua:

local width = input.read() + input.read() * 256  --# Where "input" is a binary-mode file handle

Hey presto, it's now… well, probably stored as something like a double-precision float somewhere, but in any case you've got some sort of representation of the original number back in memory.
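For anyone wanting to check the round trip without leaving Java, the following is a small sketch of the same 16-bit little-endian scheme. It writes to a ByteArrayOutputStream as a stand-in for the BufferedOutputStream over a file, and the names LittleEndian16, writeUInt16, readUInt16 and roundTrip are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class LittleEndian16 {
    // Low byte first, then high byte. OutputStream.write(int) only
    // writes the lowest 8 bits of its argument.
    static void writeUInt16(OutputStream out, int value) throws IOException {
        out.write(value & 255);
        out.write((value >> 8) & 255);
    }

    // Java mirror of the Lua expression: input.read() + input.read() * 256
    static int readUInt16(InputStream in) throws IOException {
        int low = in.read();
        int high = in.read();
        if (low < 0 || high < 0) {
            throw new IOException("unexpected end of data");
        }
        return low + high * 256;
    }

    // Encode a value into an in-memory buffer and decode it again.
    static int roundTrip(int value) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeUInt16(out, value);
            return readUInt16(new ByteArrayInputStream(out.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(300));
    }
}
```

Swapping the in-memory streams for a FileOutputStream wrapped in a BufferedOutputStream should not change the two helper methods, since they only rely on the generic OutputStream and InputStream interfaces.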

As for the colours, you might do:

int clr = image.getRGB(x, y);
output.write((clr & 0x00ff0000) >> 16);  // R
output.write((clr & 0x0000ff00) >> 8);   // G
output.write(clr & 0x000000ff);          // B
output.write((clr & 0xff000000) >>> 24); // A

Or you could pull the pixel to a Color and getRed/getBlue/etc, if you prefer that aesthetic:

Color clr = new Color(image.getRGB(x, y), true);
output.write(clr.getRed());
output.write(clr.getGreen());
output.write(clr.getBlue());
output.write(clr.getAlpha());
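As a quick sanity check of that byte order, here is a hedged sketch that splits a packed ARGB pixel into R, G, B, A values and puts them back together. RgbaBytes, toRgba and fromRgba are hypothetical names used only for this illustration.

```java
public class RgbaBytes {
    // Split a packed ARGB pixel (the layout BufferedImage.getRGB returns)
    // into four values in R, G, B, A order, each in the range 0-255.
    static int[] toRgba(int argb) {
        return new int[] {
            (argb >> 16) & 0xff,   // R
            (argb >> 8) & 0xff,    // G
            argb & 0xff,           // B
            (argb >>> 24) & 0xff   // A
        };
    }

    // Rebuild the packed ARGB pixel from the four single-byte values.
    static int fromRgba(int r, int g, int b, int a) {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        int pixel = 0x80FF4020;
        int[] parts = toRgba(pixel);
        int rebuilt = fromRgba(parts[0], parts[1], parts[2], parts[3]);
        System.out.println(rebuilt == pixel);
    }
}
```

Each of the four values fits in one byte, so they can go straight through output.write() with no offset and no text encoding involved.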

Bear in mind ComputerCraft's timers don't generate any events until the next server tick, at the earliest. Assuming it isn't overloaded, the server ticks every 0.05s - meaning that "sleep(0.001)" call is really pausing your script for a twentieth of a second, not a thousandth. If you want to yield purely for the sake of yielding, then I suggest:

local function snooze()
    local myEvent = tostring({})
    os.queueEvent(myEvent)
    os.pullEvent(myEvent)
end

Edit:

It occurs to me that your reason for attempting to use UTF-8 might be because you intend to transfer these converted images through the pre-CC1.8 http API (which only handles text, and poorly at that). If this is the case, then I strongly recommend you let BBPack handle your uploads and downloads - it'll sort out the text encoding and decoding for you (using base64, which is rather more appropriate for that sort of thing), ensuring that the data you put in is the data you get out.
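To illustrate why base64 suits a text-only channel, here is a minimal sketch using Java's standard java.util.Base64 class. It is not BBPack's code, just a demonstration that arbitrary byte values survive a trip through plain ASCII text; Base64RoundTrip, toText and fromText are names invented for the example.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Encode arbitrary bytes as ASCII-only text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Decode the text back into the original bytes.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Includes values below 32 and above 127, the ranges that
        // caused trouble when treated as characters.
        byte[] data = { 0, 10, 31, (byte) 128, (byte) 255 };
        String text = toText(data);
        System.out.println(text);
        System.out.println(Arrays.equals(fromText(text), data));
    }
}
```

The cost is size: every three bytes of input become four characters of text.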

Edited on 19 January 2018 - 10:28 AM