This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Modems Are Not Binary-Safe

Started by Hawk777, 26 February 2012 - 07:40 AM
Hawk777 #1
Posted 26 February 2012 - 08:40 AM
Modems don't seem to be able to send strings containing arbitrary binary data and have it arrive intact. Assume that on one computer, number 14, I run the following:


rednet.open("top")
sender, message = rednet.receive()
rednet.close("top")
print("sender=", sender)
print("len=", message:len())
for i = 1, message:len() do
  print("message[", i, "] = ", message:byte(i))
end

I then go to a second computer, number 13, and run the following:


rednet.open("top")
rednet.send(14, string.char(135))
rednet.close("top")

On computer 14, I get the following output:


sender=13
len=3
message[1] = 239
message[2] = 190
message[3] = 135

To which I say, what the heck? It seems that every byte from 0 to 127 works fine, but those above do not. Since binary data is easier to work with programmatically than text data, it would be nice if modems provided a binary-safe transmission medium.
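
For anyone who wants to check this systematically, here is a rough sketch that sends every byte value one at a time and reports which ones arrive changed (it assumes the same computer IDs and modem sides as above, and that no messages are dropped):

-- on computer 13 (sender)
rednet.open("top")
for b = 0, 255 do
  -- prefix each byte with its decimal value; the prefix is plain ASCII so it survives
  rednet.send(14, b .. ":" .. string.char(b))
end
rednet.close("top")

-- on computer 14 (receiver)
rednet.open("top")
for _ = 1, 256 do
  local _, msg = rednet.receive()
  local expected, payload = msg:match("^(%d+):(.*)$")
  expected = tonumber(expected)
  if payload:len() ~= 1 or payload:byte(1) ~= expected then
    print(expected .. " arrived as " .. table.concat({payload:byte(1, payload:len())}, ","))
  end
end
rednet.close("top")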
FuzzyPurp #2
Posted 27 February 2012 - 04:46 AM
Yeah, this needs to be brought to Dan's attention.
Etherous #3
Posted 27 February 2012 - 04:49 AM
I can confirm this
Casper7526 #4
Posted 27 February 2012 - 05:49 AM
I'm literally just heading out and I'll be back in 30 minutes or so, but try this out.

Computer 1


message = "hello world"
data = ""
for i = 1, message:len() do
  print("message[", i, "] = ", message:byte(i))
  data = data .. "" .. message:byte(i)
end
print (data)
rednet.open("back")
a = loadstring("rednet.broadcast('"..data.."')")
a()

Computer 2


rednet.open("back")
id,message = rednet.receive()
print (message)
Hawk777 #5
Posted 27 February 2012 - 06:03 AM
I reduced the string to "hello", and I get the bytes 104, 101, 108, 108, and 111, as expected, followed by this data value, again as expected:

104101108108111

The receiver receives the string "hello" just fine.

However, if I change "hello" to "hell\130o" (i.e. with a byte of value 130 embedded) in the sender, everything on the sending side comes out fine (the bytes are 104, 101, 108, 108, 130, 111 and the generated data string is "104101108108130111"), but the receiver sees "message" as being not six but SEVEN characters long, the bytes being 104, 101, 108, 108, 194, 175, and 0.
Casper7526 #6
Posted 27 February 2012 - 06:51 AM
So what's wrong with literally just sending the ASCII characters that you want?

rednet.broadcast("Œ™®©¶þ")

Because if you wanted to send straight binary, you would just have to do the conversion to a string and then the conversion back to binary on the other end.

And on a further note, anything above 127 isn't a single byte anymore.
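
For reference, here is a minimal sketch of that convert-to-text-and-back idea, with a separator between the byte values so the receiving end can split them apart again (the helper names are just illustrative):

local function toDecimalText(s)
  local parts = {}
  for i = 1, s:len() do
    parts[#parts + 1] = s:byte(i)
  end
  return table.concat(parts, ",")     -- e.g. "104,101,108,108,111"
end

local function fromDecimalText(t)
  local bytes = {}
  for n in t:gmatch("%d+") do
    bytes[#bytes + 1] = string.char(tonumber(n))
  end
  return table.concat(bytes)
end

-- sender: rednet.broadcast(toDecimalText(rawData))
-- receiver: rawData = fromDecimalText(message)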
Hawk777 #7
Posted 27 February 2012 - 10:55 AM
A Lua string holds bytes, each of which can be zero to 255. Proof:


lua> s = string.char(157)
lua> s:len()
1
lua> s:byte(1)
157
lua> s:byte(2)
lua>

And yet, were I to send the string "s" through a wireless modem, the thing that came out the other end would not have the same properties. To me, this is a bug, either in the implementation (the string received should be the same as the string sent) or the documentation (which should make it abundantly clear that modems cannot pass arbitrary strings, only certain ones, namely those whose bytes are all no more than 127).
Casper7526 #8
Posted 27 February 2012 - 05:51 PM
The first 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This includes Latin letters with diacritics and characters from the Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets. Three bytes are needed for characters in the rest of the Basic Multilingual Plane (which contains virtually all characters in common use). Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters and various historic scripts and mathematical symbols.
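
To illustrate those byte counts, here is a rough sketch that UTF-8-encodes a single code point below U+0800 using plain arithmetic (the helper name is just for illustration; nothing in CC does this for you automatically):

local function utf8Encode(cp)
  if cp < 128 then
    -- US-ASCII range: one byte, unchanged
    return string.char(cp)
  elseif cp < 2048 then
    -- two bytes: 110xxxxx 10xxxxxx
    return string.char(192 + math.floor(cp / 64), 128 + cp % 64)
  end
  -- three- and four-byte forms omitted for brevity
end

print(utf8Encode(65):byte(1, -1))   -- yields the single byte 65 ("A" stays one byte)
print(utf8Encode(130):byte(1, -1))  -- yields the two bytes 194 and 130 (code point 130 becomes two bytes)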
Casper7526 #9
Posted 27 February 2012 - 05:55 PM
I still ask, though: why can't you just send the actual string, and why do you need to convert it to bytes? Or, if you are specifically reading from a binary file, why doesn't it make sense to convert the binary to a string and then, on the receiving end, the string back to binary?
interfect #10
Posted 28 February 2012 - 11:38 AM
I've been through the rednet code. The reason that only byte values 0-127 work is that the high-order bit (which is only used for values 128-255) is not sent over rednet. Only the low 7 bits of every character are sent. For "basic" ASCII values (normal characters you can type) this is fine, but for binary data (including Unicode strings, theoretically, or anything using extended ASCII) it's a problem, because you lose 1/8 of the data.

It's done this way because RedPower bundled cables have 16 wires in them. Using 2 wires for control, that leaves 14 data wires, which is enough to send two characters at once if you only send the low 7 bits. If all 8 bits were sent, only one character could be sent at a time, making rednet twice as slow.
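
Not the actual rednet code, but the arithmetic it implies looks roughly like this:

local function low7(b)
  return b % 128                            -- the high-order bit is simply lost
end

local function pack14(first, second)
  return low7(first) + low7(second) * 128   -- two 7-bit values fit in 14 bits (0-16383)
end

local function unpack14(n)
  return n % 128, math.floor(n / 128)
end

print(low7(135))                   -- 7: a byte of 135 can never arrive intact
print(unpack14(pack14(72, 105)))   -- gives back 72 and 105 ("H" and "i" survive, both below 128)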

As a workaround, you could base64-encode your data, create a custom base-128 encoding, or use a slower rednet-like system that sends all 8 bits of its data. (Shameless plug: my "reliable" communication system does this, and doesn't drop characters like rednet sometimes will. The downside is it's slower, requiring a round trip for every character.)
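
For example, here is a rough pure-Lua base64 sketch (illustrative only, not a built-in API) that produces only bytes below 128, so it passes through rednet unharmed:

local b64chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

local function b64encode(data)
  local out = {}
  for i = 1, #data, 3 do
    local a, b, c = data:byte(i, i + 2)
    -- pack up to three bytes into one 24-bit number
    local n = a * 65536 + (b or 0) * 256 + (c or 0)
    local chunk = ""
    for j = 3, 0, -1 do
      local index = math.floor(n / 64 ^ j) % 64
      chunk = chunk .. b64chars:sub(index + 1, index + 1)
    end
    -- replace the characters that correspond to missing input bytes with padding
    if not c then chunk = chunk:sub(1, 3) .. "=" end
    if not b then chunk = chunk:sub(1, 2) .. "==" end
    out[#out + 1] = chunk
  end
  return table.concat(out)
end

local function b64decode(text)
  text = text:gsub("=", "")
  local out, n, bits = {}, 0, 0
  for i = 1, #text do
    -- accumulate 6 bits per character, emit a byte whenever 8 are available
    n = n * 64 + (b64chars:find(text:sub(i, i), 1, true) - 1)
    bits = bits + 6
    if bits >= 8 then
      bits = bits - 8
      out[#out + 1] = string.char(math.floor(n / 2 ^ bits))
      n = n % 2 ^ bits
    end
  end
  return table.concat(out)
end

-- sender: rednet.send(14, b64encode(rawData))
-- receiver: rawData = b64decode(message)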
Casper7526 #11
Posted 28 February 2012 - 04:27 PM
Heh, we were mainly talking about modems where it's an actual string of data being sent over the network and not being converted into bits to be used for bundles :D

But yes, you are correct: if you were ever worried about any sort of weird characters, you could convert a string to base64 and decode on the other end ;)
Hawk777 #12
Posted 29 February 2012 - 02:46 AM
I think we may not be communicating quite straight. I know I can work around this by using text instead of binary to encode my data, but it's not a question of "not encoding the data"; my data doesn't start out as plain text so it's encoding either way; I happened to try binary encoding first only to discover it was broken, so I'm now using text encoding. This was more of a "I don't have an immediate problem here, but I think this is a bug which should be fixed".

Second, values above 127 don't inherently use 2 bytes. That's only true in UTF-8, but as I proved earlier by my paste, I can create a Lua string with a single byte of value 157. That string isn't even valid UTF-8, ergo, Lua strings aren't UTF-8.
Xtansia #13
Posted 29 February 2012 - 03:04 AM
Hawk777 said: "Second, values above 127 don't inherently use 2 bytes. That's only true in UTF-8, but as I proved earlier by my paste, I can create a Lua string with a single byte of value 157. That string isn't even valid UTF-8, ergo, Lua strings aren't UTF-8."

That's true: Lua does not use UTF-8 encoding. UTF-8 text can be stored in Lua strings, but it causes problems with some of the string functions.

http://lua-users.org/wiki/LuaUnicode
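
A minimal illustration of the kind of problem that page describes (Lua's string functions count bytes, not characters):

local s = string.char(195, 169)  -- the UTF-8 bytes for "é": one character, two bytes
print(s:len())                   -- 2, not 1
print(s:byte(1))                 -- 195, a lone lead byte that is not valid UTF-8 on its own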