Hello, I spent some time wondering why peripherals can somehow let CC "see" Unicode strings when we can't normally create them ourselves. So I spent the last few hours experimenting with string.char, and eventually found a pattern.
The format CC uses for these strings looks like 0xEA 0xBB 0xCC: it's 3 bytes, not 2. I don't know much about Unicode or how it's formatted, but I was at least able to figure this much out. The first byte is 0xE0 plus group A, which runs from 0 to 15, and each value of A covers 4,096 different characters. Group B holds 64 values (0 to 63) before looping back around and bumping A, and is stored as 0x80 plus its value (0x80 to 0xBF). Group C works the same way. That adds up to 16 * 64 * 64 = 65,536 code points, U+0000 through U+FFFF, which is the standard Unicode format (U+ABCD). Code points below U+0800 use a shorter 1- or 2-byte form instead.
This is apparently UTF-8 encoding.
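To make the byte pattern concrete, here is a minimal sketch of that encoding in plain Lua. It is not the code from the pastebin; the function name utf8_encode is made up for illustration, and it only handles U+0000 through U+FFFF, as described above.

```lua
-- Sketch only: encode one code point (U+0000..U+FFFF) as a UTF-8 byte string.
-- utf8_encode is a hypothetical name, not part of the posted API.
local function utf8_encode(cp)
  assert(type(cp) == "number" and cp % 1 == 0 and cp >= 0 and cp <= 0xFFFF,
         "expected an integer between 0x0000 and 0xFFFF")
  if cp < 0x80 then
    -- 1 byte: plain ASCII
    return string.char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return string.char(0xC0 + math.floor(cp / 64), 0x80 + cp % 64)
  else
    -- 3 bytes: 1110aaaa 10bbbbbb 10cccccc (groups A, B and C)
    local a = math.floor(cp / 4096)
    local b = math.floor(cp / 64) % 64
    local c = cp % 64
    return string.char(0xE0 + a, 0x80 + b, 0x80 + c)
  end
end

-- U+3042 (hiragana "a") becomes the bytes E3 81 82.
-- U+00A7 (the section symbol) becomes the bytes C2 A7.
local hiragana_a = utf8_encode(0x3042)
local section = utf8_encode(0x00A7)
```

It uses math.floor and % instead of bit operations so it should run on the Lua 5.1 that CC ships as well as on newer Lua versions.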
So I created a little API that translates the U+ABCD format into the way CC stores Unicode, and also took the time to include a small language pack (JP Basic) for testing purposes.
Why should you care?
This system lets people use Unicode characters. My own use will be in Chat Blocks, to color text using the section symbol. Others can use it for translation purposes; however, Unicode will not display on the CC computers themselves, only in peripherals or on items such as Floppy Disks (http://puu.sh/i4M2m/fac039315e.jpg).
This also works in FS operations; however, some external editors may show placeholder symbols instead of the proper characters when you view the files.
I also tested in Chat Blocks to see if it'd work properly, and sure enough: http://puu.sh/i4uyc/78c61e8a35.jpg
How did I do this? (if you care)
I spent a few hours literally string.char'ing random patterns together until I found something that made a bit of sense. I used the Chat Block to send data in through the chat_message event, then stared at the hexadecimal of the resulting strings for a while.
Source Code: http://pastebin.com/sseR1jk4
Usage:
Some Globals you have to be aware of until next update:
split ( str String ) - string.split in other languages.
num2hex ( int32 number ) - Converts number to hexadecimal, not sure if necessary.
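For anyone wondering what those two globals roughly do, here is a guess at equivalent helpers in plain Lua. This is a sketch, not the pastebin code: the optional separator argument (defaulting to "|", as in the readString example) and the uppercase hex output are my assumptions.

```lua
-- Sketch only: stand-ins for the two globals, written from their descriptions.
-- The "sep" parameter and its "|" default are assumptions, not the real signature.
local function split(str, sep)
  sep = sep or "|"
  -- Escape non-alphanumeric characters so the separator is safe inside [^...].
  local escaped = (sep:gsub("%W", "%%%0"))
  local parts = {}
  for piece in string.gmatch(str, "[^" .. escaped .. "]+") do
    parts[#parts + 1] = piece
  end
  return parts
end

-- Converts a non-negative integer to an uppercase hexadecimal string.
local function num2hex(n)
  return string.format("%X", n)
end

local pieces = split("a|i|u|e|o")  -- five entries: "a", "i", "u", "e", "o"
local hex = num2hex(0x3042)        -- "3042"
```

Note that this split drops empty pieces, so "a||b" gives two entries rather than three.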
API Usage:
unicode.char ( int32 number ) - Converts Unicode representation (0xABCD) to a UTF-8 representation which is compatible with CC.
unicode.format ( int32 number ) - Converts a number to a properly formatted Hexadecimal string representation.
unicode.unformat ( str hexadecimal ) - Reverses .format
unicode.read ( str char ) - Same as unicode.char, will be deprecated in next update.
unicode.readString ( str String , str LanguagePack ) - Returns a UTF-8 representation of the string based on the Language Pack.
Example: unicode.readString("a|i|u|e|o","jp_basic")
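As a rough picture of how a language pack lookup like the example above could work, here is a small self-contained sketch. The table layout (romaji key mapped to a code point), the names jp_basic_sketch, encode3 and read_string, and the behavior of passing unknown keys through unchanged are all assumptions for illustration; the real unicode.set.jp_basic may be organized differently.

```lua
-- Sketch only: a hypothetical language pack mapping romaji keys to code points.
-- These are the hiragana vowels; the real jp_basic set may differ.
local jp_basic_sketch = {
  a = 0x3042, i = 0x3044, u = 0x3046, e = 0x3048, o = 0x304A,
}

-- 3-byte UTF-8 form, valid only for code points U+0800..U+FFFF.
local function encode3(cp)
  return string.char(0xE0 + math.floor(cp / 4096),
                     0x80 + math.floor(cp / 64) % 64,
                     0x80 + cp % 64)
end

-- Splits on "|" and replaces each key found in the pack with its UTF-8 bytes.
-- Keys missing from the pack are kept as-is (an assumption, not confirmed).
local function read_string(str, pack)
  local out = {}
  for key in string.gmatch(str, "[^|]+") do
    local cp = pack[key]
    out[#out + 1] = cp and encode3(cp) or key
  end
  return table.concat(out)
end

local vowels = read_string("a|i|u|e|o", jp_basic_sketch)  -- 15 bytes, 3 per character
```

The result is an ordinary Lua string, so it can be handed to a peripheral or written to a file like any other string.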
Data Sets:
unicode.set.eng
unicode.set.jp_basic
Thank you for reading this post! I hope it was of help.