This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

LASM - ComputerCraft's First Alternate Programming Language

Started by ElvishJerricco, 15 June 2013 - 07:33 PM
ElvishJerricco #1
Posted 15 June 2013 - 09:33 PM
LASM - An Assembly Language for the Lua Virtual Machine


I'm sure many of us know that Lua runs in a virtual machine via bytecode. You can get the bytecode for a function by calling string.dump(func). This is nice because you can compile Lua code into bytecode and save the bytecode instead of the source.
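As a minimal sketch of that round trip, a function can be dumped to a string and loaded back. This assumes a Lua 5.1-style environment; the greet function and the load_chunk alias are made up for illustration, and later posts in this thread report that dumped bytecode does not always reload cleanly under CC.

```lua
-- loadstring exists in Lua 5.1 (and CC); newer Lua versions use load instead.
local load_chunk = loadstring or load

-- Hypothetical example function with no upvalues, so it can be dumped safely.
local function greet(name)
  return "Hello, " .. name .. "!"
end

-- string.dump returns the compiled function as a binary string.
local bytecode = string.dump(greet)

-- Loading that string gives back a callable function.
local restored = assert(load_chunk(bytecode, "greet"))
local message = restored("World")
print(message)
```

The bytecode string could just as well be written to a file and loaded later, which is the workflow LASM's output files rely on.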

LASM is an assembly language for this bytecode. Its entire purpose is to allow programming Lua bytecode directly. On the surface, this isn't so useful. But theoretically, it could be useful for creating new languages. It's much easier to target an assembly language in a compiler than it is to target Lua itself.

I'm not the first to create a LASM assembler for Lua, but I am the first to make one that's compatible with LuaJ and CC. And I think mine's pretty nice.

Here is a large writeup on the Lua bytecode specification as it stands today. The syntax used in their examples isn't exactly like my LASM implementation's syntax, but it's close.

Writing Your First LASM Program

For those of you who don't know, an assembly language is very nearly an exact representation of every byte in the bytecode. You will type out every instruction for the Lua VM.


-- helloworld.lasm
.stacksize 2
.const "Hello, World!" -- constant is at index 0
.const "print"
getglobal 0 1 -- load into register 0, the global with name from constant at index 1 ("print")
loadk 1 0 -- load into register 1, the constant at constant index 0 ("Hello, World!")
call 0 2 1 -- call the function at register 0, with 1 parameter, and keeping no returns.

This simple LASM program prints the well-known string "Hello, World!". First, we declare our constants. The order they're declared in determines the index they're at. So declaring the constant "Hello, World!" first puts it at index 0 (unlike Lua, the Lua VM usually works with 0-based indexing). Then "print" is kept at constant index 1.

Next, we use the "getglobal" instruction to load the "print" global into register 0. Next we use "loadk" to put "Hello, World!" into register 1. Finally, we call register 0 (print), with one parameter (register 1, the "Hello, World!"), and keeping no return values. For more info on "call," read the writeup posted above.
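To make the three instructions more concrete, here is a toy model of them in plain Lua. It is only a sketch under simplifying assumptions: run_hello, env, and the helper functions are names made up for this example, the real VM works very differently inside, and the call helper only covers the simple case where both counts are given explicitly.

```lua
local unpack = table.unpack or unpack -- Lua 5.1 has the global unpack

-- Toy model of helloworld.lasm; `env` stands in for the table of globals.
local function run_hello(env)
  local constants = { [0] = "Hello, World!", [1] = "print" }
  local registers = {}

  -- getglobal A K: registers[A] = global named by constant K
  local function getglobal(a, k)
    registers[a] = env[constants[k]]
  end

  -- loadk A K: registers[A] = constant K
  local function loadk(a, k)
    registers[a] = constants[k]
  end

  -- call A B C: call registers[A] with B - 1 arguments taken from the
  -- registers after A, keeping C - 1 results starting at register A.
  -- Only handles B >= 1 and C >= 1.
  local function call(a, b, c)
    local args = {}
    for i = 1, b - 1 do
      args[i] = registers[a + i]
    end
    local results = { registers[a](unpack(args, 1, b - 1)) }
    for i = 1, c - 1 do
      registers[a + i - 1] = results[i]
    end
  end

  getglobal(0, 1) -- getglobal 0 1
  loadk(1, 0)     -- loadk 1 0
  call(0, 2, 1)   -- call 0 2 1
end

run_hello(_G)
```

Passing the globals in as a parameter is just a convenience for experimenting; swapping in a table with a different print makes it easy to see what the program would call.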

So what are these "register" and "constant index" things? The Lua VM has four different stacks: the register stack, the constant stack, the upvalue stack, and the function prototype stack. The register stack is managed manually by the program. That's what .stacksize 2 is doing: it tells the Lua VM that we will use no more than 2 registers (0 and 1). We don't have to use all the registers we ask for, but we can't use more.

The constant stack is automatic. As you declare constants they get added to the stack. This is managed at compile time, not runtime, so there's no modifying it.

The upvalue stack is a stack that you can reference to get data from registers (or other upvalues) of the function prototype that you are a child of.

The function prototype stack is pretty much the same as the constant stack, except the data held is the prototypes of functions. You see, when you declare a function in code, you're not writing magical function code to memory or anything. The bytecode has a special section where all your functions are written, and your code creates closures to use those functions (see closure instruction).
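The same ideas are visible from ordinary Lua. In this small sketch (the names make_counter, a, and b are made up for the example), the inner function is written once, so there is one prototype, but every call to make_counter creates a new closure with its own upvalue:

```lua
local function make_counter()
  local count = 0     -- a local of make_counter (held in one of its registers)
  return function()   -- one prototype; a fresh closure is created per call
    count = count + 1 -- count is reached through the closure's upvalue
    return count
  end
end

local a = make_counter()
local b = make_counter()

a()
a()
local third_from_a = a() -- a has now been called three times
local first_from_b = b() -- b has its own, independent count

print(third_from_a, first_from_b)
```

The two counters do not interfere with each other, because each closure carries its own upvalue even though both share the same function body.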

Details of This LASM Language

The writeup has a different way of doing some things in its examples. For example, when you declare a function, you don't follow it with four numbers like the writeup does. The number of parameters and upvalues is managed automatically by the compiler, while varargs and stack size are handled by the programmer.


.function
	.vararg 4
	.stacksize 0
.end
.vararg defaults to 2; .stacksize must be specified in every function.

Locals and upvalues do require a string following them that acts as the name of them, just like in the writeup. But now, they serve a purpose besides debugging data for the VM.


.stacksize
.local "a"
.upvalue "b"

getupval %a %b

Using the string names, we can reference their indexes in the register and upvalue stacks. Just a handy feature of the compiler.
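Roughly speaking, this kind of name lookup can be pictured as a table that hands out indexes in declaration order. The sketch below is hypothetical and not taken from the LASM source; new_table, declare, and resolve are names invented for illustration.

```lua
-- Hypothetical sketch of resolving "%name" operands to indexes.
local function new_table()
  return { index_of = {}, count = 0 }
end

-- Give `name` the next free index (0-based, like the VM's stacks).
local function declare(tbl, name)
  tbl.index_of[name] = tbl.count
  tbl.count = tbl.count + 1
  return tbl.index_of[name]
end

-- "%a" looks up the declared name; anything else is read as a plain number.
local function resolve(tbl, operand)
  local name = operand:match("^%%(.+)$")
  if not name then
    return tonumber(operand)
  end
  return assert(tbl.index_of[name], "undeclared name: " .. name)
end

local locals, upvalues = new_table(), new_table()
declare(locals, "a")   -- .local "a"
declare(upvalues, "b") -- .upvalue "b"

-- getupval %a %b would then assemble as getupval 0 0
local a_index = resolve(locals, "%a")
local b_index = resolve(upvalues, "%b")
print(a_index, b_index)
```

Keeping separate tables for locals and upvalues mirrors the fact that registers and upvalues are indexed independently.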

Params and functions can optionally be named in the same way, and constants can be declared inline by prefixing them with "&".


.stacksize 2
.function "a"
	.stacksize 3 -- registers 0 (%b), 1, and 2 are all used below
	.param "b" -- register index 0
	getglobal 1 &"print"
	move 2 %b
	call 1 2 1
.end
.local "myClosure"
closure %myClosure %a
loadk 1 &"Hello, World!"
call %myClosure 2 1

And that about does it. That's my language.

Download: pastebin get ZghTBkmh lasm

Usage:

lasm [in file] [out file]

There is one little requirement though. LASM was designed with my Project NewLife in mind (which has been updated and now includes LASM), so it doesn't have any way of automatically making CraftOS able to run Lua bytecode files. So either at startup or at least before you try to run the output file, run the following code somehow.


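function _G.loadfile(inFile)
    -- Read the file byte by byte in binary mode so the bytecode isn't altered.
    local data = {}
    local file = assert(fs.open(inFile, "rb"))
    for i = 1, fs.getSize(inFile) do
        data[i] = string.char(file.read())
    end
    file.close()
    -- loadstring handles bytecode as well as source; the file name is used as the chunk name.
    return loadstring(table.concat(data), fs.getName(inFile))
end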
Yevano #2
Posted 15 June 2013 - 10:47 PM
Just tried the example out. This is really awesome what you've made here. I believe I shall embark on a new coding project. :D Perhaps a Lisp compiler. :P
ElvishJerricco #3
Posted 15 June 2013 - 11:19 PM
Just tried the example out. This is really awesome what you've made here. I believe I shall embark on a new coding project. :D Perhaps a Lisp compiler. :P

I'm considering making a backend for LLVM that compiles to this assembly language. Then we'd be able to run C and C++ code in CC!
superaxander #4
Posted 16 June 2013 - 06:31 AM
Nicely made! But there is no real point of this. *Is gonna try make a language too*
ElvishJerricco #5
Posted 16 June 2013 - 09:35 AM
Nicely made! But there is no real point of this. *Is gonna try make a language too*

As I said above, the best reason for it is that it's easier to target an assembly language when building a compiler for a language than bytecode, so if you make a language, you'll have a better time compiling down to LASM than to bytecode or Lua.
svdragster #6
Posted 16 June 2013 - 10:50 AM
Cool, nice job!
M4sh3dP0t4t03 #7
Posted 16 June 2013 - 11:28 AM
Nice job with this language. Maybe I will try to make a language with a compiler that compiles to this.
ElvishJerricco #8
Posted 16 June 2013 - 11:35 AM
Nice job with this language. Maybe I will try to make a language with a compiler that compiles to this.

Thanks. I enjoyed learning compiler design to make it =P
SiKeDDeMoNMC #9
Posted 17 June 2013 - 10:10 AM
AMAZING!!!!!!
Nvirjskly #10
Posted 18 June 2013 - 06:59 AM
This is great. I'm going to use this as a backend 100%.
ElvishJerricco #11
Posted 18 June 2013 - 08:46 PM
I think I'm going to make Objective Lua. It'll be to Lua what Objective C is to C. I'll probably target LASM instead of Lua so that I can have some more freedom in the compiler, but valid Lua code will be valid Objective Lua code. This could take a while to build…
Dave-ee Jones #12
Posted 19 June 2013 - 04:59 AM
Wow. You made a programming language INSIDE a programming language. Nice work.
M4sh3dP0t4t03 #13
Posted 19 June 2013 - 11:47 AM
Wow. You made a programming language INSIDE a programming language. Nice work.
That isn't something special. Lua, python and lots of other programming languages are written in other programming languages.
theoriginalbit #14
Posted 19 June 2013 - 11:50 AM
Wow. You made a programming language INSIDE a programming language. Nice work.
That isn't something special. Lua, python and lots of other programming languages are written in other programming languages.
And as a matter of fact Lua for ComputerCraft is written in Java.
ElvishJerricco #15
Posted 19 June 2013 - 11:03 PM
Wow. You made a programming language INSIDE a programming language. Nice work.
That isn't something special. Lua, python and lots of other programming languages are written in other programming languages.
And as a matter of fact Lua for ComputerCraft is written in Java.

Realistically, it's impossible to create a new programming language without first creating it in another language. But once you've got it working you can use it to compile a compiler written in the new language so it needs no other language. Point is though, no language exists without some other language with the exception of machine language.
NeptunasLT #16
Posted 20 June 2013 - 01:33 AM
Lua + Assembly = Bad.
jesusthekiller #17
Posted 20 June 2013 - 02:00 AM
I used to say so, but it's actually powerful :)
Offtopic: You sure you use Unix? FYI Linux is not Unix. Last Unix release was in 1986….




Realistically, it's impossible to create a new programming language without first creating it in another language. But once you've got it working you can use it to compile a compiler written in the new language so it needs no other language. Point is though, no language exists without some other language with the exception of machine language.

It is, but not in CC :)
M4sh3dP0t4t03 #18
Posted 20 June 2013 - 06:53 AM
Actually, every programming language except assembly is made within another language.
jesusthekiller #19
Posted 20 June 2013 - 07:45 AM
Only thing that is not created in other language are logic gates.
lieudusty #20
Posted 20 June 2013 - 11:03 AM
Looks amazing! :D
GopherAtl #21
Posted 18 July 2013 - 01:04 AM
Only thing that is not created in other language are logic gates.

don't be silly. Logic gates are written in transistor logic.
UMayBleed #22
Posted 18 July 2013 - 04:12 PM
Well, this is really sweet, but I would have to learn another language D: If only C++ or Java ran on CC :3
Zee #23
Posted 18 July 2013 - 11:30 PM
This is so awesome I wrote a variable dedicated to it:
awesome = "LASM"
Someone needs to make a C++ compiler.
Mrrraou #24
Posted 20 July 2013 - 05:33 AM
Awwwesome! I'll try it when Java is installed on my Fedora :)
jesusthekiller #25
Posted 20 July 2013 - 09:11 AM
Only thing that is not created in other language are logic gates.

don't be silly. Logic gates are written in transistor logic.
Lambda Calculus FTW
Yevano #26
Posted 20 July 2013 - 07:02 PM
I've done only a little research on Lambda Calculus, but I'm pretty sure it has nothing to do with digital logic. Do you mean Boolean Algebra?
Edit: Unless your joke is that the abstraction level is recursive in nature (although it's not really unless you account for digital logic simulation).

@OnTopic: I wish I had the time to work on some high-level compiler that uses this. Maybe I'll just make a mathematical expression compiler. I'm sure many people would find that useful.
Pharap #27
Posted 20 July 2013 - 11:52 PM
I'll be fair, at first I was sceptical about this, but after having a quick read, you've managed to do a decent job.
ElvishJerricco #28
Posted 21 July 2013 - 12:19 AM
I'll be fair, at first I was sceptical about this, but after having a quick read, you've managed to do a decent job.

Thanks. In terms of proper compiler structure, the parsing phase could be a bit more powerful. But LASM's a basic enough language that I don't mind that. I am working on a compiler infrastructure for Lua though. I plan on making a Lua compiler with it =P That way I can add new language features. Plus, the infrastructure will make it much easier for other people to write compilers for other languages.
Dave-ee Jones #29
Posted 21 July 2013 - 03:26 AM
Wow. You made a programming language INSIDE a programming language. Nice work.
That isn't something special. Lua, python and lots of other programming languages are written in other programming languages.
I meant in ComputerCraft. Not in languages outside CC.
Pharap #30
Posted 21 July 2013 - 08:33 AM
Wow. You made a programming language INSIDE a programming language. Nice work.
That isn't something special. Lua, python and lots of other programming languages are written in other programming languages.
I meant in ComputerCraft. Not in languages outside CC.
Technically it's a programming language assembled in an assembler written in a programming language running on a virtual machine written in a language that runs on a virtual machine which itself is written in another language.
What has (computer) science done?
Zee #31
Posted 16 September 2013 - 02:19 AM
Why was the pastebin paste removed?
ElvishJerricco #32
Posted 16 September 2013 - 09:27 AM
Why was the pastebin paste removed?

Sorry cleaned out my paste bin and must've accidentally deleted this. Re-uploaded and posted the new link on the OP. Should work now.
Pyuu #33
Posted 25 December 2015 - 03:48 AM
Unfortunately, LuaJ breaks this entire system.
Can't even term.write.

.stacksize 2
.const "Hello, World!"
.const "term"
.const "write"
getglobal 0 1
gettable 1 0 259
loadk 2 0
call 1 2 0
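For reference, in standard Lua 5.1 bytecode the third operand of gettable is an "RK" field: values below 256 name a register, and values of 256 and up name the constant at index (value - 256). The helper names below are made up, and it is an assumption that LASM passes this operand through unencoded (the 259 above suggests it does). Under that assumption, the three constants declared here sit at indexes 0 to 2, so "write" would be 258, while 259 would point at a fourth constant that this listing does not declare. Whether that is related to the errors described here is not clear from the post.

```lua
-- In Lua 5.1, bit 8 of an RK operand marks it as a constant index.
local BITRK = 256

-- Encode a constant index (0..255) as an RK operand.
local function rk_constant(index)
  assert(index >= 0 and index < BITRK, "constant index out of RK range")
  return BITRK + index
end

-- True when an RK operand refers to a constant rather than a register.
local function is_constant(operand)
  return operand >= BITRK
end

-- Recover the constant index from an RK operand.
local function constant_index(operand)
  assert(is_constant(operand), "operand is a register, not a constant")
  return operand - BITRK
end

local write_operand = rk_constant(2)    -- constant index 2 is "write"
local index_of_259 = constant_index(259)
print(write_operand, index_of_259)
```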

When I went in to investigate the cause, I came across a wonderful surprise. When you string.dump(loadstring('term.write("Hello world")')) and then re-run the result of the dump, it errors out with messages similar to this: http://puu.sh/m7Mk8/277377ef4b.png

You can't even compile this without breaking LuaJ:

a = print
a("Hello")

Bytecode is officially broken with LuaJ / CC and can only perform primitive tasks.

There is a possible fix though!
Someone could make a new function to read Lua bytecode as it is supposed to be read, and that would essentially fix those problems. I may look into that so I can get this oh-so-neat program to run. (Though is it just me, or does this program have a Lua Chunk Spy vibe to it?)

Going along into the long list of bugs: (Edit 12/27/15)


.stacksize 2
.const 1
.const "print"
getglobal 0 1
loadk 1 0
call 0 2 1
The code here prints 4.7302246E-4 instead of 1. I doubt I'm doing something wrong, because in the binary contents of the output file the constant for the number 1 is an exact double 1 (0x3FF0000000000000). When doing a string.dump (through CC) on a different file containing "print(1)", the number's value is 0x3FC3B00000000000 instead. In theory, the exact double is what should work, but LuaJ seems to be a rebel and doesn't want to work with it. When compiling through a pure Lua string.dump, the value is a byte-reversed (little-endian) double 1: 03 (the number type tag) followed by 00 00 00 00 00 00 F0 3F = 1.
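To check byte sequences like these by hand, here is a rough sketch of decoding an IEEE 754 double from eight little-endian bytes in plain Lua. The function names decode_double_le and reversed are made up for illustration; they are not part of LASM or CC.

```lua
-- Decode an IEEE 754 double from a table of 8 byte values,
-- least significant byte first (little-endian).
local function decode_double_le(b)
  local sign = (b[8] >= 128) and -1 or 1
  -- 11 exponent bits: low 7 bits of byte 8, high 4 bits of byte 7.
  local exponent = (b[8] % 128) * 16 + math.floor(b[7] / 16)
  -- 52 mantissa bits: low 4 bits of byte 7, then bytes 6 down to 1.
  local mantissa = b[7] % 16
  for i = 6, 1, -1 do
    mantissa = mantissa * 256 + b[i]
  end

  if exponent == 0 then
    return sign * mantissa * 2 ^ (-1074) -- zero and subnormal numbers
  elseif exponent == 2047 then
    if mantissa == 0 then
      return sign * math.huge            -- infinity
    end
    return 0 / 0                         -- NaN
  end
  return sign * (1 + mantissa / 2 ^ 52) * 2 ^ (exponent - 1023)
end

-- Reverse a byte table, for data stored most significant byte first.
local function reversed(bytes)
  local out = {}
  for i = 1, #bytes do
    out[i] = bytes[#bytes + 1 - i]
  end
  return out
end

-- The little-endian bytes quoted above, and the same value in big-endian order.
local one = decode_double_le({ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF0, 0x3F })
local from_big_endian = decode_double_le(reversed({ 0x3F, 0xF0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }))
print(one, from_big_endian)
```

Feeding it the bytes from a dumped constant in both orders is a quick way to tell whether a mismatch is only a matter of endianness.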

Lovely stuff we have going on here: what should work isn't working. I'm not sure if it's the emulator I'm running this all on, so I'll be back promptly to test the same thing in Single Player.
Alright, just tested in Single Player, and I can safely say that in CC 1.75 it completely crashes the VM with "Java heap space".
Edited on 27 December 2015 - 09:13 PM
PixelToast #34
Posted 29 December 2015 - 07:22 AM
If you would like to interpret the bytecode it would be easier to start with my friend's lua interpreter in lua: https://github.com/ds84182/LuaVM
I'd probably look into this more if I had an instance with the latest CC. If it's simply dumping the bytecode with the wrong endianness, I could write a simple recompiler for it.
LeDark Lua #35
Posted 29 December 2015 - 11:54 AM
Hmm is this a bug or what?

.stacksize2 --Works

.stacksize 2 --works
doublequestionmark #36
Posted 21 November 2016 - 09:51 PM
Oh god no… My worst nightmare: Assembler in CC.
But seriously, this is pretty freaking awesome. Think of all the languages we could build on top of this…