This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.
Antelux's profile picture

ATLX-L3L1 Compiler

Started by Antelux, 17 January 2019 - 07:30 PM
Antelux #1
Posted 17 January 2019 - 08:30 PM
Perhaps my most ambitious program yet, I've set out to create a seven-phase Lua compiler written in Lua that targets Lua and compiles to . . Lua. Specifically, it targets Lua 5.3 and compiles it into bytecode for the Lua 5.1 virtual machine. What's so special about it, beside it being able to compile Lua 5.3 code? Well, take the example code below:


local x = 1
local y = 2.e
local z = @

if (x + y) < 3 then
print("smaller")
else
print("bigger")
end

for i = 1, 10, z do
print(i)
end

while true do

end

A more extreme example than not, but the point is how the compiler reacts to this snippet. Running this program as is in a ComputerCraft terminal produces the following error:

For input string: "2.e"

Not very helpful, unfortunately. Here's what ATLX spits out:

Compiling test..
Lexical Analyzer Time: 0.00099992752075195
Syntax Analyzer Time: 0
Semantic Analyzer Time: 0
Compilation time: 0.00099992752075195
test: compilation errors (4)
Lexical Analyzer:
[ERR] line:2:11-13: unfinished scientific notation
	  local y = 2.e
					^^^
[ERR] line:3:11: foreign character '@' (0x40)
	  local z = @
					^

Syntax Analyzer:
[ERR] line:3:9: expression(s) expected to follow '=' operator for local variable assignment on line 3:1
	  local z =
					^

Semantic Analyzer:
[WRN] line:5:1: unreachable code past 'if' statement
[ERR] line:11:18: expected variable 'z' (nil) to be of type 'number' in numeric 'for' loop for step value
[WRN] line:15:1: empty 'while' statement; absent of code

Off to a good start! The compiler won't stop at the first error it sees and, much like many modern compilers, it will list all the errors it can find in the code at once.
I've been working on this on and off for the past month or so– but there's still much to do!
Over that time I've created what I believe to be a rather vigorous test suite, consisting of 5,570 Lua programs, 31 MB in total. These come from a variety of sources, including the ComputerCraft forums, OpenComputer forums, and even real-world applications of Lua found in various Github repositories.


It's able to parse all of them. Now, I say parse rather than compile because the compiler doesn't actually do that yet. It turns out that the first three stages of compilation (Lexical, Syntax, Semantics) are rather complex in their own right– at least for me! But of course, the plan is to indeed compile, and to do so first to an intermediate form based on SSA, or single static assignment. To put it simply, Lua programs will see optimization the likes of which they haven't before, allowing for example constant propagation, function inlining, local value numbering, and many others! Exciting– at least for me. :)/>

So! What does this mean for the average user? Probably not much; most Lua programs are small enough that speed nor the best error messages don't matter anyway. But! For those few like me who enjoy making massive Lua programs for no real reason, this is the tool for you, enabling one to focus optimization efforts more on algorithm implementation rather than having to choose between debug-ability and speed while offering them advanced analysis of their programs to give the most accurate errors (read: plural!) that it's able to.

Unfortunately, I have no real ETA on when it will be released. I'm likely to at least give an update about progress here, and the code will be made available once I deem it both stable and functional enough.
Edited on 22 January 2019 - 04:23 AM
SquidDev #2
Posted 17 January 2019 - 10:21 PM
It's really cool to see something like this! I had a stab at something similar a few years ago, though with much less fancy error messages and no error recovery(yay to parser combinators for that I guess). I ended up trying to do type inference on Lua, which it turns out is a really difficult (undecidable) thing to do, so didn't get very far with anything else, but super cool to see someone else having a stab!

One thing I'd suggest is looking into a Lua source backend too. Even if it generates terrible code in some places (or generates Lua 5.2/3 with gotos everywhere), it's a useful debugging tool both for you and the user. Though again, generating nice expression trees ends up being pretty grot.

One thing I've always wanted to see more of, but never got round to implementing, is better analysis tools for Lua. Especially for beginners, so one could take some trivially-broken code from ask a pro (mismatched ifs, obvious misspellings, etc…) and suggest possible ways it might be wrong. It may be an interesting avenue you could explore instead of raw compilation, but I'd suggest on focussing one or the other, as both are pretty complex in their own right!).

Really excited to see where this project goes though!
Antelux #3
Posted 17 January 2019 - 10:56 PM
I was wondering if this kind of project had been done before! I wasn't able to find much besides things like Yueliang which isn't quite what I'm setting out to do, but was interesting nonetheless. But, I like how you had multiple frontends and backends planned or already supported. In the far future, I'd like to have multiple backends as well. Given the way that the IR is currently formatted though, I'm not sure how friendly it would be to multiple frontends outside of different Lua versions.

Funnily enough though, I think the funnest part about writing this so far has been the logic behind the semantic compiler's type inference– which of course wasn't very easy and is rather basic. It seems to be functional for now, but I still need to do more testing. It even works with metatables, but that's only because it just kind of throws it's hands up and says "welp, i guess this works" and flags the result as an unknown value. Better support is planned for them though.

Outside of that, I was definitely wanting to have it output Lua as well as bytecode since not all versions of ComputerCraft are able to load raw bytecode, unfortunately.

As far as the compiler being used for analysis goes, that's the next step I had planned for the syntax analysis phase of ATLX; even more accurate error messages and suggestions. I'd imagine that it would take an input sequence and compare it against all possible syntactical structures– slow when implemented directly like that, but I'd imagine there's ways to keep it relatively fast.

And thanks! I'm definitely excited as well. I'd love to see be used outside of my own plans, even if just rarely.
InDieTasten #4
Posted 18 January 2019 - 06:12 PM
Woah :o/> !

This is crazy! And you've gone so far with it already. When I think about doing these things I already think of how I won't be able to complete it until I lose interest.
A massive test suite, really nice error outputs. I really hope you will release soon. Maybe there'll be some contribution initiative in this community as well, if you decide to go open source.
I'd really love to take a look at this, because there's so many things I wanna learn about it now.
Antelux #5
Posted 18 January 2019 - 06:59 PM
I'd like to release it sooner than not! And it'll definitely be open source. I'd love to see contributors to it! :D/>
But, if there's anything specific that you'd like to know in the mean time about it, I'd be more than happy to tell you about it.
Luca_S #6
Posted 20 January 2019 - 04:08 PM
Really nice tool! I think this is already useful at it's current state. It makes it easier to see where errors are and it also can catch (some) runtime errors, which could make it useful to check bigger projects.You might want to think about releasing the part of the program that's already working just as an error checker.
Antelux #7
Posted 21 January 2019 - 11:27 PM
If the only thing needed is an error checker, programs such as luacheck already exist and are probably far more stable and helpful than ATLX currently is. Though personally, I'd like to make it fully working before releasing the code so there's real reason to use it over existing error checkers like the example I gave.

In other news! I've been working on the semantic analyzer to allow for more advanced analysis. For example, consider the following code:

local a = 1
if b then
   a = '1'
end
a()

Running the normal Lua compiler passes this code but fails at runtime with the following error:

test:5: attempt to call local 'a' (a number value)

ATLX had passed this code– that is, until I reworked semantic analysis. Now my compiler is able to catch runtime errors like this at compile time.

test: compilation error:
   Semantic Analyzer:
	  [ERR] line:5:1: expected local variable 'a' (number/string) to be of type 'function' in function call

Indeed, the compiler now realizes that the variable 'a' must be either a number or string by the time it's called, and thus errors as neither of those things are a function.
There's other features I've added or are working on, but the example I gave just happens to be one of the bigger ones.

Of course, I'll post more updates as they come. Can't wait for my first running "Hello, world!" program. :)/>
Edited on 21 January 2019 - 10:28 PM