Firstly, wow! This is an impressive project: especially the "actual" coroutines rather than LuaJ's threading method (and a clean implementation of Lua in Java).
I've a project which tries to solve some of the same problems named Cobalt. This is a fork of LuaJ and so is harder to adapt to use "actual" coroutines but means the parsing/bytecode/stdlib problem is solved for me :). Anyway, I've also thought about some of the above problems and so thought I'd ask some questions both about this project and also ideas you have about improving CC's Lua runtime.
Thank you for your kind words!
I was trying to do something similar to Cobalt earlier (before I started Rembulan), but didn't get very far. Moving away from the coroutine implementation used in LuaJ proved to be far too difficult, and fixing the coroutine problem was one of my main motivations. I think I did get quite far on the Lua 5.3 front with it, though, so that's definitely doable, should you choose to go that way.
Rembulan implements Lua 5.3 as specified by the Lua Reference Manual, explicitly attempting to mimic the behaviour of PUC-Lua whenever possible. This includes language-level features (such as metamethods and coroutines) and the standard library.
My worry with this is that LuaJ isn't standards compliant. Most of the time this is fixable, however sometimes this broken behaviour is relied upon a lot. One common pattern is pcall(error, "", i) to get a traceback (as debug.traceback is blacklisted). This produces different results on LuaJ and Lua. I'd love to fix these but there needs to be a way of doing so without breaking existing programs.
Also, have you tried running against Lua's tests? Those were really useful in ensuring some features worked correctly.
I think those things should be fixable: I have quite an extensive test suite (written in Scala) that could, with some work, be adapted to run against the ComputerCraft standard library with LuaJ/Cobalt to see where the behaviour diverges. That might help identify such problematic spots.
Regarding Lua's tests: I have tried them, with good results, but in order to be able to run the entire test suite I'd have to modify them a bit. I have a few issues with them:
- They depend on what I'd call "implementation details", e.g. the debug library and the way error messages are formatted.
- They aren't particularly modular. For instance, a lot of test methods use string.gsub or load; until I had the correct implementations of these functions, the tests were failing because of these missing functions.
- And they terminate on the first error, so you can figure out that the test has terminated on line 235 out of 512, but you don't know whether there would be another 100 failures after this line, or none.
But I should give the tests another try.
Rembulan implements what I call "CPU accounting": Lua programs may be allocated a number of "ticks" for which they are allowed to run before they are paused (and resumed later).
I haven't looked into this but does it handle string.find and friends on long strings (string.find(("a"):rep(1e4), ".-.-.-.-b$"))? I like it as a concept but I'm not sure how it would work in practice: either you keep the "too long without yielding" error or you'd need a way to terminate programs with programmer errors (such as infinite loops).
How do you handle this currently, e.g., an infinite loop? When a turtle in ComputerCraft gets stuck in an infinite loop, then the only way to fix that is to destroy it, no? ;)
The point is that even if it does get stuck that way, it shouldn't be using 100% of your actual CPU time. If, for instance, it runs for a few ms every world tick (IIRC that's how time is measured in Minecraft), it could simply be kept running forever. (Plus, there might be an in-game mechanism like "heat" that could shorten the time slices assigned to the turtle the more the CPU is used, so after a while, even if it is stuck in an infinite loop, it would get slower and slower, becoming less of a burden on the host system.)
For standard library functions, I avoided using CPU accounting as long as there weren't any non-raw Lua operations involved (e.g., string.gsub can pause, but string.find cannot). It also means that standard library functions are "for free"; but they don't have to be.
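To make the time-slice idea above a bit more concrete, here is a minimal sketch in Java. It is not Rembulan's actual API: the names TickBudgetSketch, Task, runSlice and nextBudget are made up for illustration, one "tick" is assumed to be one call to step(), and halving the budget per unit of "heat" is an arbitrary choice.

```java
/** Hypothetical sketch of tick-based CPU accounting; not Rembulan's actual API. */
final class TickBudgetSketch {

    /** Outcome of running one time slice. */
    enum Status { FINISHED, PAUSED }

    /** A unit of work that can be executed one "tick" at a time. */
    interface Task {
        /** Performs one tick of work; returns true once the task is done. */
        boolean step();
    }

    /** Runs the task until it finishes or the tick budget is used up. */
    static Status runSlice(Task task, int ticks) {
        for (int i = 0; i < ticks; i++) {
            if (task.step()) {
                return Status.FINISHED;
            }
        }
        return Status.PAUSED; // budget exhausted: the caller may resume later
    }

    /** "Heat" (assumed rule): each unit of heat halves the budget, never below 1. */
    static int nextBudget(int base, int heat) {
        return Math.max(1, base >> Math.min(heat, 30));
    }

    public static void main(String[] args) {
        Task forever = () -> false; // stands in for an infinite loop
        int heat = 0;
        for (int worldTick = 0; worldTick < 5; worldTick++) {
            int budget = nextBudget(1000, heat);
            Status status = runSlice(forever, budget);
            if (status == Status.PAUSED) {
                heat++; // the program used its whole slice, so it "heats up"
            }
            System.out.println("world tick " + worldTick + ": budget=" + budget + ", " + status);
        }
    }
}
```

With this shape, a stuck program never blocks the host: it just keeps getting paused, and its slices shrink as its heat grows.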
Coroutines in Rembulan do not use Java threads (as opposed to LuaJ). They are therefore much more lightweight, and coroutine switches do not involve a thread context switch. They work with CPU accounting (above). This should allow you to have an order of magnitude more coroutines than with LuaJ.
Yessssssss. This is something I've been wanting to do for ever (especially as it is the next step towards serialisable state).
Worth noting that you'd probably want to use a wrapper thread to allow yielding/pulling events from CC functions (such as peripherals) without writing special resuming code. This may prove more problematic than that: all peripheral methods would have to be written to allow resuming.
I'll have to believe you on that one :) However, if they aren't using too many Lua operations, it should be doable, if perhaps annoying.
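As a rough illustration of what "written to allow resuming" means without threads, here is a minimal Java sketch. It is an assumption-laden toy, not how Rembulan implements coroutines: CoroutineSketch and Counter are hypothetical names, and the coroutine body is hand-translated from the Lua loop "for i = 1, n do coroutine.yield(i) end" into an object that keeps its local state on the heap between resumes.

```java
/** Hypothetical sketch: a coroutine as a resumable object, with no Java thread involved. */
final class CoroutineSketch {

    /** Hand-written equivalent of: for i = 1, n do coroutine.yield(i) end */
    static final class Counter {
        private final int limit;
        private int next = 1; // the saved "local variable" i, kept between resumes

        Counter(int limit) {
            this.limit = limit;
        }

        /** True once the loop body has nothing left to yield. */
        boolean isDead() {
            return next > limit;
        }

        /** Runs up to the next yield and returns the yielded value. */
        int resume() {
            if (isDead()) {
                throw new IllegalStateException("cannot resume dead coroutine");
            }
            return next++;
        }
    }

    public static void main(String[] args) {
        // Each coroutine is a small heap object, so creating many of them is cheap.
        Counter[] coroutines = new Counter[100_000];
        for (int i = 0; i < coroutines.length; i++) {
            coroutines[i] = new Counter(3);
        }
        long sum = 0;
        for (Counter c : coroutines) {
            while (!c.isDead()) {
                sum += c.resume();
            }
        }
        System.out.println("sum of all yielded values: " + sum);
    }
}
```

The annoying part mentioned above is visible even here: any state that must survive a yield has to live in fields rather than on the Java call stack.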
Rembulan implements Lua 5.3, whereas (IIRC) LuaJ implements Lua 5.2.
CC uses Lua 5.1 (LuaJ 2.0) though I think Dan wants to upgrade to Lua 5.2 (LuaJ 3.0). 5.3 would be awesome!
Aha! I really thought CC used LuaJ 3.0.
It's possible to have per-state metatables (e.g., for booleans or numbers) in Rembulan, whereas in LuaJ these are accessed via static fields, i.e., by default shared by all programs run on the JVM.
Again, woot!
Well, isn't that what you also did in Cobalt?
The project is not yet as mature as LuaJ, and a significant part of the standard library is still missing (most notably the module library). However, the runtime is by now mostly stable, and I do not expect it to change much until the release. (A detailed overview of the state of the standard library may be found in the documentation.)
Several issues I noted when looking through the list:
- You're using Java strings: strings are used a lot as byte arrays. Whilst Java strings can handle this it is better not to. You can still do some rudimentary interning of strings to reduce performance issues though.
- No __gc. This is something I've got stuck on too. It might be possible to use ReferenceQueues to fire them.
- Tables: it might be worth checking LuaJ 3.0's implementation of tables (or Cobalt's: that fixes some of its bugs).
- No binary chunks. :( That would break jvml-jit.
- Debug library: whilst current LuaJ doesn't have it, there is no reason to omit it under a custom VM which correctly handles inter-computer sandboxes. It would be really nice to have support for this.
Yes, Java strings will probably need to go. I'd really, really prefer to keep them, though, if there was a reasonable way of making them work. They make useful things such as Java interop much easier, and I'm slightly worried about the performance impact of conversions between Java strings and Lua strings (especially for Lua strings not backed by a Java string). (Just as a note: Rembulan does not use types such as LuaBoolean or LuaNumber for its values: all values are Objects, e.g. Lua booleans are java.lang.Boolean. Having strings represented by java.lang.String is very convenient, even if Java strings are Unicode and Lua strings are not.)
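For what it's worth, one possible way of keeping java.lang.String while still treating Lua strings as byte arrays is to store one byte per char via ISO-8859-1. The sketch below only demonstrates that round trip; it is an assumption about a possible approach, not something Rembulan does, and ByteStringSketch, fromBytes and toBytes are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch: carrying arbitrary bytes in a java.lang.String via ISO-8859-1. */
final class ByteStringSketch {

    /** Wraps raw bytes in a String, one char in the range 0..255 per byte. */
    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the raw bytes; lossless only if every char is <= 0xFF. */
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x00, (byte) 0xFF, (byte) 0x80, (byte) 'a'};

        // ISO-8859-1 maps each byte to exactly one char, so the round trip is exact.
        byte[] viaLatin1 = toBytes(fromBytes(raw));
        System.out.println("ISO-8859-1 round trip intact: " + Arrays.equals(raw, viaLatin1));

        // UTF-8 replaces invalid sequences such as 0xFF, so the bytes do not survive.
        byte[] viaUtf8 = new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip intact: " + Arrays.equals(raw, viaUtf8));
    }
}
```

The catch is on the interop side: a String built this way is a byte container, so text coming from Java with chars above 0xFF would still need an explicit encoding step before Lua code sees it.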
About __gc: yes, that's the mechanism I intend to use for it. I haven't done anything on that front, though, because I didn't want to commit too early to an implementation strategy. (I'm also hoping not to close any doors w.r.t. a possible future implementation of heap memory sandboxing.)
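Just to sketch what the ReferenceQueue approach could look like (nothing here exists in Rembulan; GcSketch, Finalizable, register and drain are hypothetical names): each value with a finalizer gets a PhantomReference registered with a queue, and the runtime drains that queue at some safe point. Note two differences from real __gc that this sketch does not solve: the finalizer must not capture the value itself, or the value never becomes unreachable, and a phantom reference cannot hand the collected object back to the finalizer the way Lua passes it to __gc.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of firing finalizers from a ReferenceQueue. */
final class GcSketch {

    /** Phantom reference that carries the action to run once its referent is unreachable. */
    static final class Finalizable extends PhantomReference<Object> {
        final Runnable finalizer;

        Finalizable(Object referent, ReferenceQueue<Object> queue, Runnable finalizer) {
            super(referent, queue);
            this.finalizer = finalizer;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects themselves reachable until they have been processed.
    private final Set<Finalizable> pending = ConcurrentHashMap.newKeySet();

    /** Registers a finalizer for the value; the finalizer must not capture the value. */
    Finalizable register(Object value, Runnable finalizer) {
        Finalizable ref = new Finalizable(value, queue, finalizer);
        pending.add(ref);
        return ref;
    }

    /** Runs the finalizers of all enqueued references; returns how many were run. */
    int drain() {
        int count = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Finalizable f = (Finalizable) ref;
            pending.remove(f);
            f.finalizer.run();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        GcSketch gc = new GcSketch();
        Finalizable ref = gc.register(new Object(), () -> System.out.println("finalizer ran"));
        // Simulates the collector for the demo; in real use the JVM enqueues the
        // reference at some unspecified time after the value becomes unreachable.
        ref.enqueue();
        System.out.println("finalizers run: " + gc.drain());
    }
}
```

Since drain() is called explicitly, finalizers would run on a thread of the runtime's choosing, which might also fit with CPU accounting.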
Tables: indeed! The current default implementation is really just a placeholder backed by java.util.Map. I did look at LuaJ's tables, but that's about it :) I haven't got a test suite for checking that a table implementation is valid, so I'm mostly postponing this until I have one. (I don't like writing tests.)
Binary chunks: yea, that's one of the things I'd file under "implementation details of PUC-Lua". I don't use the Lua bytecode (the compiler has its own intermediate representation), so that might be a problem. That said, I used to have a recompiler in Rembulan that was compiling Lua bytecode into Java bytecode; but I took it out, because effectively this meant having two compilers instead of one. Just so you know that it's definitely technically possible to have it there (but probably not a good idea maintenance-wise). But wow, JVML-JIT is cool!
Debug library: parts of the PUC-Lua debug library are there, implementing the things that do have an equivalent in Rembulan. But it's highly unlikely Rembulan will ever support things such as debug.getlocal, debug.getinfo or instruction hooks. These are highly implementation-dependent; I'd have to change a lot about the implementation strategy, almost always to the detriment of performance.
That said, it might be worth trying Rembulan out to see how it could be used in ComputerCraft, and perhaps help shaping it.
I'm looking at putting something together for CCTweaks-Lua which allows using this, just to see how it handles it. Really looking forward to playing with this :)
This is exactly the reaction I was hoping for! Do let me know if I can help with anything! The JavaDocs for the runtime module should be mostly complete, so hopefully it won't be too frustrating! ;)