This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Ways to enable pseudo static linking in CC

Started by lincore, 10 February 2015 - 08:02 PM
lincore #1
Posted 10 February 2015 - 09:02 PM
Hey everybody,

I am experimenting with ways to put a program and its external dependencies (i.e. "APIs") into a single file that can be executed and will run like the normal "dynamically linked" version. This is because I often intend to share my code and don't want to force users to download multiple files and recreate my file hierarchy.
The only somewhat reliable way I have found so far is to replace os.loadAPI in the resulting program with one that looks the filename up in a table, then loads and runs the associated string value (the source) and puts the resulting environment in _G, like the default os.loadAPI does. This works okay, but I am wondering if there are more elegant ways to do it. Everything I have thought of so far goes down the write-a-parser path, which I do not intend to walk for mental health reasons.
I am also not quite sure how exactly os.loadAPI works. Does every loaded API get a pristine environment or do they share one? And how is the table that os.loadAPI puts in _G populated?
Finally, I have been thinking about just inlining functions by replacing the os.loadAPI call with their code (like #include in C). Users could put decorators in front of loadAPI calls with a list of the needed functions. This would only work if the inlined functions behave like black boxes, but most of the code I regularly need consists of less than a dozen functions with no dependencies of their own. Do you have any better ideas on how to achieve this reliably?

Thanks!

This is the resulting program that my bake program currently creates. I still need to do a lot of testing.

local __bake__ = {}
__bake__.loaded = {} -- avoid multiple chunk evaluation
__bake__.old_loadAPI = os.loadAPI
os.loadAPI = function(url)  
	-- load and run string in __bake__[api_filename]
	-- put resulting env in _G[api_filename]
	-- set __bake__.loaded[api_filename] = true
end
os.unloadAPI = function(url)
	-- set _G[api_filename] = nil
	-- set __bake__.loaded[api_filename] = nil
end
__bake__.apis = {
	["/lib/util"] = [========[ -- source -- ]========],
	-- etc.
}
-- flatcopy current environment for inlined apis to run in.
__bake__.default_env = {}
for ident, value in pairs(_G) do
	__bake__.default_env[ident] = value
end

__bake__.program = [==========[ --the original program-- ]==========]
local f = loadstring(__bake__.program)
assert(f, '[bake] Unable to load program, does the original program contain syntax errors?')
local success, err = pcall(f, ...) -- use "err" so the global error() isn't shadowed
if not success then
  printError(err)
end
-- clean up:
os.loadAPI = __bake__.old_loadAPI
InDieTasten #2
Posted 10 February 2015 - 09:40 PM
Without answering your question(s), I have to say that I absolutely love the idea of your program itself. It's like an automated resource-file appender or something. You should definitely consider embedding files other than APIs by hand, via arguments passed to your "bakery" :3
Like bake --rc /someImage.npa main.lua main.final, which would redirect any read operations on /someImage.npa to the resource embedded inside the final main.final.
I don't know whether os.loadAPI uses the fs API, but if it does, you could just add my suggestion and leave the rest out. Then only the auto-embedding for os.loadAPI would be left to do, so that nobody has to list every API loaded via loadAPI in the argument list of your bakery. I just like to call it bakery, don't know why though :P

Another thing you have to remember is that APIs and/or the main program will not necessarily have issued all of their loads after a single run. Programs/APIs can be "interactive" and choose different API loads under different circumstances (like user input, os.version or even other files on the filesystem), so after running it just once you can't be sure that all dependencies are embedded, because under different conditions it may depend on other files. Just a thing to keep in mind.
Bomb Bloke #3
Posted 10 February 2015 - 09:53 PM
I am also not quite sure how exactly os.loadAPI works. Does every loaded API get a pristine environment or do they share one? And how is the table that os.loadAPI puts in _G populated?

Each API gets its own unique environment, which is rigged to have access to _G. Lua "environments" are basically tables.

os.loadAPI() then runs the script file containing the API, which populates that unique environment with any global variables it defines (typically function pointers, for the most part).

Once the script has finished running, a new table is created (named after the API), any values remaining in the API's environment table are copied into that, then the result is dumped into _G (where other scripts can then access them from).
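
In other words, something roughly along these lines - a simplified sketch, not the literal bios.lua code, which also deals with errors and duplicate loads a bit differently:

local function loadAPI_sketch(path)
	local name = fs.getName(path)
	-- fresh environment that can *read* _G through its metatable
	local env = setmetatable({}, { __index = _G })
	local chunk, err = loadfile(path)
	if not chunk then
		printError(err)
		return false
	end
	setfenv(chunk, env)
	local ok, runErr = pcall(chunk)
	if not ok then
		printError(runErr)
		return false
	end
	-- copy whatever globals the script defined into a new, unlinked table...
	local api = {}
	for k, v in pairs(env) do
		api[k] = v
	end
	-- ...and publish that table under the API's name
	_G[name] = api
	return true
end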

It's worth noting that shell and multishell aren't loaded in the same manner - they're executed as any other script, meaning their globals all go into the user's environment table. User scripts can access them (because they're all in the same environment), but since they're not in _G, loaded APIs can't access them.

Thus a very simple way to have a script that can load itself as an API is to simply have it check whether or not shell exists. If it does, then the script is being executed as per normal. If it doesn't, then the script is being executed via os.loadAPI(). Take a look at these two examples.
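
For instance, a minimal sketch of that check (greet is just a made-up example):

-- greet() ends up in the API's table when this file is loaded via
-- os.loadAPI(); when run as a program it is simply called directly below.
function greet(name)
	print("Hello, " .. tostring(name))
end

if shell then
	-- shell exists: we were run from the command line, so behave like a program
	local target = ...
	greet(target or "world")
end
-- if shell is nil, we were loaded via os.loadAPI() and only export greet()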

Generally, however, having a script load itself as an API is pointless - the only reason to add this behaviour is so that other scripts can also load them as APIs, thus allowing their functions to be used without clunky shell.run() calls.
lincore #4
Posted 11 February 2015 - 12:59 PM
Without answering your question(s), I have to say that I absolutely love the idea of your program itself. It's like an automated resource-file appender or something. You should definitely consider embedding files other than APIs by hand, via arguments passed to your "bakery" :3
Like bake --rc /someImage.npa main.lua main.final, which would redirect any read operations on /someImage.npa to the resource embedded inside the final main.final.
I did not think of that, but it sounds like a reasonable idea. Overriding fs.open does not seem too difficult and everything else is already in place. I think this could not only be used to read baked files but also to write them (think config files being saved in the program itself).
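
Something along these lines, I imagine (a rough sketch only - the resource table and the path are made up, and a complete handle would also need the binary and write modes):

__bake__.resources = {
	["/someImage.npa"] = "-- embedded file contents --",
}
__bake__.old_fs_open = fs.open

fs.open = function(path, mode)
	local data = __bake__.resources[path]
	if data and mode == "r" then
		local pos = 1
		-- hand back a minimal read handle backed by the embedded string
		return {
			readAll = function()
				local rest = data:sub(pos)
				pos = #data + 1
				return rest
			end,
			readLine = function()
				if pos > #data then return nil end
				local nl = data:find("\n", pos, true)
				local line = data:sub(pos, (nl or #data + 1) - 1)
				pos = (nl or #data) + 1
				return line
			end,
			close = function() end,
		}
	end
	return __bake__.old_fs_open(path, mode)
end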

Another thing you have to remember is that APIs and/or the main program will not necessarily have issued all of their loads after a single run. Programs/APIs can be "interactive" and choose different API loads under different circumstances (like user input, os.version or even other files on the filesystem), so after running it just once you can't be sure that all dependencies are embedded, because under different conditions it may depend on other files. Just a thing to keep in mind.
Of course there are bound to be cases where my approach does not work, like using non-literals as arguments to os.loadAPI etc. I solve that by requiring users to decorate their loadAPI calls with --#inline:
os.loadAPI("foo") --#inline
I hope this will encourage users to double-check that bake can find their loadAPI calls as intended. If such a call is not decorated, it will not be baked and will thus be delegated to the original os.loadAPI at runtime. The scenario you describe works fine, I think: bake does not need to know whether a loadAPI call is conditional, it will only load the API when loadAPI is actually called, so when or if that happens is not significant.
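
For reference, finding the decorated calls is simple enough with a string pattern; this is just an illustration (the filename is a placeholder), not bake's actual matching code:

local handle = fs.open("program.lua", "r")
local source = handle.readAll()
handle.close()
-- match os.loadAPI("<path>") --#inline and capture <path>
for path in source:gmatch('os%.loadAPI%s*%(%s*"([^"]+)"%s*%)[ \t]*%-%-#inline') do
	print("baking: " .. path)
end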


Each API gets its own unique environment, which is rigged to have access to _G. Lua "environments" are basically tables.

os.loadAPI() then runs the script file containing the API, which populates that unique environment with any global variables it defines (typically function pointers, for the most part).

Once the script has finished running, a new table is created (named after the API), any values remaining in the API's environment table are copied into that, then the result is dumped into _G (where other scripts can then access them from).

Thanks, I got most of that already in my code, except for one thing: at the moment I simply copy _G's contents into a new table and use that as the environment for the API. When the chunk returns, however, the environment is still full of _G's stuff. I guess the right approach is to handle this via setmetatable(env, {__index = _G})? Here's my current os.loadAPI etc.:

__bake__.loaded = {}
__bake__.old_loadAPI = os.loadAPI
__bake__.old_unloadAPI = os.unloadAPI

local get_filename = function(path)
	local _,_, file = path:find("/([^/]+)$")
	return file or path
end

os.loadAPI = function(url)	
	local flatcopy = function(tabl)
		local result = {}
		for ident, value in pairs(tabl) do
			result[ident] = value
		end
		return result
	end
	assert(url and type(url) == "string", "loadAPI: Expecting string, got " .. type(url))
	local api_name = get_filename(url)
	if __bake__.loaded[api_name] then
		return true
	end
	if __bake__.apis[url] then
		local chunk = loadstring(__bake__.apis[url])		
		if not chunk then
			printError("Could not load api " .. tostring(url))
			return false
		end
		local env = flatcopy(__bake__.default_env)
		env._G = _G
		setfenv(chunk, env)
		chunk()
		__bake__.loaded[api_name] = true
		_G[api_name] = getfenv(chunk)
		return true
	else
		return __bake__.old_loadAPI(url)
	end
end

os.unloadAPI = function(url)
	local api_name = get_filename(url)
	if __bake__.loaded[api_name] then
		__bake__.loaded[api_name] = nil
		_G[api_name] = nil -- clear the entry entirely rather than leaving false behind
	else
		return __bake__.old_unloadAPI(url)
	end
end

It's worth noting that shell and multishell aren't loaded in the same manner - they're executed as any other script, meaning their globals all go into the user's environment table. User scripts can access them (because they're all in the same environment), but since they're not in _G, loaded APIs can't access them.

Thus a very simple way to have a script that can load itself as an API is to simply have it check whether or not shell exists. If it does, then the script is being executed as per normal. If it doesn't, then the script is being executed via os.loadAPI(). Take a look at these two examples.

Generally, however, having a script load itself as an API is pointless - the only reason to add this behaviour is so that other scripts can also load them as APIs, thus allowing their functions to be used without clunky shell.run() calls.
I am not sure who is misunderstanding who, but I do not intend to have a script loading itself as an api. I intend to put the apis that a given script needs into the script file itself so that it can be distributed as one file and stays one file when run. The script itself is not loaded as an api but it is stringified in order to be able to pcall it. As such I don't see a problem with shell or multishell or any other standard api - unless I completely misunderstood what you're saying.
Bomb Bloke #5
Posted 11 February 2015 - 02:11 PM
Thanks, I got most of that already in my code, except for one thing: at the moment I simply copy _G's contents into a new table and use that as the environment for the API. When the chunk returns, however, the environment is still full of _G's stuff. I guess the right approach is to handle this via setmetatable(env, {__index = _G})?

Indeed, that particular use of metatables allows the contents of _G to become accessible via env, without actually placing them into env. But because they're still "linked", you wouldn't generally want to place env directly into _G after filling it with the variables from your API - the original os.loadAPI() function simply iterates through the contents of the API's environment table and copies out anything it finds into a new table (which lacks the link), that being the one which actually ends up in _G.

You're doing things in reverse - copying stuff into your environment table (that doesn't need to be there!), running the API's code, then dumping the result directly into _G. This may be a silly question, but have you located the original function's source in bios.lua?
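
Applied to your loadAPI above, that would look roughly like this (an untested sketch, reusing your chunk and api_name variables):

local env = setmetatable({}, { __index = _G })
setfenv(chunk, env)
local ok, err = pcall(chunk)
if ok then
	local api = {}
	for ident, value in pairs(env) do
		-- only names the API itself defined end up here
		api[ident] = value
	end
	_G[api_name] = api
	__bake__.loaded[api_name] = true
end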

I am not sure who is misunderstanding who, but I do not intend to have a script loading itself as an api. I intend to put the apis that a given script needs into the script file itself so that it can be distributed as one file and stays one file when run. The script itself is not loaded as an api but it is stringified in order to be able to pcall it. As such I don't see a problem with shell or multishell or any other standard api - unless I completely misunderstood what you're saying.

The idea of having the script pass itself to the vanilla os.loadAPI() function was in regards to your talk about "inlining", or replacing the os.loadAPI() calls in your original code with that of your actual APIs (you'd simply redirect those calls to load the script itself as an API, instead of the original target files). However, this is likely more complex than the process you were thinking of, and I'd say your current approach is much better than either method.
lincore #6
Posted 11 February 2015 - 02:34 PM
You're doing things in reverse - copying stuff into your environment table (that doesn't need to be there!), running the API's code, then dumping the result directly into _G. This may be a silly question, but have you located the original function's source in bios.lua?
That is not silly at all. I was assuming that a) the source was not available (found it!) and b) ComputerCraft's APIs were written in Java. Now I feel silly :~)
I'll take a thorough look at that, thanks for pointing it out.
Edited on 11 February 2015 - 01:36 PM