This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.
magnevv's profile picture

[1.56] Lua Bug In Textutils.serialize

Started by magnevv, 13 October 2013 - 10:12 PM
magnevv #1
Posted 14 October 2013 - 12:12 AM
I just joined this forum because I thought I should report this bug, but it seems I need at least 3 approved posts before I can do it.
I'll still submit my bug report here in case anyone sees fit to fix it.

There is an error with the way textutils.serialize detects recursive references. The way it's done is that it makes sure that the same table does not exist twice anywhere within the tree structure, instead of the correct approach, which would be to make sure that same table does not exist twice inside the same branch.

This should be allowed, but is not
A
| \
B B
or in code
B = {}
A = {B, B}
This is (should be) absolutely fine, but textutils will throw an error.

Suggested fix: http://pastie.org/8400569
I added a shallowcopy method that does a simple key/value copy, and I modified the serializeImpl method in such a way that it passes copies of the tTracking list instead of the whole list, thus different branches don't interfere with each other.
Lyqyd #2
Posted 14 October 2013 - 12:40 PM
Split into new topic.

The suggested fix takes a slightly too naïve approach. Unserializing the resulting table would result in A = {B, C} where tables B and C are identical (with the exception of any tables they contain, etc.), so a somewhat different approach would need to be taken. A change in approach capable of re-assembling tables with correct sets of matched references would probably be able to handle recursive tables correctly as well. I believe there is an example of such a serialization function on either the lua-users wiki, or in the PIL.
ElvishJerricco #3
Posted 14 October 2013 - 07:51 PM
Split into new topic. The suggested fix takes a slightly too naïve approach. Unserializing the resulting table would result in A = {B, C} where tables B and C are identical (with the exception if any tables they contain, etc.), so a somewhat different approach would need to be taken. A change in approach capable of re-assembling tables with correct sets of matched references would probably be able to handle recursive tables correctly as well. I believe there is an example of such a serialization function on either the lua-users wiki, or in the PIL.

The issue with that is that you can't reference other values in the table in the serialized table. So you can't have a serialized table that looks like

{[1]={},[2]=[1]}
You would need to take a code-like approach if you wanted it to be parsed by loadstring the way it currently is.

(function() local A = {} local B = {} A[1] = B A[2] = B return A end)()
Problem is, that would be disgusting to try to write the serializer for. I've got some ideas on how to do it but I'm not going to write them out.
Lyqyd #4
Posted 14 October 2013 - 10:06 PM
That really wouldn't be all that bad to write a serializer for. Here's a very lightly tested serializer/unserializer (you'll note that it's very similar to the existing unserializer) that handles the A = {B, B} case correctly. It should handle most other cases correctly, though there are probably some corner cases I'm forgetting about. I'm not sure why that seems so disgusting to you. It is a little bigger than the existing serializer, of course.


function serialize(target, tracking, cursor)
	local firstRun = false
	local outString = ""
	if not tracking then tracking = {[target] = "__locTable", size = 1} outString = "__locTable = {}\n" cursor = "__locTable" firstRun = true end
	if not tracking[target] or tracking.size < 2 then
		for k,v in pairs(target) do
			if type(v) == "function" then error("Cannot serialize type: function!") end
			local keyName = ""
			if type(k) == "string" then
				keyName = '["' .. k .. '"]'
			else
				keyName = "[" .. k .. "]"
			end
			outString = outString..cursor..keyName.." = "
			if type(v) == "table" then
				if not tracking[v] then
					outString = outString .. "{}\n"
					tracking[v] = cursor..keyName
					tracking.size = tracking.size + 1
					local insertString = serialize(v, tracking, cursor..keyName)
					outString = outString..insertString
				else
					outString = outString..tracking[v]
				end
			elseif type(v) == "string" then
				outString = outString..string.format("%q", v).."\n"
			else
				outString = outString..tostring(v).."\n"
			end
		end
		if firstRun then outString = outString.."return __locTable\n" end
		return outString
	end
	return ""
end

function unserialize(str)
	local func, e = loadstring(str)
	if not func then
		return str
	else
		setfenv(func, {})
		return func()
	end
end
magnevv #5
Posted 14 October 2013 - 10:47 PM
I agree that my method is pretty naive, but I didn't want to spend a long time coding a better approach when I didn't know if my post would even get any responses.

If you want to, and think it's worth the effort to write and test a more solid approach that'd be awesome. I'm not sure how easy it is to submit bug fixes to CC, but seeing as how you're an admin with over 4k posts I'm thinking you'll probably have 100x chance that I have.
Lyqyd #6
Posted 15 October 2013 - 12:30 AM
For reference, I used the word naïve in a specific coding context. Just wanted to ensure that was understood as I meant it.

The developers to read through this section and the Suggestions section on occasion. I'd imagine that they're probably satisfied with the current state of the serialization function. It accomplishes good, compact serialization that successfully covers a large majority of the use cases. There are other serialization functions floating around on the forums that are available for use, and I believe some even claim to properly handle functions in tables. If the developers wanted to change out the serialization function, they've got a wide range to choose from. One major downside is that any change that requires a different unserialization method would break all of the existing serialized tables. For those reasons (easy availability of alternatives for those who want them and breaking changes to switch to a more feature-rich method), I find it unlikely that they would choose to incorporate a different serialization method into the ROM.
theoriginalbit #7
Posted 15 October 2013 - 01:56 AM
Also as another point, I have not seen one single Lua-side bug fix, code improvement, code replacement or anything of the such in all the time I've been using ComputerCraft… I've seen more programs added, I've seen more bugs added, I've never seen any of said bugs resolved or fixed. The reason for this is because there is a system in place which easily allows us, the users, to fix the bugs ourselves and use the better versions that we've written… It also gives us inspiration and motivation for writing programs that the community would like as they can address pain points of the general user. Lastly, I would much prefer that dan and Cloudy spend more time working on adding new features to the mod and squishing the Java-side bugs, than attempting to fix or improve something that I can do myself on the computer.
ElvishJerricco #8
Posted 15 October 2013 - 04:59 AM
That really wouldn't be all that bad to write a serializer for. Here's a very lightly tested serializer/unserializer (you'll note that it's very similar to the existing unserializer) that handles the A = {B, B} case correctly. It should handle most other cases correctly, though there are probably some corner cases I'm forgetting about. I'm not sure why that seems so disgusting to you. It is a little bigger than the existing serializer, of course.

I just tried your code in a lua shell, it does not work.


> x = {}
> a = {}
> a.b = {}
> a.c = {}
> a.b.d = x
> a.c.d = x
> s = serialize(a)
> = s
__locTable = {}
["c"] = {}
["b"] = {}

> -- wut?

That's not even all that complex of a table. And attempting to loadstring() that obviously errors with

[string "__locTable = {}..."]:2: unexpected symbol near '['
[/code]
amtra5 #9
Posted 15 October 2013 - 05:47 AM
Also as another point, I have not seen one single Lua-side bug fix, code improvement, code replacement or anything of the such in all the time I've been using ComputerCraft… I've seen more programs added, I've seen more bugs added, I've never seen any of said bugs resolved or fixed. The reason for this is because there is a system in place which easily allows us, the users, to fix the bugs ourselves and use the better versions that we've written… It also gives us inspiration and motivation for writing programs that the community would like as they can address pain points of the general user. Lastly, I would much prefer that dan and Cloudy spend more time working on adding new features to the mod and squishing the Java-side bugs, than attempting to fix or improve something that I can do myself on the computer.
http://www.computercraft.info/forums2/index.php?/topic/10921-typo-in-adventure/?
ElvishJerricco #10
Posted 15 October 2013 - 06:25 AM
Also as another point, I have not seen one single Lua-side bug fix, code improvement, code replacement or anything of the such in all the time I've been using ComputerCraft… I've seen more programs added, I've seen more bugs added, I've never seen any of said bugs resolved or fixed. The reason for this is because there is a system in place which easily allows us, the users, to fix the bugs ourselves and use the better versions that we've written… It also gives us inspiration and motivation for writing programs that the community would like as they can address pain points of the general user. Lastly, I would much prefer that dan and Cloudy spend more time working on adding new features to the mod and squishing the Java-side bugs, than attempting to fix or improve something that I can do myself on the computer.

Yea bit. I don't really think you're right on this. For example, I pointed out a lua-side rednet bug where the side couldn't be a modem attached via peripheral proxy from OpenPerihperal, and Dan came along and fixed that. There's been a number of lua fixes. It'd be stupid to leave a bunch of code as bug-ridden just because you want other people to fix your bugs for you.
Lyqyd #11
Posted 15 October 2013 - 10:20 AM
Hah! I copied the wrong version! Let me see if I can fix it real quick, sorry.

Edit: there, fixed the code I posted above. It should work now. Sorry about my miscopying. That's what I get for having it open in multiple windows. If it still doesn't work, I'll copy over the correct version of it when I get home from work.
ElvishJerricco #12
Posted 15 October 2013 - 01:27 PM
Hah! I copied the wrong version! Let me see if I can fix it real quick, sorry.

Edit: there, fixed the code I posted above. It should work now. Sorry about my miscopying. That's what I get for having it open in multiple windows. If it still doesn't work, I'll copy over the correct version of it when I get home from work.

Still not right. The d fields in b and c don't get set to a common table, or at all.


> x = {}
> a = {}
> a.b = {}
> a.c = {}
> a.b.d = x
> a.c.d = x
> s = serialize(a)
> =s
__locTable = {}
__locTable["c"] = {}
__locTable["b"] = {}
return __locTable

> --wut?
Lyqyd #13
Posted 15 October 2013 - 01:44 PM
Huh. That's odd. I'll have to drag out the working version in a few hours when I get home. Clearly, trying to make from memory the fixes I had figured out last night isn't working. I'll be sure to run the test cases you've posted before I upload it, but I believe both of them are fine in the actual working version. Sigh.
ElvishJerricco #14
Posted 15 October 2013 - 01:50 PM
Huh. That's odd. I'll have to drag out the working version in a few hours when I get home. Clearly, trying to make from memory the fixes I had figured out last night isn't working. I'll be sure to run the test cases you've posted before I upload it, but I believe both of them are fine in the actual working version. Sigh.

I'm working on my own version of this right now. The way I see it, it should be done in two passes. One to identify all values, and one to write them in a format. The result would look something like:


> x = {}
> a = {}
> a.b = {}
> a.c = {}
> a.b.d = x
> a.c.d = x
> = serialize(a)
local c1={}
local c2={d=c1}
local c3={d=c1}
local c4={b=c2,c=c3}
return c4

That result would be a well optimized version and I'm not sure I'll be willing to go that far with it, but I think the arbitrary cx values is the way to go.
ElvishJerricco #15
Posted 15 October 2013 - 02:50 PM
Ok this worked perfectly. The serialized text isn't the smallest but it works perfectly, even for recursively stored tables

Spoiler

function serialize(v)
	-- two pass
	-- pass 1: identify constants
	-- pass 2: write constants to format

	local function find(t, v)
		for k,v2 in pairs(t) do
			if v2 == v then return k end
		end
	end

	local function contains(t, v)
		for k,v2 in pairs(t) do
			if v2 == v then return true end
		end
		return false
	end

	local function pass1(v, track)
		if type(v) == "table" then
			if not contains(track, v) then
				table.insert(track, v)
				for k,v in pairs(v) do
					pass1(v, track)
				end
			end
		end
	end

	local function strForVal(v, track)
		local t = type(v)
		if t == "string" then
			return ("%q"):format(v)
		elseif t == "table" then
			return "c"..find(track, v)
		elseif t == "number" or t == "boolean" or t == "nil" then
			return tostring(v)
		else
			error("Invalid type: " .. t)
		end
	end

	local function format(v, track)
		local str = ""
		if type(v) ~= "table" then return str end
		for i,v in ipairs(track) do
			for k,v in pairs(v) do
				str = str .. "c"..i.."["..strForVal(k,track).."]=" .. strForVal(v,track).."\n"
			end
		end
		return str
	end

	local function pass2(v, track)
		local str = ""
		for i,v in ipairs(track) do
			str = str .. "local c"..i.."={}\n"
		end
		str = str .. format(v, track)
		str = str .. "return "..strForVal(v, track)
		
		return str
	end

	local track = {}
	pass1(v, track)
	return pass2(v, track)
end

function unserialize(s)
	local f,e = loadstring(s)
	if f then
		setfenv(f, {})
		return f()
	end
end

This worked:


> x = {}
> a = {}
> a.b = {}
> a.c = {}
> a.b.d = x
> a.c.d = x
> x.a = a
> = serialize(a)
local c1={}
local c2={}
local c3={}
local c4={}
c1["c"]=c2
c1["b"]=c4
c2["d"]=c3
c3["a"]=c1
c4["d"]=c3
return c1
> a1 = unserialize(serialize(a))
> = a1.b.d.a == a1
true
Lyqyd #16
Posted 16 October 2013 - 12:16 AM
Here, dug it up:


function serialize(target, tracking, cursor)
	local firstRun = false
	local outString = ""
	if not tracking then tracking = {[target] = "__locTable", size = 1} outString = "local __locTable = {}\n" cursor = "__locTable" firstRun = true end
	for k,v in pairs(target) do
		if type(v) == "function" then error("Cannot serialize type: function!") end
		local keyName = ""
		if type(k) == "string" then
			keyName = '[' ..string.format("%q", k).. ']'
		else
			keyName = "[" .. k .. "]"
		end
		outString = outString..cursor..keyName.." = "
		if type(v) == "table" then
			if not tracking[v] then
				outString = outString .. "{}\n"
				tracking[v] = cursor..keyName
				tracking.size = tracking.size + 1
				local insertString = serialize(v, tracking, cursor..keyName)
				outString = outString..insertString
			else
				outString = outString..tracking[v].."\n"
			end
		elseif type(v) == "string" then
			outString = outString..string.format("%q", v).."\n"
		else
			outString = outString..tostring(v).."\n"
		end
	end
	if firstRun then
		outString = outString.."return __locTable"
	end
	return outString
end

I didn't actually test the unserializer I posted above, but I don't see any reason it wouldn't work properly. Anyway, that should work, and it seems to be a fair bit more compact than the version you wrote above. An interesting exercise nonetheless!
ElvishJerricco #17
Posted 16 October 2013 - 12:39 AM
Here, dug it up:

That one's pretty good. Not to keep pointing out problems, but some still exist in there =P String keys aren't formatted via %q so there's potential syntax errors if the key contains new lines. Also it doesn't allow tables as keys.
Lyqyd #18
Posted 16 October 2013 - 12:45 AM
True, the tables as keys is indeed a problem. I've no idea what on earth I was doing last night when I was working on this. The version I copied over originally did have the string.format, but this version only has the block quotes. Ugh. Tables as keys would indeed not be handled by this one. Gonna paste the string.format fix into my post above for reference, but I've no particular interest in getting tables as keys going on with it.

Edit: Oh, keys. Yeah. Fixed that too. Now I'm done. :P/>