This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

Program works in SP test world but crashes CC serverwide in SMP?

Started by Spacefish, 02 January 2015 - 11:28 PM
Spacefish #1
Posted 03 January 2015 - 12:28 AM
Hi there. This post borders on a bug report, but I'm posting it here to ask for help because I have no idea what circumstances lead to the problem described below. I have spent a few days making this cool program in my own lightweight modpack in singleplayer. Now I want to deploy it on the server I actually play on, which uses the FTB DW20 pack for MC 1.7.10. It's version 1.0.2 of the pack, which ships with CC 1.65, the same CC version I used in my dev world.

The problem is, in my dev world everything works great. But on the server, when I do something specific with my program (more on that later), I crash ComputerCraft for the whole server. The server keeps running and everything else still works, but all computers/turtles freeze, showing the same screen they had when CC crashed, and no (fresh) computers I place ever start up.

Now I would be very happy about any pointers as to what I could even be doing so wrong that this could happen. Since this problem clearly breaks out of the box it should be contained in, I'd think it's not an actual problem with my program.

The program itself is about 500 lines all in all, so I will not post it (also, I don't want to share it yet). It consists of a small API, a small program that sends data over a rednet protocol, and the main program which receives the data. The main program consists of three threads (input, rednet receive, screen update) that handle term.native() plus multiple monitors with individual output.
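
Roughly speaking, the sender side just loops and broadcasts on the protocol, something like this (not my actual code, only to illustrate the shape of it; the modem side and the interval are made up):

-- illustrative only: open the modem and keep broadcasting on the shared protocol
rednet.open("top") -- whichever side the modem is on
while true do
	rednet.broadcast("some data here", nvapi.protocol) -- real payload in my program
	sleep(1)
end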

My main problem here is that I can only reproduce the problem on the server, which is not mine. It restarts once a day, and since I don't totally crash the server I only get one shot at testing this per day. Also, it's not exactly good sport to crash CC every day. I could narrow it down to roughly the following scenario:

The program can be quit by pressing "end" on the keyboard or by clicking "X" on any screen. Every block involved is the advanced version. This seems to work as long as there have been no rednet messages yet. But when I had the sender running (it keeps transmitting after the receiver quits), the crash happens when I click the "X". It may crash with other means of quitting as well; I could not test that. There is no error message, all CC output serverwide just freezes. I'll now try to give an overview of what code runs from the moment I click "X":

Input thread captures the click and sets data about the click for another thread to process.

function inputThread()
	while shutdown == false do
		local a, b, c, d = os.pullEvent()
		if a == "key" then
			if b == 207 then -- 207 is the "end" key
				shutdown = true
				requestUpdate = true
			else
				--printU(tonumber(c))
			end
		elseif a == "monitor_touch" then
			if click == nil then
				click = {}
				click["b"] = 2
				click["a"] = 1
				click["x"] = c
				click["y"] = d
				click["scr"] = nil
				for i in pairs(screen) do
					if screen[i]["side"] == b then
						click["scr"] = screen[i]
						break
					end
				end
			else
				click["a"] = click["a"] + 1
			end
			requestUpdate = true
		elseif a == "mouse_click" then
			if click == nil then
				click = {}
				click["a"] = 1
				click["b"] = b
				click["x"] = c
				click["y"] = d
				click["scr"] = screen[1]
			else
				click["a"] = click["a"] + 1
			end
			requestUpdate = true
		end
	end
end

All threads run until shutdown is true; they are started with "parallel.waitForAll(inputThread, updateThread, receiveThread)" so everything shuts down in a controlled way.

The code that determines whether this click was a click on "X", part of "updateThread":

	if click ~= nil then
		if click["y"] == click["scr"]["h"] then
			<code missing>
			elseif click["x"] >= 13 and click["x"] <= 15 then
				shutdown = true
				os.startTimer(1) -- ensure event loop can quit
			end
			clearS(click["scr"])
		end
		click = nil
	end

And then everything just quits its loop after doing one last cycle of work. No data is invalidated or anything like that, so there's really no reason anything should crash here when it usually runs fine.

The main program continues with the following after all threads have quit:

parallel.waitForAll(inputThread, updateThread, receiveThread)
clearU()
nvapi.close()
os.unloadAPI("nvapi")
print("Bla.")

As you can see, the first thing that happens afterwards is clearing all screens. That never happens, so the crash must occur before this point. Here are the other two main thread functions for the sake of completeness:

function receiveThread()
	while shutdown == false do
		sid, msg, prot = rednet.receive(nvapi.protocol, rcvTime)
		if msg ~= nil then
			acceptMessage(msg)
			requestUpdate = true
		end
	end
end

function updateThread()
	while shutdown == false do
		while requestUpdate == false do
			sleep(0.2)
		end
		requestUpdate = false
		update()
		sleep(updTime)
	end
end

I don't know if it even makes sense to look for a problem in the code, since it clearly works like a charm in my dev world. It's more to give an idea of which features I use.

Thanks for your time. Help me, CC Forum, you're my only hope!

EDIT: I am looking for general weirdness in my program's behavior at the moment (in the test world). One thing I encountered: what is term.native() anyway? Because it's *not* the original terminal object. I made a test program, and on a freshly booted computer this:

term.redirect(term.current())
print("test")
sleep(5)

behaves differently from this:

term.redirect(term.native())
print("test")
sleep(5)

The difference is that the command line does not change its cursor position in the second one and will overwrite "test". Hence native() is not the original object. Maybe that has something to do with the whole thing. I also tried redirecting output to "term" itself; that one crashes the computer (it shuts off). I'm guessing that's not really intended behavior, even though I probably shouldn't do that anyway. However, I don't use a single redirect in my program, but I do use term.native() like a wrapped monitor.
Edited on 03 January 2015 - 05:37 AM
krzys_h #2
Posted 03 January 2015 - 11:35 AM
You shouldn't use term.native(); use term.current() instead.
If you look into /rom/apis/term you'll see:

term.native = function()
	-- NOTE: please don't use this function unless you have to.
	-- If you're running in a redirected or multitasked enviorment, term.native() will NOT be
	-- the current terminal when your program starts up. It is far better to use term.current()
	return native
end
"term" itself is also not correct:

term.redirect = function( target )
	if target == nil or type( target ) ~= "table" then
		error( "Invalid redirect target", 2 )
	end
	if target == term then
	  error( "term is not a recommended redirect target, try term.current() instead", 2 )
	end
	-- [...]
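If you do need to redirect somewhere, the usual pattern looks roughly like this (just a sketch; a monitor on the right side is only an example):

-- redirect to a wrapped monitor and restore whatever terminal was current
-- before, using term.current() instead of term.native()
local mon = peripheral.wrap("right")
local previous = term.current() -- remember the current terminal object

term.redirect(mon)
print("This goes to the monitor")

term.redirect(previous) -- restore the old terminal, whatever it was
print("Back on the original terminal")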
I have no idea why it could be crashing ComputerCraft on the whole server, though.
Spacefish #3
Posted 03 January 2015 - 12:18 PM
Thanks for your help, krzys_h! I think I understand now. It certainly wouldn't help stability if I mess around with the native terminal object a lot. I made the changes already and I will give them a try once the server restarts. Crossing fingers!

(Will update this post with the result)
Dustmuz #4
Posted 03 January 2015 - 01:36 PM
Since you are using the DW20 pack: I have had similar issues, though just with the modems. If I attach a modem to anything and activate it, the entire server crashes.
It even happens in SSP.
Bomb Bloke #5
Posted 03 January 2015 - 02:11 PM
Odds are the server log (accessible by the server owner) would shed some light on the matter. Does this make any use of peripherals? I'm inclined to think they're involved.

I vaguely suspect that the issue has to do with your use of parallel.waitForAll(). I think it'd be better to auto-end all co-routines at once with parallel.waitForAny() than to allow some to continue on their own (no matter how briefly). For one thing, you'd be able to remove the "rcvTime" timer from your rednet.receive() call in receiveThread(), cutting down processor usage somewhat.

I also really don't like your use of the "requestUpdate" variable. Rather than constantly spamming sleeps in updateThread() until it changes to what you want, I would use it alongside os.queueEvent():

local function receiveThread()
        while not shutdown do
                local sid, msg = rednet.receive(nvapi.protocol)
                acceptMessage(msg)
                requestUpdate = true
                os.queueEvent("requestUpdate")
        end
end

local function updateThread()
        while not shutdown do
                os.pullEvent("requestUpdate")
                requestUpdate = false
                update()
                sleep(updTime)
                if requestUpdate then os.queueEvent("requestUpdate") end
        end
end
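The main program would then swap waitForAll for waitForAny, something like this (based on the snippet in your first post):

-- waitForAny ends the remaining coroutines as soon as any one of them returns,
-- so the cleanup below runs as soon as the shutdown flag takes effect
parallel.waitForAny(inputThread, updateThread, receiveThread)
clearU()
nvapi.close()
os.unloadAPI("nvapi")
print("Bla.")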
Spacefish #6
Posted 03 January 2015 - 03:19 PM
I think it'd be better to auto-end all co-routines at once with parallel.waitForAny() than to allow some to continue on their own (no matter how briefly). For one thing, you'd be able to remove the "rcvTime" timer from your rednet.receive() call in receiveThread(), cutting down processor usage somewhat.
Very good, very good…

Rather than constantly spamming sleeps in updateThread() until it changes to what you want, I would use it alongside os.queueEvent():
Again, very good advice… Thank you so much! I was completely unaware of this possibility.

My program is kind of a beast; I have implemented dirty flags and things like that (concerning output) in the meantime. I will tackle this next, even if only for optimization's sake. Looking at the use cases, my program has to do so much more than this per second (new data will arrive *often*). But of course you are totally right, it should not be anything but idle if there is nothing to do (which is actually much more likely than having only a little to do).


EDIT: Thank you all very much for your advice, my program no longer crashes the server! I suspect there was some undefined behavior caused by my use of term.native(). Besides that, I have implemented many optimizations to make the whole thing much less resource-hungry, which may have helped stability as well. So thanks again, and I'm looking forward to sharing my software with you!
Edited on 04 January 2015 - 01:58 AM