2151 posts
Location
Auckland, New Zealand
Posted 02 October 2014 - 07:40 AM
The OneOS installer has been somewhat plagued by an issue which only appears for some users.
I'm not 100% sure what's causing it, but I think it might have something to do with the fact that I use the parallel API to download all the files (maybe about 100, not too sure) at once. This makes it significantly faster than doing one file at a time, which takes a few minutes rather than a few seconds. As I said, I'm not certain that this is the issue, but it seems the most likely reason.
Essentially, what's happening is that http.get is returning nil at random points. Here is a heavily stripped-down version of the code; if you want to see the whole thing, look at the Pastebin file.
local latestReleaseTag = 'v1.2.6' --it gets this dynamically

function downloadBlob(v)
  if v.type == 'tree' then
    fs.makeDir('/'..v.path)
  else
    local f = http.get(('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
    if not f then
      error('Downloading failed, try again. '..('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
    end
    local h = fs.open('/'..v.path, 'w')
    h.write(f.readAll())
    h.close()
  end
end

local downloads = {}
for i, v in ipairs(tree) do
  table.insert(downloads, function() downloadBlob(v) end)
end
parallel.waitForAll(unpack(downloads))
Any thoughts would be appreciated. It might be of use to take a look at this GitHub issue or the OneOS topic (my signature).
Edited on 02 October 2014 - 05:42 AM
808 posts
Posted 02 October 2014 - 08:14 AM
I think this might be a bug. When I tried CC 1.64, I was completely unable to download anything off GitHub, despite my whitelist being set to *. In fact, I couldn't use the http library for anything except Pastebin; it wouldn't even work with computercraft.info. I dismissed it as probably my own error and lazily went back to 1.63.
7083 posts
Location
Tasmania (AU)
Posted 02 October 2014 - 08:24 AM
EJ has a good point; there've been a few people posting http issues with 1.64, and no suggested workarounds/fixes thus far.
That said, I still think opening up ~100 connections at once is a really bad idea, and it certainly wouldn't hurt to implement a basic retry mechanism, e.g.:
local connectionLimit = 10
local latestReleaseTag = 'v1.2.6' --it gets this dynamically

function downloadBlob()
  while #tree > 0 do
    local v = table.remove(tree,1)
    if v.type == 'tree' then
      fs.makeDir('/'..v.path)
    else
      local tries, f = 0
      repeat
        f = http.get(('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
        if not f then sleep(5) end
        tries = tries + 1
      until tries > 5 or f
      if not f then
        error('Downloading failed, try again. '..('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
      end
      local h = fs.open('/'..v.path, 'w')
      h.write(f.readAll())
      h.close()
    end
  end
end

local downloads = {}
for i=1,connectionLimit do
  table.insert(downloads, downloadBlob)
end
parallel.waitForAll(unpack(downloads))
2151 posts
Location
Auckland, New Zealand
Posted 02 October 2014 - 08:29 AM
EJ has a good point; there've been a few people posting http issues with 1.64, and no suggested workarounds/fixes thus far.
That said, I still think opening up ~100 connections at once is a really bad idea, and it certainly wouldn't hurt to implement a basic retry mechanism, e.g.:
local connectionLimit = 10
local latestReleaseTag = 'v1.2.6' --it gets this dynamically

function downloadBlob()
  while #tree > 0 do
    local v = table.remove(tree,1)
    if v.type == 'tree' then
      fs.makeDir('/'..v.path)
    else
      local tries, f = 0
      repeat
        f = http.get(('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
        if not f then sleep(5) end
        tries = tries + 1
      until tries > 5 or f
      if not f then
        error('Downloading failed, try again. '..('https://raw.github.com/oeed/OneOS/'..latestReleaseTag..v.path):gsub(' ','%%20'))
      end
      local h = fs.open('/'..v.path, 'w')
      h.write(f.readAll())
      h.close()
    end
  end
end

local downloads = {}
for i=1,connectionLimit do
  table.insert(downloads, downloadBlob)
end
parallel.waitForAll(unpack(downloads))
Yea, it may be an issue with 1.64, although by the time it gets to the point where it's downloading files, it has already made numerous successful HTTP calls to get the current version, file list, etc. It's been happening for quite a while too.
I'll put a retry mechanism in, although your code won't completely work; I'm pretty sure it will only download the first 10 files and ignore the rest. What I really need to do is maybe loop over all the files and make a table of tables, each with a max of ten items. Not sure about that sleep(5) either, although I suppose it's better than failing.
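Something like this, perhaps (a rough, untested sketch; tree and the original downloadBlob(v) function are from my first post, and the batch size of ten is just the number mentioned above):
-- Rough sketch (untested): split the file list into batches of ten,
-- then download each batch in parallel before moving on to the next.
local batchSize = 10
local batches = {}
for i, v in ipairs(tree) do
  local batchIndex = math.ceil(i / batchSize)
  batches[batchIndex] = batches[batchIndex] or {}
  table.insert(batches[batchIndex], v)
end

for _, batch in ipairs(batches) do
  local downloads = {}
  for _, v in ipairs(batch) do
    table.insert(downloads, function() downloadBlob(v) end)
  end
  parallel.waitForAll(unpack(downloads))
end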
Edited on 02 October 2014 - 06:30 AM
7083 posts
Location
Tasmania (AU)
Posted 02 October 2014 - 09:05 AM
I'm pretty sure it will only download the first 10 files and ignore the rest.
Eh?
7508 posts
Location
Australia
Posted 02 October 2014 - 09:35 AM
Not sure about that sleep(5) either, although I suppose it's better than failing.
Well, the purpose of it is that the request has already failed; this could be due to server load, so you just give it some time and then try again.
2151 posts
Location
Auckland, New Zealand
Posted 02 October 2014 - 09:39 AM
Not sure about that sleep(5) either, although I suppose it's better than failing.
well the purpose of it would be that it has failed, this could be due to server load, so you just give it some time and then try again.
Yea, it's just that 5 seems a little high, especially if you're trying 5 times.
7083 posts
Location
Tasmania (AU)
Posted 02 October 2014 - 10:24 AM
It serves as a throttle. If your connections are failing because either the sending or receiving server doesn't like you flooding, then the answer is to reduce the number of active connections at a time. Frankly I'd be tempted to increase the timer; bear in mind it only kicks in if there are "problems", and specifically serves to work around them.
You're more than capable of finding a balance that suits you, I'm sure. ;)
7508 posts
Location
Australia
Posted 02 October 2014 - 10:25 AM
Yea, it's just 5 seems a little high. Especially if you're trying 5 times.
It just allows things to potentially calm down.
8543 posts
Posted 02 October 2014 - 03:56 PM
Yeah, that many concurrent connections all kicking off at essentially the same time is a recipe for problems. I'd cut down the number of worker coroutines to 4-6 and implement a queue, whereby when a worker coroutine has finished downloading a file, the manager gives it a new file to download. You still gain any possible benefits from parallelizing the download process, without looking like a flooder or suddenly spiking the server's download bandwidth usage.