This is a read-only snapshot of the ComputerCraft forums, taken in April 2020.

http not working for google search

Started by PixelToast, 20 November 2012 - 12:23 PM
PixelToast #1
Posted 20 November 2012 - 01:23 PM
normally the http api works fine
but
http.request("http://www.google.com/search?q=anything")
always fails instantly, anyone know why? :s

EDIT:
It's because an incompatible user agent is sent when connecting to Google, which blocks the request,
and this cannot be fixed from Lua
(unless you use a custom proxy)
Kingdaro #2
Posted 20 November 2012 - 01:39 PM
Try https? (if cc can even use https, that is)
PixelToast #3
Posted 20 November 2012 - 01:42 PM
nope, it complains that it's not http

the problem could be that it takes a couple of seconds for Google to send results, and the http API queues http_failure before the response arrives
Lyqyd #4
Posted 20 November 2012 - 02:00 PM
Are you using http.request correctly? You know it throws an event with the results rather than returning them, correct?
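For anyone unsure what the event-based behaviour looks like in practice, here is a minimal sketch of a wrapper that sends the request and then waits for the matching event. The name fetch and its two optional parameters are made up for illustration; they exist only so the function can be tried with stand-ins outside ComputerCraft. It assumes the URL carried by the event is the same one that was requested.

```lua
-- Hypothetical helper: send a request, then wait for its result event.
-- request and pullEvent default to the ComputerCraft functions.
local function fetch(url, request, pullEvent)
  request = request or http.request
  pullEvent = pullEvent or os.pullEvent
  request(url)
  while true do
    local event, eventUrl, handle = pullEvent()
    -- Ignore unrelated events and results for other URLs.
    if eventUrl == url then
      if event == "http_success" then
        local body = handle.readAll()
        handle.close()
        return body
      elseif event == "http_failure" then
        return nil, "http_failure"
      end
    end
  end
end

-- Inside ComputerCraft this would be called as:
--   local body, err = fetch("http://www.google.com/")
```

Returning nil plus a message instead of calling error() lets the caller decide what to do when the request fails.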
PixelToast #5
Posted 20 November 2012 - 02:19 PM
yes, i know
Lyqyd #6
Posted 20 November 2012 - 02:48 PM
We'll need to see the code, then.
PixelToast #7
Posted 20 November 2012 - 04:08 PM

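http.request("http://www.google.com/search?q=define+greece")
local response
-- http.request is asynchronous: wait for the result event
while true do
  local event, url, sourceText = os.pullEvent()
  if event == "http_success" then
    response = sourceText.readAll()
    sourceText.close()
    break
  elseif event == "http_failure" then
    error("http_failure")
  end
end
-- the marker contains "-", a magic pattern character, so search in plain mode (4th argument true)
local marker = '<td valign="top" style="padding-bottom:5px;padding-top:5px"><table class="ts"><tr><td>'
local t, a = string.find(response, marker, 1, true)
if not t then
  print("Definition not found.")
else
  -- the definition runs from the end of the marker to the next closing cell tag
  local b = string.find(response, '</td>', a + 1, true)
  print(string.sub(response, a + 1, (b or #response + 1) - 1))
end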
it errors http_failure,
but when you change line 1 to

http.request("http://www.google.com/")
it will say definition not found instead of erroring
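A note for anyone scraping HTML this way: string.find treats its second argument as a Lua pattern unless the fourth argument is true, and "-" is a magic character in patterns, so a marker such as padding-bottom is not matched literally in pattern mode. Below is a minimal sketch of a helper that does the whole "text between two markers" step in plain mode. The name extractBetween is made up for illustration.

```lua
-- Hypothetical helper: return the text between two literal markers,
-- or nil if either marker is missing.
local function extractBetween(text, startMarker, endMarker)
  -- The fourth argument (true) turns off pattern matching.
  local _, startEnd = string.find(text, startMarker, 1, true)
  if not startEnd then
    return nil
  end
  local endStart = string.find(text, endMarker, startEnd + 1, true)
  if not endStart then
    return nil
  end
  return string.sub(text, startEnd + 1, endStart - 1)
end

local sample = '<td style="padding-bottom:5px">a country</td>'
print(extractBetween(sample, '<td style="padding-bottom:5px">', '</td>'))
```

Because it returns nil when a marker is missing, the caller can print "Definition not found." without risking arithmetic on a nil index.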
dissy #8
Posted 20 November 2012 - 06:14 PM
Does anyone happen to know what HTTP_USER_AGENT is sent by the HTTP API?
If it contains Java or Lua, it is quite possible Google is blocking search requests from it. They do that for a lot of scripting languages.

I've run into that problem with both Java and TCL, when using the default user agent. I had to change the user agent to match Firefox (which unfortunately is against the Google terms of service) to get a non-error result back.
Espen #9
Posted 21 November 2012 - 12:39 AM
It sends a Java User-Agent string.
You can find out the exact string by using the HTTP API on, e.g. http://whatsmyuseragent.com/ and then looking for "Your User Agent" in the response body.
PixelToast #10
Posted 21 November 2012 - 05:07 AM
the user agent is " java/1.7.0_07 "
with a simple recode of the program i posted above:

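http.request("http://whatsmyuseragent.com/")
local response
-- wait for the asynchronous result event
while true do
  local event, url, sourceText = os.pullEvent()
  if event == "http_success" then
    response = sourceText.readAll()
    sourceText.close()
    break
  elseif event == "http_failure" then
    error("http_failure")
  end
end
-- plain-mode find (4th argument true) treats the tags as literal text
local t, a = string.find(response, '<strong>Your User Agent:</strong>', 1, true)
if not t then
  print("Tag not found.")
else
  -- look for the line breaks after the label, not from its start
  local b = string.find(response, '<br /><br />', a + 1, true)
  print(string.sub(response, a + 1, (b or #response + 1) - 1))
end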
dissy #11
Posted 21 November 2012 - 12:30 PM
Sorry it's taken so long for me to verify this, but Google search does indeed block the Java user agent.
I've confirmed this using wget and the -U option (to specify a user agent)

This is with a FireFox user agent:

$ wget -U "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `search?q=kittens'

This is the exact same query made 5 seconds later with the Java agent:

$ wget -U "Java/1.7.0_07" http://www.google.com/search?q=kittens
HTTP request sent, awaiting response... 403 Forbidden
2012-11-20 18:25:14 ERROR 403: Forbidden.

Unfortunately there doesn't appear to be any way to "fix" this directly in Lua, short of using a different search engine.
This might help: http://en.wikipedia.org/wiki/List_of_search_engines
PixelToast #12
Posted 21 November 2012 - 04:07 PM
none of those seem to work :s
many of them don't define things
some (like bing) aren't accurate

bah, lemme see if a proxy works >_>
dissy #13
Posted 21 November 2012 - 04:44 PM
Do you know PHP or Perl, and have a web-server you can put scripts on?
If so you could code a little proxy, where the CGI queries Google with an IE or Firefox user agent. Then your Lua program will hit your cgi/php…

The only other way I know of to do this (a huge pita) is to sign up for their developer API.
With a developer API, they give you an API key, and you include that as part of the search query to a special URL.
They give you 100 queries per day for free, and then you have to pay to enable more queries ($5 per 1000 extra queries)

https://developers.google.com/custom-search/v1/overview

After all that mess, you can use a URL like this:


https://www.googleapis.com/customsearch/v1?key={INSERT-YOUR-KEY}&cx=017576662512468239146:omuauf_lfve&q={SEARCH-TERM}

They also have a JSON interface, if you wanted to go that route.
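To make the URL format above concrete, here is a minimal sketch of building such a query string in Lua with the search term percent-encoded. The helper names urlEncode and buildUrl are made up for illustration, and "YOUR-KEY" is a placeholder, not a real key. Note that, going by the earlier reply in this thread, the http API rejects https URLs, so this may only be usable through a proxy.

```lua
-- Hypothetical helper: percent-encode everything except unreserved characters.
local function urlEncode(str)
  return (string.gsub(str, "[^%w%-_%.~]", function(c)
    return string.format("%%%02X", string.byte(c))
  end))
end

-- Hypothetical helper: params is a list of {name, value} pairs,
-- kept as a list so the order in the URL is predictable.
local function buildUrl(base, params)
  local parts = {}
  for _, pair in ipairs(params) do
    parts[#parts + 1] = urlEncode(pair[1]) .. "=" .. urlEncode(pair[2])
  end
  return base .. "?" .. table.concat(parts, "&")
end

local url = buildUrl("https://www.googleapis.com/customsearch/v1", {
  { "key", "YOUR-KEY" },                       -- placeholder API key
  { "cx", "017576662512468239146:omuauf_lfve" }, -- engine id from the URL above
  { "q", "define greece" },
})
print(url)
```

The same buildUrl could be pointed at a self-hosted proxy script instead, with whatever parameter names that script expects.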
PixelToast #14
Posted 21 November 2012 - 05:14 PM
nah, I'd rather learn PHP and host it on moi XAMPP server :3
Espen #15
Posted 21 November 2012 - 06:20 PM
I've just posted a suggestion (topic 6259) about including Java-side setRequestProperties on the connection.
This would not only solve the User-Agent problem, but also allow one to set any header values, which enables a lot of other functions, e.g. sending cookies.