Posted to user@nutch.apache.org by Nidhi malik <ni...@gmail.com> on 2008/01/03 08:17:30 UTC

Http 407 error

I am sending my Hadoop file. I have also applied patch559V0.5.

At the time of fetching, I am getting these messages:
---------------------------------------------------------
Fetcher: starting
Fetcher: segment: crawl/segments/20080103125023
Fetcher: threads: 10
fetching http://www.w3schools.com/
http.proxy.host = netmon.iitb.ac.in
http.proxy.port = 80
http.timeout = 100000
http.content.limit = 65536
http.agent = digi/Nutch-0.9 (digvijay; http://www.google.com;
digvijayy@it.iitb.ac.in)
protocol.plugin.check.blocking = true
protocol.plugin.check.robots = true
fetcher.server.delay = 5000
http.max.delays = 100
Configured Client
fetch of http://www.w3schools.com/ failed with: Http code=407, url=
http://www.w3schools.com/
Fetcher: done

----------------------------------------------------------------------------

Re: Http 407 error

Posted by Susam Pal <su...@gmail.com>.
This information is not enough to understand the problem. The log you
have sent seems to be the messages that appear on the console, whereas
I had requested the 'logs/hadoop.log' file.

The log in this file is usually in this format:-

2008-01-03 00:00:16,652 INFO  fetcher.Fetcher - fetching http://www.example.com/
2008-01-03 00:00:17,029 INFO  fetcher.Fetcher - fetching http://www.example.net/
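
As a rough way to recognise lines in this format, here is a minimal
Python sketch. The pattern is only inferred from the two sample lines
above, and the names parse_log_line and count_levels are hypothetical
helpers made up for illustration, not anything provided by Nutch or
Hadoop.

```python
import re
from collections import Counter

# Assumed layout, inferred from the two sample lines above:
# "YYYY-MM-DD HH:MM:SS,mmm LEVEL  logger - message"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def count_levels(lines):
    """Count how many matching lines appear at each log level."""
    counts = Counter()
    for line in lines:
        fields = parse_log_line(line)
        if fields:
            counts[fields["level"]] += 1
    return counts
```

Passing the open 'logs/hadoop.log' file to count_levels gives a count
per level. Console output such as "Fetcher: starting" does not match
the pattern at all, and a count of zero for DEBUG would suggest that
DEBUG logging is not enabled.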

Please send the following information:-

1. The Nutch version you are using. (NUTCH-559v0.5 was generated
against the trunk. If you are using Nutch-0.9, the patch might not go
smoothly. You might have to manually compare whether the patch went
through nicely.)

2. It would be better if you also send the output of your patch command.

3. The relevant logs from 'logs/hadoop.log' with DEBUG enabled. Please
make sure before sending that the log file has the DEBUG lines.

4. The output of a sample HTTP query to your proxy server with netcat
or telnet. For example:-

$ nc -v 192.168.101.1 80
intproxy [192.168.101.1] 80 (www) open
GET http://www.google.com/ HTTP/1.0
Host: www.google.com

HTTP/1.1 407 Proxy Authentication Required ( The Server requires
authorization to fulfill the request. Access to the Web Proxy filter
is denied.  )
Via: 1.1 INTPROXY
Proxy-Authenticate: Negotiate
Proxy-Authenticate: Kerberos
Proxy-Authenticate: NTLM
Proxy-Authenticate: Basic realm="INTPROXY"
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Pragma: no-cache
Cache-Control: no-cache
Content-Type: text/html
Content-Length: 4119

Only the response header is enough, as shown above. No need to send the
complete response.

5. The value of the 'http.proxy.realm' property you have used in your
'conf/nutch-site.xml'. (I assume you have provided the correct host,
port, username and password in the other http.proxy.* properties.
Ideally, you should also set the http.agent.host property properly,
though I have never found this to cause a problem.)
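
If netcat or telnet is not at hand, the query in point 4 could be
approximated with a short Python sketch like the one below. It is only
an illustration: the function names are hypothetical, the proxy host
and port are whatever you would pass in, and it does nothing more than
send the same two request lines and pick out the Proxy-Authenticate
headers, which is where the realm asked about in point 5 would show up.

```python
import socket


def build_probe_request(url, host):
    """Build the HTTP/1.0 request sent through the proxy (absolute URL form)."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (url, host)).encode("ascii")


def read_response_header(proxy_host, proxy_port, request, timeout=10.0):
    """Send the request to the proxy and return the raw response header as text."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(request)
        data = b""
        # Read until the blank line that ends the header, or until the peer closes.
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")


def proxy_authenticate_values(header_text):
    """Return the status line and the values of all Proxy-Authenticate headers."""
    lines = header_text.splitlines()
    status = lines[0] if lines else ""
    values = []
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "proxy-authenticate":
            values.append(value.strip())
    return status, values
```

For the example response in point 4, proxy_authenticate_values would
return the 407 status line together with the four offered schemes, the
last one being Basic realm="INTPROXY".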

Regards,
Susam Pal

On Jan 3, 2008 12:47 PM, Nidhi malik <ni...@gmail.com> wrote:
> I am sending my Hadoop file. I have also applied patch559V0.5.
>
> At the time of fetching, I am getting these messages:
> ---------------------------------------------------------
> Fetcher: starting
> Fetcher: segment: crawl/segments/20080103125023
> Fetcher: threads: 10
> fetching http://www.w3schools.com/
> http.proxy.host = netmon.iitb.ac.in
> http.proxy.port = 80
> http.timeout = 100000
> http.content.limit = 65536
> http.agent = digi/Nutch-0.9 (digvijay; http://www.google.com;
> digvijayy@it.iitb.ac.in)
> protocol.plugin.check.blocking = true
> protocol.plugin.check.robots = true
> fetcher.server.delay = 5000
> http.max.delays = 100
> Configured Client
> fetch of http://www.w3schools.com/ failed with: Http code=407, url=
> http://www.w3schools.com/
> Fetcher: done
>
> ----------------------------------------------------------------------------