Posted to user@nutch.apache.org by Savannah Beckett <sa...@yahoo.com> on 2010/09/18 09:25:56 UTC

java.net.UnknownHostException and Timeout during Fetching?

When I try to fetch links, I get the following error for many of them:

fetch of http://seeker.dice.com/jobsearch/servlet/JobSearch?op=101&dockey=xml/5/0/50c94073b712f76f38a62436a9844b52@endecaindex&c=1&source=11 failed with: java.net.UnknownHostException: seeker.dice.com

seeker.dice.com is not my host.  I checked the link, and it works in the
browser.  I also get a bunch of timeouts for other links during the fetch,
but they all work in the browser too.  Why?
Thanks.


      

Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Markus Jelsma <ma...@buyways.nl>.
Also, pay attention to your LAN's infrastructure. Several consumer routers 
(which we sometimes find in corporate environments) cannot cope with a high 
number of concurrent connections, opening and closing all the time.

On Monday 20 September 2010 05:21:37 Savannah Beckett wrote:
> I also get a bunch of socket timeout exceptions in a bunch of my fetches,
>  after a bunch of successful fetches.  I checked those links in a browser
>  and they work there.  Is the problem at my end or at the destination
>  end?  Is it possible that the destination server is blocking some of my
>  fetches because I sent too many too fast?
> 
> 
> 
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Mike Baranczak <mb...@gmail.com>.
Maybe, but this would be a separate problem from the UnknownHostExceptions. There's a simpler possibility - that your connection to the internet is just flaky. This would explain both of the errors.

-MB
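
If the timeouts do turn out to be rate-related, the request pacing and the HTTP timeout are the settings to experiment with. A minimal sketch for conf/nutch-site.xml, assuming the Nutch 1.x property names fetcher.server.delay and http.timeout; the values are illustrative guesses, not tested recommendations:

```xml
<!-- Illustrative values only. Check conf/nutch-default.xml in your version
     for the actual defaults and for how the delay interacts with
     fetcher.threads.per.host. -->
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
  <description>Seconds to wait between successive requests to the same
    server.</description>
</property>
<property>
  <name>http.timeout</name>
  <value>30000</value>
  <description>Network timeout in milliseconds.</description>
</property>
```

These only affect pacing and how long a fetch waits for a response; they do not address failing DNS lookups.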



On Sep 19, 2010, at 11:21 PM, Savannah Beckett wrote:

> I also get a bunch of socket timeout exceptions in a bunch of my fetches, after a bunch of successful fetches.  I checked those links in a browser and they work there.  Is the problem at my end or at the destination end?  Is it possible that the destination server is blocking some of my fetches because I sent too many too fast?
> 


Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Savannah Beckett <sa...@yahoo.com>.
I also get a bunch of socket timeout exceptions in a bunch of my fetches, after
a bunch of successful fetches.  I checked those links in a browser and they work
there.  Is the problem at my end or at the destination end?  Is it possible
that the destination server is blocking some of my fetches because I sent
too many too fast?




________________________________
From: Mike Baranczak <mb...@gmail.com>
To: user@nutch.apache.org
Sent: Sun, September 19, 2010 7:27:42 PM
Subject: Re: java.net.UnknownHostException and Timeout during Fetching?

Reducing the number of threads might help, but 10 threads total doesn't seem 
like that much to begin with. I think a better solution would be to run your own 
private DNS server (preferably on the same machine as Nutch, or at least on the 
same local network).

-MB




      

Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Mike Baranczak <mb...@gmail.com>.
Reducing the number of threads might help, but 10 threads total doesn't seem like that much to begin with. I think a better solution would be to run your own private DNS server (preferably on the same machine as Nutch, or at least on the same local network).

-MB
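
For reference, a minimal sketch of what lowering the thread counts could look like in conf/nutch-site.xml. It assumes the Nutch 1.x properties fetcher.threads.fetch (total fetcher threads) and fetcher.threads.per.host (the one quoted below); the values are arbitrary starting points for an experiment, not recommendations:

```xml
<!-- Hypothetical values for a test run. Lower them further if the
     UnknownHostExceptions persist, raise them again if they stop. -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>5</value>
  <description>Total number of fetcher threads.</description>
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <value>2</value>
  <description>Maximum number of threads allowed to access a single
    host at one time.</description>
</property>
```

Fewer threads means fewer simultaneous DNS lookups and open connections, which is the load that the replies in this thread suspect of overwhelming the resolver.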



On Sep 19, 2010, at 10:08 PM, Savannah Beckett wrote:

> There is no slave; I have only one server.  I tried both the OpenDNS and
> Comcast DNS servers, and both have the same problem.  I found that when I
> use a slower connection, fewer of these UnknownHostExceptions occur.  I have
> the following setting, and I am fetching from only one host for now.  Are
> you suggesting using fewer threads per host?
> 
> 
> <property>
>   <name>fetcher.threads.per.host</name>
>   <value>10</value>
>   <description>This number is the maximum number of threads that
>     should be allowed to access a host at one time.</description>
> </property>
> 
> 
> 


Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Savannah Beckett <sa...@yahoo.com>.
There is no slave; I have only one server.  I tried both the OpenDNS and
Comcast DNS servers, and both have the same problem.  I found that when I
use a slower connection, fewer of these UnknownHostExceptions occur.  I have
the following setting, and I am fetching from only one host for now.  Are
you suggesting using fewer threads per host?


<property>
  <name>fetcher.threads.per.host</name>
  <value>10</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>



________________________________
From: Ken Krugler <kk...@transpac.com>
To: user@nutch.apache.org
Sent: Sun, September 19, 2010 6:17:26 PM
Subject: Re: java.net.UnknownHostException and Timeout during Fetching?

Hi Savannah,

On Sep 18, 2010, at 12:25am, Savannah Beckett wrote:

> When I try to fetch links, I get the following error for many of them:
> fetch of
> http://seeker.dice.com/jobsearch/servlet/JobSearch?op=101&dockey=xml/5/0/50c94073b712f76f38a62436a9844b52@endecaindex&c=1&source=11
> failed with: java.net.UnknownHostException: seeker.dice.com
> 
> seeker.dice.com is not my host.  I checked the link, and it works in the
> browser.  I also get a bunch of timeouts for other links during the fetch,
> but they all work in the browser too.  Why?

This can happen when Nutch overwhelms whatever DNS server your slave(s) are 
using.

E.g. you can get this if running locally with more than a few threads.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c  w e b  m i n i n g


      

Re: java.net.UnknownHostException and Timeout during Fetching?

Posted by Ken Krugler <kk...@transpac.com>.
Hi Savannah,

On Sep 18, 2010, at 12:25am, Savannah Beckett wrote:

> When I try to fetch links, I get the following error for many of them:
> fetch of
> http://seeker.dice.com/jobsearch/servlet/JobSearch?op=101&dockey=xml/5/0/50c94073b712f76f38a62436a9844b52@endecaindex&c=1&source=11
> failed with: java.net.UnknownHostException: seeker.dice.com
>
> seeker.dice.com is not my host.  I checked the link, and it works in the
> browser.  I also get a bunch of timeouts for other links during the fetch,
> but they all work in the browser too.  Why?

This can happen when Nutch overwhelms whatever DNS server your  
slave(s) are using.

E.g. you can get this if running locally with more than a few threads.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g