Posted to user@nutch.apache.org by Savannah Beckett <sa...@yahoo.com> on 2010/09/18 09:25:56 UTC
java.net.UnknownHostException and Timeout during Fetching?
When I try to fetch links, I get the following error for many links:
fetch of
http://seeker.dice.com/jobsearch/servlet/JobSearch?op=101&dockey=xml/5/0/50c94073b712f76f38a62436a9844b52@endecaindex&c=1&source=11
failed with: java.net.UnknownHostException: seeker.dice.com
seeker.dice.com is not my host. I checked the link, and it works in the
browser. I also got a bunch of timeouts for other links during the fetch, but
they all work in the browser. Why?
Thanks.
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Markus Jelsma <ma...@buyways.nl>.
Also, pay attention to your LAN's infrastructure. Several consumer-grade routers
(which we sometimes find in corporate environments) cannot cope with a high
number of concurrent connections being opened and closed all the time.
On Monday 20 September 2010 05:21:37 Savannah Beckett wrote:
> I also get a bunch of socket timeout exceptions in many of my fetches
> after a bunch of successful fetches. I checked those links in the browser
> and they work there. Is the problem at my end or at the destination end?
> Is it possible that the destination server is blocking some of my fetches
> because I made too many requests too fast?
Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Mike Baranczak <mb...@gmail.com>.
Maybe, but this would be a separate problem from the UnknownHostExceptions. There's a simpler possibility - that your connection to the internet is just flaky. This would explain both of the errors.
-MB
On Sep 19, 2010, at 11:21 PM, Savannah Beckett wrote:
> I also get a bunch of socket timeout exceptions in many of my fetches after a bunch of successful fetches. I checked those links in the browser and they work there. Is the problem at my end or at the destination end? Is it possible that the destination server is blocking some of my fetches because I made too many requests too fast?
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Savannah Beckett <sa...@yahoo.com>.
I also get a bunch of socket timeout exceptions in many of my fetches after a
bunch of successful fetches. I checked those links in the browser and they work
there. Is the problem at my end or at the destination end? Is it possible that
the destination server is blocking some of my fetches because I made too many
requests too fast?
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Mike Baranczak <mb...@gmail.com>.
Reducing the number of threads might help, but 10 threads total doesn't seem like that much to begin with. I think a better solution would be to run your own private DNS server (preferably on the same machine as Nutch, or at least on the same local network).
-MB
On Sep 19, 2010, at 10:08 PM, Savannah Beckett wrote:
> There is no slave; I have only one server. I tried both OpenDNS and Comcast DNS
> servers, and they both have the same problem. I found that if I use a slower
> connection, fewer of these UnknownHostExceptions happen. I have the following
> setting and I am fetching from one host only for now. Are you suggesting using
> fewer threads per host?
>
>
> <property>
> <name>fetcher.threads.per.host</name>
> <value>10</value>
> <description>This number is the maximum number of threads that
> should be allowed to access a host at one time.</description>
> </property>
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Savannah Beckett <sa...@yahoo.com>.
There is no slave; I have only one server. I tried both OpenDNS and Comcast DNS
servers, and they both have the same problem. I found that if I use a slower
connection, fewer of these UnknownHostExceptions happen. I have the following
setting and I am fetching from one host only for now. Are you suggesting using
fewer threads per host?
<property>
<name>fetcher.threads.per.host</name>
<value>10</value>
<description>This number is the maximum number of threads that
should be allowed to access a host at one time.</description>
</property>
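A lower-concurrency variant of the setting above could be sketched as the
following override in conf/nutch-site.xml. The values are illustrative
assumptions, not numbers recommended anywhere in this thread, and
fetcher.threads.fetch (the total fetcher thread count) is included on the
assumption that overall DNS load, not only per-host load, is what matters here.

<!-- Hypothetical values for illustration only; tune for your own setup. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>2</value>
</property>
<property>
  <name>fetcher.threads.fetch</name>
  <value>5</value>
</property>

Fewer threads mean fewer simultaneous DNS lookups and connections, which may
reduce both the UnknownHostException and the timeout errors if the resolver or
the router is the bottleneck.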
Re: java.net.UnknownHostException and Timeout during Fetching?
Posted by Ken Krugler <kk...@transpac.com>.
Hi Savannah,
On Sep 18, 2010, at 12:25am, Savannah Beckett wrote:
> When I try to fetch links, I get the following error for many links:
> fetch of
> http://seeker.dice.com/jobsearch/servlet/JobSearch?op=101&dockey=xml/5/0/50c94073b712f76f38a62436a9844b52@endecaindex&c=1&source=11
> failed with: java.net.UnknownHostException: seeker.dice.com
>
> seeker.dice.com is not my host. I checked the link, and it works in the
> browser. I also got a bunch of timeouts for other links during the fetch,
> but they all work in the browser. Why?
This can happen when Nutch overwhelms whatever DNS server your
slave(s) are using.
E.g. you can get this if running locally with more than a few threads.
-- Ken
--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g