Posted to user@nutch.apache.org by Elwin <ma...@gmail.com> on 2006/04/12 09:55:44 UTC

java.net.SocketTimeoutException: Read timed out

When I use the httpclient.HttpResponse to get HTTP content in Nutch, I often
get SocketTimeoutExceptions.
Can I solve this problem by increasing the value of http.timeout in the conf
file?

Re: java.net.SocketTimeoutException: Read timed out

Posted by Elwin <ma...@gmail.com>.
Oh. Thank you very much.

On 06-4-14, Raghavendra Prabhu <rr...@gmail.com> wrote:
>
> Hi Elwin
>
> Just switch it to protocol-http in the conf file (nutch-default.xml).
>
> If you don't want to use multiple threads, change the number of threads in
> the configuration file.
>
> Use a limited number of fetcher threads, as Doug said.
>
> Rgds
> Prabhu



--
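《盖世豪侠》 drew rave reviews and kept TVB's ratings high, yet TVB,
pleased as it was, still gave him no major roles. Stephen Chow was never
one to stay in a small pond; with his comic talent already on display, he
was not willing to be sidelined, so he moved to the film industry and
showed his flair on the big screen. TVB had gained a thoroughbred and then
lost it, and naturally regretted it too late.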

Re: java.net.SocketTimeoutException: Read timed out

Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi Elwin

Just switch it to protocol-http in the conf file (nutch-default.xml).

If you don't want to use multiple threads, change the number of threads in the
configuration file.

Use a limited number of fetcher threads, as Doug said.

Rgds
Prabhu
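
For readers following along, the settings mentioned above might look roughly like the fragment below. This is a sketch under assumptions, not a verified configuration: the property names plugin.includes, fetcher.threads.fetch and http.timeout are assumed to match the nutch-default.xml shipped with your Nutch version, the other plugin names inside the plugin.includes value are placeholders, and the numbers are purely illustrative. Overrides normally go in conf/nutch-site.xml, which is read after nutch-default.xml.

```xml
<!-- Sketch only: copy the full plugin.includes value from your own
     nutch-default.xml and change just the protocol plugin name. -->
<property>
  <name>plugin.includes</name>
  <!-- protocol-http replaces protocol-httpclient; the remaining entries are placeholders -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>

<property>
  <name>fetcher.threads.fetch</name>
  <!-- illustrative value: fewer fetcher threads, to avoid saturating bandwidth -->
  <value>5</value>
</property>

<property>
  <name>http.timeout</name>
  <!-- illustrative value: network timeout in milliseconds -->
  <value>20000</value>
</property>
```

Whether code that calls HttpResponse directly picks up these values depends on how that code loads its configuration, which this thread does not show.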

On 4/14/06, Elwin <ma...@gmail.com> wrote:
>
> Hi Raghavendra
>
> Then how do I use protocol-http instead of protocol-httpclient?
> Can I still use HttpResponse?

Re: java.net.SocketTimeoutException: Read timed out

Posted by Elwin <ma...@gmail.com>.
Hi Raghavendra

Then how do I use protocol-http instead of protocol-httpclient?
Can I still use HttpResponse?

On 06-4-13, Raghavendra Prabhu <rr...@gmail.com> wrote:
> Hi Doug
>
> I am not sure this problem is entirely due to bandwidth starvation.
>
> In some cases, using protocol-http instead of protocol-httpclient
> seems to fix the problem.
>
> I am not certain, but that change seemed to fix the problem.
>
> Rgds
> Prabhu

FileNotFoundException on crawl

Posted by Michael Levy <Lu...@gmail.com>.
I'm getting the error below. When I look at the rootUrlFile value, it
seems as though it is trying to read a file named "urls.txt -dir
crawled" rather than recognizing the -dir parameter. Any ideas? This is
running on Solaris 9, if that makes any difference.

If I merely run "bin/nutch crawl urls.txt" this problem doesn't occur.

Thanks!


bash-2.05# bin/nutch crawl urls.txt -dir crawled
060414 155951 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060414 155951 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060414 155951 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060414 155951 No FS indicated, using default:local
060414 155951 crawl started in: crawl-20060414155951
060414 155951 rootUrlFile = urls.txt -dir crawled
060414 155951 threads = 10
060414 155951 depth = 5
060414 155952 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060414155951/db
Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir crawled (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at java.io.FileReader.<init>(FileReader.java:55)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)


Re: java.net.SocketTimeoutException: Read timed out

Posted by Raghavendra Prabhu <rr...@gmail.com>.
Hi Doug

I am not sure this problem is entirely due to bandwidth starvation.

In some cases, using protocol-http instead of protocol-httpclient
seems to fix the problem.

I am not certain, but that change seemed to fix the problem.

Rgds
Prabhu


On 4/13/06, Elwin <ma...@gmail.com> wrote:
>
> In fact I'm not using Nutch's fetcher; I just call HttpResponse
> in my own code, which is not multi-threaded.

Re: java.net.SocketTimeoutException: Read timed out

Posted by Elwin <ma...@gmail.com>.
In fact I'm not using Nutch's fetcher; I just call HttpResponse
in my own code, which is not multi-threaded.

2006/4/13, Doug Cutting <cu...@apache.org>:
>
> Elwin wrote:
> > When I use the httpclient.HttpResponse to get HTTP content in Nutch, I often
> > get SocketTimeoutExceptions.
> > Can I solve this problem by increasing the value of http.timeout in the conf
> > file?
>
> Perhaps, if you're working with slow sites. But, more likely, you're
> using too many fetcher threads and exceeding your available bandwidth,
> causing threads to starve and time out.
>
> Doug
>




Re: java.net.SocketTimeoutException: Read timed out

Posted by Doug Cutting <cu...@apache.org>.
Elwin wrote:
> When I use the httpclient.HttpResponse to get HTTP content in Nutch, I often
> get SocketTimeoutExceptions.
> Can I solve this problem by increasing the value of http.timeout in the conf
> file?

Perhaps, if you're working with slow sites. But, more likely, you're
using too many fetcher threads and exceeding your available bandwidth,
causing threads to starve and time out.

Doug
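
The "slow sites" case can be reproduced outside Nutch with plain java.net sockets. The sketch below is only an illustration and does not use Nutch's HttpResponse or its http.timeout handling: the class name ReadTimeoutDemo, its helper methods and the 500 ms server delay are all made up for this example. A local server waits before answering, and the client's read timeout (set with Socket.setSoTimeout) decides whether java.net.SocketTimeoutException is thrown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Starts a one-shot local server that waits delayMillis before sending one byte. */
    static ServerSocket startSlowServer(long delayMillis) throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket client = server.accept()) {
                Thread.sleep(delayMillis);           // simulate a slow site
                client.getOutputStream().write('x');
            } catch (IOException | InterruptedException e) {
                // The client gave up or the server was closed; ignored in this demo.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    /** Returns true if reading one byte from the server times out. */
    static boolean readTimesOut(ServerSocket server, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort())) {
            socket.setSoTimeout(timeoutMillis);      // read timeout in milliseconds
            socket.getInputStream().read();          // blocks until data arrives or the timeout fires
            return false;
        } catch (SocketTimeoutException e) {
            return true;                             // "Read timed out"
        }
    }

    /** Tries a short and a long read timeout against a server that answers after 500 ms. */
    static boolean[] demo() {
        try {
            boolean shortTimedOut;
            boolean longTimedOut;
            try (ServerSocket slow = startSlowServer(500)) {
                shortTimedOut = readTimesOut(slow, 100);
            }
            try (ServerSocket slow = startSlowServer(500)) {
                longTimedOut = readTimesOut(slow, 5000);
            }
            return new boolean[] { shortTimedOut, longTimedOut };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("100 ms timeout timed out:  " + result[0]);
        System.out.println("5000 ms timeout timed out: " + result[1]);
    }
}
```

With a server that is merely slow, the longer timeout lets the read complete, which is the situation where raising the timeout helps. If the delay comes from saturated bandwidth instead, a longer timeout only postpones the failure.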