Posted to user@nutch.apache.org by Ted Yu <yu...@gmail.com> on 2010/02/12 00:25:00 UTC

SocketTimeoutException

Hi,
Our crawling is based on nutchbase.
I see a lot of the following in our logs:

2010-02-11 15:19:52,046 INFO com.rialto.nutchbase.fetcher.Fetcher:
fetch of http://www.hoovers.com/companyindex/Texas/Carrollton/Financial_Markets_and_Investing-1.html
failed with: java.net.SocketTimeoutException: Read timed out
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http:
java.net.SocketTimeoutException: Read timed out
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.net.SocketInputStream.socketRead0(Native Method)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.net.SocketInputStream.read(SocketInputStream.java:129)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.io.BufferedInputStream.read(BufferedInputStream.java:237)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.io.FilterInputStream.read(FilterInputStream.java:66)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
java.io.PushbackInputStream.read(PushbackInputStream.java:122)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
org.apache.nutch.protocol.http.HttpResponse.readLine(HttpResponse.java:323)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
org.apache.nutch.protocol.http.HttpResponse.parseStatusLine(HttpResponse.java:237)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:145)
2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
org.apache.nutch.protocol.http.Http.getResponse(Http.java:69)
2010-02-11 15:19:54,088 ERROR org.apache.nutch.protocol.http.Http: at
org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:225)
2010-02-11 15:19:54,088 ERROR org.apache.nutch.protocol.http.Http: at
com.rialto.nutchbase.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:464)

Your suggestion for reducing them is welcome.

Re: SocketTimeoutException

Posted by "Andreas P. Koenzen" <ak...@gmail.com>.
Hello,

Just increase the HTTP timeout (the http.timeout property) in nutch-site.xml.

<property>
         <name>http.timeout</name>
         <value>Your value in milliseconds.</value>
         <description></description>
</property>
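
For illustration only, here is a minimal sketch of how the override might look inside the <configuration> element of conf/nutch-site.xml. The value 30000 (30 seconds) is an assumed example, not a recommendation; the default shipped in nutch-default.xml is 10000 ms. The description text is also just a placeholder.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides the default from nutch-default.xml (10000 ms). -->
  <property>
    <name>http.timeout</name>
    <!-- Assumed example value: 30 seconds, expressed in milliseconds. -->
    <value>30000</value>
    <description>Network timeout for HTTP fetches, in milliseconds.</description>
  </property>
</configuration>
```

Keep in mind that a larger timeout also means each fetcher thread can stay blocked longer on a slow host, so it is probably worth raising the value in steps and watching the logs.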

Best regards,

---
Andreas P. Koenzen

On 11/02/2010, at 08:25 p.m., Ted Yu wrote:

> Hi,
> Our crawling is based on nutchbase.
> I see a lot of the following in our logs:
>
> 2010-02-11 15:19:52,046 INFO com.rialto.nutchbase.fetcher.Fetcher:
> fetch of http://www.hoovers.com/companyindex/Texas/Carrollton/Financial_Markets_and_Investing-1.html
> failed with: java.net.SocketTimeoutException: Read timed out
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http:
> java.net.SocketTimeoutException: Read timed out
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.net.SocketInputStream.socketRead0(Native Method)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.net.SocketInputStream.read(SocketInputStream.java:129)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.io.FilterInputStream.read(FilterInputStream.java:66)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> java.io.PushbackInputStream.read(PushbackInputStream.java:122)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> org.apache.nutch.protocol.http.HttpResponse.readLine(HttpResponse.java:323)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> org.apache.nutch.protocol.http.HttpResponse.parseStatusLine(HttpResponse.java:237)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:145)
> 2010-02-11 15:19:54,087 ERROR org.apache.nutch.protocol.http.Http: at
> org.apache.nutch.protocol.http.Http.getResponse(Http.java:69)
> 2010-02-11 15:19:54,088 ERROR org.apache.nutch.protocol.http.Http: at
> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:225)
> 2010-02-11 15:19:54,088 ERROR org.apache.nutch.protocol.http.Http: at
> com.rialto.nutchbase.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:464)
>
> Your suggestion for reducing them is welcome.