Posted to user@nutch.apache.org by Marseld Dedgjonaj <ma...@ikubinfo.com> on 2011/01/10 17:40:23 UTC

Read timeout exception during fetch process

Hello everyone,

I installed Nutch to crawl all the links of a single website.

When I crawl with the default values of the fetcher.threads.fetch (10) and
fetcher.threads.per.host (1) parameters, it works fine, but performance is
poor (most of the CPU and bandwidth goes unused).
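For reference, the defaults I am running with are equivalent to setting this in nutch-site.xml (values copied from nutch-default.xml):

```xml
<property>
  <name>fetcher.threads.fetch</name>
  <value>10</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value>
</property>
```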

If I change these parameters in nutch-site.xml to make better use of the
resources:

<property>
  <name>fetcher.threads.fetch</name>
  <value>80</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>80</value>
</property>

then I get a "fetch of http://www.mysite.com/... failed with:
java.net.SocketTimeoutException: Read timed out" exception for every URL.
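In case it is relevant: I have not changed the fetch timeout, so it should still be the default http.timeout from nutch-default.xml. The snippet below is only a sketch of the setting I suspect I might need to raise in nutch-site.xml, not something I have tried yet:

```xml
<!-- Default HTTP read timeout in milliseconds (value from nutch-default.xml).
     Raising this in nutch-site.xml might avoid the read timeouts when many
     threads hit the same host. -->
<property>
  <name>http.timeout</name>
  <value>10000</value>
</property>
```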

 

My machine has a Core 2 Quad CPU and 8 GB of RAM.

Any advice on what to do would be appreciated.

 

Thanks in advance 

Marseldi

 


