Posted to user@nutch.apache.org by Marseld Dedgjonaj <ma...@ikubinfo.com> on 2011/01/10 17:40:23 UTC
Read time out exception during fetch process
Hello everyone,
I installed Nutch to crawl all the links of a single website.
Crawling with the default values of fetcher.threads.fetch (10) and
fetcher.threads.per.host (1) works fine, but performance is poor: most of
the CPU and bandwidth goes unused.
To make better use of the resources, I changed these parameters in
nutch-site.xml:
<property>
<name>fetcher.threads.fetch</name>
<value>80</value>
</property>
<property>
<name>fetcher.threads.per.host</name>
<value>80</value>
</property>
With these values, every URL fails with: "fetch of http://www.mysite.com/...
failed with: java.net.SocketTimeoutException: Read timed out".
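As far as I can tell, the read timeout comes from the HTTP plugin's http.timeout property (10000 ms by default in nutch-default.xml), and 80 threads against one host may simply be overwhelming the site. A sketch of what I am considering trying in nutch-site.xml; the exact values are guesses on my part, not confirmed to work:
```xml
<!-- Assumption: http.timeout is the HTTP read/connect timeout in ms
     (default 10000 in nutch-default.xml). Raising it may avoid read
     timeouts when the server responds slowly under load. -->
<property>
  <name>http.timeout</name>
  <value>30000</value>
</property>
<!-- A more moderate per-host concurrency than 80, since all URLs are
     on one site and too many parallel requests can trigger timeouts. -->
<property>
  <name>fetcher.threads.per.host</name>
  <value>10</value>
</property>
```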
My machine has a Core 2 Quad CPU and 8 GB of RAM.
Any advice on what to do?
Thanks in advance
Marseldi