Posted to user@nutch.apache.org by reddibabu <re...@gmail.com> on 2014/03/26 07:28:10 UTC

Re: Parse benchmark/performance

Hi Folks,

I am running Nutch 1.7 on a Linux box, with threads set to 100, depth to 100,
and topN to 100. My Nutch crawl command took two and a half hours to crawl
9,100 URLs and index them into Solr. That works out to almost 1 second per URL.
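
For reference, the arithmetic behind that figure can be checked with a short
sketch. It uses only the numbers quoted above; the variable names are just for
illustration.

```python
# Figures taken from the crawl described above.
crawl_seconds = 2.5 * 60 * 60   # two and a half hours, in seconds
urls_crawled = 9100

seconds_per_url = crawl_seconds / urls_crawled
urls_per_second = urls_crawled / crawl_seconds

print(f"{seconds_per_url:.2f} s per URL")
print(f"{urls_per_second:.2f} URLs per second")
```

So the whole crawl-and-index run averages roughly one URL per second.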

The only entry I have changed in "conf/nutch-default.xml" is
"fetcher.server.delay", which I lowered from 5 sec to 0.5 sec. During the
crawl, the fetcher takes a little more time than the parser.
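
As a rough sanity check on the delay setting, here is a minimal sketch of the
ceiling it would impose, assuming the fetcher contacts each host at most once
per "fetcher.server.delay" (one connection per host) and ignoring download
time. The function name and the "hosts" parameter are hypothetical and not
part of Nutch. I have not measured how many distinct hosts my crawl touches.

```python
def max_pages_per_second(server_delay_s: float, hosts: int = 1) -> float:
    """Upper bound on fetch rate if each host is hit at most once per delay.

    Assumption: one connection per host and zero download time, so the real
    rate can only be lower than this.
    """
    return hosts / server_delay_s


# Old and new values of fetcher.server.delay from the paragraph above.
for delay in (5.0, 0.5):
    print(f"delay {delay} s -> at most {max_pages_per_second(delay)} pages/s per host")
```

Under these assumptions, a crawl that is concentrated on only a few hosts
would stay close to this per-host limit no matter how many threads are
configured.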

Are any other configuration changes required to crawl and index more URLs per
second?

@Roland Von Herget: You mentioned that you achieved a performance of
20 URLs/sec. Please let me know what configuration changes you made to
achieve this.

@amuseme: You mentioned that you achieved a performance of 1500 pages/28 sec.
Please let me know what configuration changes you made.

@ytthet: You mentioned that you achieved a performance of 53 pages/sec.
Please let me know what configuration changes you made.

Please share any suggestions so that I can improve my application's performance.

Thanks,
ReddiBabu



--
View this message in context: http://lucene.472066.n3.nabble.com/Parse-benchmark-performance-tp4045827p4127085.html
Sent from the Nutch - User mailing list archive at Nabble.com.