Posted to user@nutch.apache.org by zhengping deng <de...@hotmail.com> on 2008/09/11 16:54:20 UTC
how to improve nutch crawl speed?
Hi,

I am running nutch-0.8.1 on 4 machines (HP DL380 G4, 2c/4G) with 1 namenode and 3 datanodes. I am using them to crawl the Internet, but it is too slow. After nearly 8 hours of crawling, the crawl directory on HDFS is only 1.1G. If I crawl with Nutch on a single machine, it could be more than 10G. With larbin, I can get more than 100G on one of my machines.

What is the problem? How can I improve the speed of Nutch in distributed mode? Can nutch/hadoop really be competitive in SE?

My main config is below:

hadoop-site.xml:
  mapred.map.tasks 15
  mapred.reduce.tasks 15
  dfs.replication 2

nutch-site.xml:
  fetcher.threads.fetch 500
  fetcher.threads.per.host 20
  parser.threads.parse 500

Thank you for all your help.

Mark Deng
RE: how to improve nutch crawl speed?
Posted by Edward Quick <ed...@hotmail.com>.
I'm no expert on Nutch, but if this is the same problem I had, try setting fetcher.server.delay to 0.1 in nutch-site.xml.
Ed.
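
For reference, a minimal sketch of how that override might look in nutch-site.xml, assuming the usual Hadoop-style property layout. The value 0.1 is the one suggested above and is in seconds; the description text is only illustrative and is not copied from the Nutch defaults.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Delay between successive requests to the same server, in seconds.
       0.1 is the value suggested in this thread, not a recommended default. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>0.1</value>
    <description>Illustrative override: shorter wait between requests to one host.</description>
  </property>
</configuration>
```

The setting only affects how long the fetcher waits between requests to a single host, so it is likely to matter most when the crawl is concentrated on a small number of hosts.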