Posted to user@nutch.apache.org by Tejas Patil <te...@gmail.com> on 2013/02/02 19:51:18 UTC

Re: increase the number of fetches at a given time on nutch 1.6 or 2.1

Hey Peter,
My hardware was a cluster of high-end production machines (RAM and CPU
specs were 100 times better than a normal desktop PC). I think if you
procure EC2 instances of at least type "medium", you can expect better
performance.

I have no idea which is faster, Nutch 2.1 or 1.6. I want to know
too :) Can anyone from @dev or @user comment on that?

Thanks,
Tejas Patil


On Thu, Jan 31, 2013 at 12:09 AM, peterbarretto
<pe...@gmail.com> wrote:

> Hi Tejas,
>
> I am currently running Nutch 1.6 on Windows 7 with a Pentium dual-core
> 2.8 GHz CPU and 2 GB of RAM.
> I will be using Amazon EC2 servers later for crawling.
>
> What was your hardware when you ran 4 million URLs with 80 GB of data?
>
> Will nutch 2.1 give a faster crawl speed than 1.6?
>
>
> Tejas Patil wrote
> > I have run crawls with topN as large as 4 million with a crawldb of
> > ~80 GB. It worked fine without any such issue.
> > Maybe the hardware / cluster you have is not capable of handling a load
> > above 500. Note that if topN is low, then no matter how many fetcher
> > threads you create, you won't be able to increase the number of URLs
> > crawled. Also, because a considerable amount of time is spent in the
> > generate and update phases, the overall crawl rate will be low. If you
> > are planning to use the same machine, you will have to work with lower
> > values (and thus expect a lower crawl rate).
> >
> > thanks,
> > Tejas Patil
> >
> >
> > On Wed, Jan 30, 2013 at 8:06 PM, Lewis John Mcgibbney <lewis.mcgibbney@> wrote:
> >
> >> You are not getting very many URLs!
> >>
> >> On Tue, Jan 29, 2013 at 8:29 PM, peterbarretto <peterbarretto08@> wrote:
> >>
> >> >
> >> > 2013-01-29 08:44:35,014 INFO  crawl.CrawlDbReader - TOTAL urls: 96404
> >> >
> >> > 2013-01-29 08:44:35,018 INFO  crawl.CrawlDbReader - status 1 (db_unfetched): 85672
> >> >
> >>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/increase-the-number-of-fetches-at-agiven-time-on-nutch-1-6-or-2-1-tp4036499p4037637.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
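
The quoted point about topN and fetcher threads can be illustrated with a rough back-of-the-envelope model. This is only a sketch, not Nutch code: the function name, the `fetch_rate` figure (pages per second the fetcher threads sustain) and the fixed `overhead_s` (seconds spent in generate plus update) are all hypothetical values chosen for illustration. In a real crawl the overhead likely changes with crawldb size.

```python
def effective_crawl_rate(top_n, fetch_rate, overhead_s):
    """Pages per second over one generate/fetch/update round.

    top_n      -- maximum number of URLs selected for the round (topN)
    fetch_rate -- ASSUMED pages/second sustained by the fetcher threads
    overhead_s -- ASSUMED seconds spent in the generate and update phases
    """
    fetch_time_s = top_n / fetch_rate
    return top_n / (overhead_s + fetch_time_s)


if __name__ == "__main__":
    # Compare a small and a large topN at two hypothetical fetch rates.
    for top_n in (500, 50_000):
        for fetch_rate in (10, 100):
            rate = effective_crawl_rate(top_n, fetch_rate, overhead_s=300)
            print(f"topN={top_n:>6}  fetch_rate={fetch_rate:>3}/s  "
                  f"effective={rate:.2f} pages/s")
```

Under this model the effective rate can never exceed `top_n / overhead_s`, however fast the fetch phase is. That is why a tenfold faster fetcher barely helps when topN is small, while the same speedup pays off with a large topN.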