Posted to user@nutch.apache.org by Weder Carlos Vieira <we...@gmail.com> on 2013/07/31 18:19:10 UTC

Revaluation

Hello

While testing Nutch today I noticed that it is a little slow. Is this
because it is re-checking URLs it has already crawled, looking for updates?

Does anyone know if I can change this, so that Nutch only looks for new URLs
to parse?


Thanks
Weder

Re: Revaluation

Posted by Ahme Emre Aladağ <em...@agmlab.com>.
Nutch checks how much time has passed since the last fetch of each URL, and that check causes a natural delay.
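
To see which re-fetch interval is actually in effect, one option is to read it from the configuration. The sketch below assumes the interval is governed by the `db.fetch.interval.default` property (in seconds), that the config files live under `conf/`, and that `<name>` and `<value>` sit on consecutive lines as in the stock files. The `get_prop` helper is hypothetical, not part of Nutch, and it relies on `grep -A`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of one property from a
# Hadoop-style XML config file. Assumes <name> and <value> are on
# consecutive lines, so a two-line lookahead is enough.
get_prop() {
    file=$1
    name=$2
    grep -F -A 2 "<name>$name</name>" "$file" |
        sed -n 's:.*<value>\(.*\)</value>.*:\1:p' |
        head -n 1
}

# Site overrides are assumed to be in nutch-site.xml and the shipped
# defaults in nutch-default.xml; files that are missing are skipped.
for f in conf/nutch-site.xml conf/nutch-default.xml; do
    if [ -f "$f" ]; then
        echo "$f: $(get_prop "$f" db.fetch.interval.default)"
    fi
done
```

If the value printed is large (days or weeks), already fetched URLs should not be selected again soon, so re-checking them is unlikely to explain a slow run.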

But 7 minutes for 50 URLs might be too much. Did you investigate which URLs they are? Could they be large PDF files, or could your bandwidth be limited? Could you identify a bottleneck other than the check of already seen URLs?
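
One way to narrow down the bottleneck is to time each step separately. This is only a sketch: `time_step` is a hypothetical helper, the four commands are the ones from the script quoted below, and the guard assumes the script is started from the Nutch runtime directory where `bin/nutch` lives.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many whole seconds
# it took, together with its exit status.
time_step() {
    label=$1
    shift
    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit $status)"
    return "$status"
}

# The same four steps as in the quoted script, timed one by one.
# They only run when bin/nutch is present and executable.
if [ -x bin/nutch ]; then
    time_step generate bin/nutch generate -topN 50
    time_step fetch    bin/nutch fetch -all
    time_step parse    bin/nutch parse -all
    time_step updatedb bin/nutch updatedb
fi
```

If most of the 7 minutes falls in the fetch step, slow hosts, large files, or bandwidth are the likely causes; if it falls in generate or updatedb, the storage backend is worth a look.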



----- Original Message -----
From: "Weder Carlos Vieira" <we...@gmail.com>
To: user@nutch.apache.org
Sent: Wednesday, July 31, 2013 19:26:55
Subject: Re: Revaluation

I am running the commands below inside a Linux script.

bin/nutch generate -topN 50
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb

This takes 7 minutes to run...


Tks




Re: Revaluation

Posted by Weder Carlos Vieira <we...@gmail.com>.
I am running the commands below inside a Linux script.

bin/nutch generate -topN 50
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb

This takes 7 minutes to run...


Tks


