Posted to user@nutch.apache.org by Weder Carlos Vieira <we...@gmail.com> on 2013/07/31 18:19:10 UTC
Revaluation
Hello,
While testing Nutch today I noticed that it is a little slow. Is this
because it is revisiting URLs it has already reviewed, checking for updates?
Does anyone know if I can change this, so that Nutch finds and parses
only new URLs?
Thanks,
Weder
Re: Revaluation
Posted by Ahme Emre Aladağ <em...@agmlab.com>.
It will check how long it has been since the last fetch, so there is a check that causes a natural delay.
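As a rough illustration of that kind of check (a sketch only, not Nutch's actual code): a URL counts as due again once a re-fetch interval has passed since its last fetch. The function name `is_due` is made up for this example, and the 2592000-second (30-day) interval is an assumed value; the real interval comes from your Nutch configuration.

```shell
#!/bin/sh
# Illustration only: decide whether a URL is due for re-fetching.
# is_due LAST_FETCH INTERVAL NOW
#   LAST_FETCH and NOW are seconds since the epoch, INTERVAL is in seconds.
is_due() {
    last_fetch=$1
    interval=$2
    now=$3
    # Due when at least INTERVAL seconds have passed since the last fetch.
    [ $((now - last_fetch)) -ge "$interval" ]
}

now=$(date +%s)
one_day_ago=$((now - 86400))

# Assumed interval: 30 days = 2592000 seconds.
if is_due "$one_day_ago" 2592000 "$now"; then
    echo "due for re-fetch"
else
    echo "skipped: fetched too recently"
fi
```

Under these assumptions, a URL fetched a day ago is skipped, so the check itself should cost very little compared with actually downloading pages.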
But 7 minutes for 50 URLs might be too much. Did you investigate which URLs they are? Could they be large PDF files, or could your bandwidth be limited? Can you identify a bottleneck other than the check for already-seen URLs?
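One way to look for the bottleneck is to time each step of the cycle separately. The script below is a minimal sketch that wraps the same four commands from the quoted message; the helper name `time_step` and the `NUTCH` variable are made up for this example, and it assumes it is run from the Nutch runtime directory.

```shell
#!/bin/sh
# Sketch: time each step of the crawl cycle to see where the minutes go.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher (assumption)

# time_step LABEL COMMAND [ARGS...]
# Runs the command and prints how many whole seconds it took.
time_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
    return "$status"
}

if [ -x "$NUTCH" ]; then
    time_step generate "$NUTCH" generate -topN 50
    time_step fetch    "$NUTCH" fetch -all
    time_step parse    "$NUTCH" parse -all
    time_step updatedb "$NUTCH" updatedb
else
    echo "nutch launcher not found at $NUTCH" >&2
fi
```

If most of the time turns out to be in the fetch step, the cause is more likely the network or large documents than the check for already-seen URLs.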
----- Original Message -----
From: "Weder Carlos Vieira" <we...@gmail.com>
To: user@nutch.apache.org
Sent: Wednesday, 31 July 2013 19:26:55
Subject: Re: Revaluation
I am running the commands below inside a Linux script.
bin/nutch generate -topN 50
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb
This takes 7 minutes to run...
Tks
Re: Revaluation
Posted by Weder Carlos Vieira <we...@gmail.com>.
I am running the commands below inside a Linux script.
bin/nutch generate -topN 50
bin/nutch fetch -all
bin/nutch parse -all
bin/nutch updatedb
This takes 7 minutes to run...
Tks