Posted to user@nutch.apache.org by carmmello <ca...@globo.com> on 2006/09/18 19:54:54 UTC

Is that true?

I have been trying Nutch since version 0.3, sometimes with some problems. Now I am using the 0.7.2 release and I'm really happy with it, to the point where I have about 1,100,000 pages indexed on a site that deals with quality and the environment.
But a new version means, at least in principle, a better product. So I tried Nutch 0.8 on the same single computer (Athlon 2400+, 1 GB RAM, about 4 Mbit/s connection, 53 threads) with the same seed sites (but in a folder, as per the tutorial). I used a depth of 2, just to try the new version (instead of the 4 or 5 I usually use), but when I checked the log I was really terrified: the fetching was horribly slow! With Nutch 0.7.2 I got about 9 pages per second, and with Nutch 0.8 it sometimes took about 3 seconds to fetch a single page! Roughly speaking, the fetching speed dropped by a factor of 20!
So, this is my question:
Is that true, or have I made some big mistake?
Thanks
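Taking the reported figures at face value, the two rates can be put on the same scale: 9 pages per second is 32,400 pages per hour, while 3 seconds per page is 1,200 pages per hour, a ratio of 27. Because the poster saw the 3-second figure only "sometimes", this is a worst-case comparison, somewhat above the "roughly 20" estimate. A minimal Python sketch of the arithmetic; the function and constant names are illustrative and not part of Nutch:

```python
# Compare the two fetch rates reported in the post.
# Assumption: 3 s/page is the worst case the poster saw with 0.8
# ("sometimes"), so the ratio below is an upper-end estimate.

OLD_PAGES_PER_SEC = 9      # Nutch 0.7.2, as reported
NEW_SECONDS_PER_PAGE = 3   # Nutch 0.8, as reported


def pages_per_hour_from_rate(pages_per_sec: float) -> float:
    """Throughput in pages/hour given a rate in pages/second."""
    return pages_per_sec * 3600


def pages_per_hour_from_latency(seconds_per_page: float) -> float:
    """Throughput in pages/hour given a time per page in seconds."""
    return 3600 / seconds_per_page


def slowdown(old_pages_per_sec: float, new_seconds_per_page: float) -> float:
    """How many times slower the new rate is than the old one.

    old / new = old_pages_per_sec / (1 / new_seconds_per_page),
    which simplifies to a plain product (avoiding a 1/3 rounding step).
    """
    return old_pages_per_sec * new_seconds_per_page


if __name__ == "__main__":
    old = pages_per_hour_from_rate(OLD_PAGES_PER_SEC)
    new = pages_per_hour_from_latency(NEW_SECONDS_PER_PAGE)
    print(f"0.7.2: {old:,.0f} pages/hour")  # 32,400
    print(f"0.8:   {new:,.0f} pages/hour")  # 1,200
    print(f"slowdown: {slowdown(OLD_PAGES_PER_SEC, NEW_SECONDS_PER_PAGE):.0f}x")  # 27x
```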

Re: Is that true?

Posted by Michael Wechner <mi...@wyona.com>.
Sami Siren wrote:

> Your observations are correct: 0.8 has some serious problems, and we'll be
> putting 0.8.1 out pretty soon to fix them, including the performance
> problem you describe.


Just to clarify: the current 0.8.1-dev code is in

http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/

right?

If so, wouldn't it make sense to rename this to

http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8.X

?

Without wanting to be pushy: when do you think 0.8.1 is going to be released,
or rather, what do you think is still missing?

Thanks

Michi



-- 
Michael Wechner
Wyona      -   Open Source Content Management   -    Apache Lenya
http://www.wyona.com                      http://lenya.apache.org
michael.wechner@wyona.com                        michi@apache.org
+41 44 272 91 61


Re: Is that true?

Posted by Sami Siren <ss...@gmail.com>.
Your observations are correct: 0.8 has some serious problems, and we'll be
putting 0.8.1 out pretty soon to fix them, including the performance
problem you describe.

--
 Sami Siren
