Posted to user@nutch.apache.org by carmmello <ca...@globo.com> on 2006/09/18 19:54:54 UTC
Is that true?
I have been trying Nutch since version 0.3, sometimes with some problems. Now I am using the 0.7.2 release and I'm really happy with it, to the point where I have about 1,100,000 pages indexed in a site that deals with quality and environment.
But a new version means, at least in principle, a better product. So I tried Nutch 0.8 on the same single computer (Athlon 2400+, 1 GB RAM, about a 4 Mbit/s connection, 53 threads) with the same seed sites (but in a folder, as per the tutorial). I used a depth of 2, just to try the new version (instead of the 4 or 5 I usually use), but when I looked at the log I was really terrified: fetching was horribly slow! With Nutch 0.7.2 I got about 9 pages per second, while with Nutch 0.8 it sometimes took about 3 seconds to fetch a single page! Roughly speaking, fetching speed dropped by a factor of 20!
So, that is my question:
Is that true, or have I made some big mistake?
Thanks
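As a quick sanity check on the numbers above, here is a minimal back-of-the-envelope sketch. It assumes constant fetch rates (real crawls vary with politeness delays, DNS lookups, and server response times), and the constant and function names are hypothetical, chosen only for illustration. It converts the two reported rates to the same unit, compares them, and extrapolates how long re-fetching the existing 1,100,000-page index would take at each rate.

```python
# Hypothetical back-of-the-envelope check of the throughput figures in the post.
# Assumes constant fetch rates; real crawls vary with politeness delays, DNS, etc.

OLD_PAGES_PER_SEC = 9.0      # Nutch 0.7.2, as reported
NEW_SECONDS_PER_PAGE = 3.0   # Nutch 0.8, worst case reported ("sometimes")
INDEX_SIZE = 1_100_000       # pages in the existing 0.7.2 index


def slowdown_factor(old_pages_per_sec: float, new_seconds_per_page: float) -> float:
    """How many times slower the new fetch rate is than the old one."""
    new_pages_per_sec = 1.0 / new_seconds_per_page
    return old_pages_per_sec / new_pages_per_sec


def hours_to_fetch(pages: int, pages_per_sec: float) -> float:
    """Wall-clock hours to fetch `pages` at a constant rate."""
    return pages / pages_per_sec / 3600.0


if __name__ == "__main__":
    # 9 pages/s versus 1/3 page/s
    print(f"slowdown: {slowdown_factor(OLD_PAGES_PER_SEC, NEW_SECONDS_PER_PAGE):.0f}x")
    old_h = hours_to_fetch(INDEX_SIZE, OLD_PAGES_PER_SEC)
    new_h = hours_to_fetch(INDEX_SIZE, 1.0 / NEW_SECONDS_PER_PAGE)
    print(f"0.7.2: {old_h:.1f} h, 0.8 (worst case): {new_h:.1f} h (~{new_h / 24:.0f} days)")
```

Using the worst-case figure, the ratio comes out at about 27x, so "roughly 20" is, if anything, conservative. The extrapolation (roughly 34 hours versus roughly 38 days) only illustrates the scale of the difference and is not a measured result.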
Re: Is that true?
Posted by Michael Wechner <mi...@wyona.com>.
Sami Siren wrote:
> Your observations are correct: 0.8 has some serious problems, and we'll be
> putting 0.8.1 out pretty soon to fix them, including the performance problem
> you describe.
Just to clarify: is 0.8.1-dev actually
http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8/
?
If so, wouldn't it make sense to rename it to
http://svn.apache.org/repos/asf/lucene/nutch/branches/branch-0.8.X
?
Without wanting to be pushy: when do you think 0.8.1 is going to be released,
or rather, what do you think is still missing?
Thanks
Michi
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
michael.wechner@wyona.com michi@apache.org
+41 44 272 91 61
Re: Is that true?
Posted by Sami Siren <ss...@gmail.com>.
Your observations are correct: 0.8 has some serious problems, and we'll be
putting 0.8.1 out pretty soon to fix them, including the performance problem
you describe.
--
Sami Siren