Posted to dev@nutch.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2006/05/11 13:21:18 UTC

Re: [Nutch-dev] Re: svn commit: r405565 - in /lucene/nutch/trunk/src: java/org/apache/nutch/searcher/ test/org/apache/nutch/searcher/ web/jsp/

On May 11, 2006, at 3:36 AM, Jérôme Charron wrote:

> Actually, the clustering uses the summaries as input. I assume it would
> provide better results if it took the whole document content, no?
> I assume that clustering uses the summaries instead of the document
> content for performance reasons.
> But there is a (bad) side effect: since the size of the summaries is
> configurable, the clustering "quality" will vary depending on the
> configured summary size. I find this very confusing: when folks adjust
> this parameter, it is only for front-end considerations (they want to
> display a long or a short summary), and certainly not for clustering
> reasons.
>
> What do you and others think about this?

Bob Carpenter of alias-i had this to say when I brought up this very idea:

http://article.gmane.org/gmane.comp.jakarta.lucene.devel/12599

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: [Nutch-dev] Re: svn commit: r405565 - in /lucene/nutch/trunk/src: java/org/apache/nutch/searcher/ test/org/apache/nutch/searcher/ web/jsp/

Posted by Jérôme Charron <je...@gmail.com>.
> Bob Carpenter of alias-i had this to say when I brought up this very
> idea:
> http://article.gmane.org/gmane.comp.jakarta.lucene.devel/12599

Thanks for your response, Marvin.
But ultimately my question is: shouldn't the Nutch clustering use
fixed-size snippets instead of the configurable display size?
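
To make the idea concrete, here is a minimal sketch of a fixed-size snippet
helper. Everything in it is hypothetical: the class name ClusterSnippet, the
method forClustering, and the 200-character constant are illustrative
assumptions, not part of the Nutch code base. It only shows that the text
handed to the clusterer could be cut to its own fixed length, independent of
whatever summary length the front end is configured to display.

```java
// Hypothetical helper, not part of Nutch: builds the text given to the
// clusterer with a length that does not depend on the display summary size.
final class ClusterSnippet {

    // Assumed fixed length for clustering input; the value is arbitrary.
    static final int CLUSTER_SNIPPET_CHARS = 200;

    // Returns at most maxChars characters of text, with whitespace
    // collapsed, cut at a word boundary when one is available.
    static String forClustering(String text, int maxChars) {
        if (text == null) {
            return "";
        }
        String normalized = text.trim().replaceAll("\\s+", " ");
        if (normalized.length() <= maxChars) {
            return normalized;
        }
        // Look backwards from the limit for the last space.
        int cut = normalized.lastIndexOf(' ', maxChars);
        // No space found: fall back to a hard cut at the limit.
        return normalized.substring(0, cut > 0 ? cut : maxChars);
    }
}
```

The clustering side would call
ClusterSnippet.forClustering(text, ClusterSnippet.CLUSTER_SNIPPET_CHARS),
while the displayed summary keeps using the configurable size.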

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/