Posted to dev@mahout.apache.org by Dawid Weiss <da...@cs.put.poznan.pl> on 2008/03/27 08:58:08 UTC

Re: [?? Probable Spam] RE: kMeans

Carrot2 is for clustering web search results -- it's not exactly the same thing.

D.

shunkai.fu wrote:
> There is one project called Carrot2 focusing on this topic already.
> 
> -----Original Message-----
> From: Marko Novakovic [mailto:atisha34@yahoo.com] 
> Sent: March 27, 2008 7:03
> To: mahout-dev@lucene.apache.org
> Subject: kMeans
> 
> Is it a good idea to propose a project that integrates the kMeans
> algorithm for clustering web pages?

Re: kMeans

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.
Hi Marko,

> Is it an acceptable solution for Google Summer of Code?

I don't think it's an acceptable project for Mahout -- Mahout's goals lie in 
processing large data sets, supported by Map-Reduce. Clustering search results is 
usually in-memory, on-line clustering with few information sources (titles, 
snippets) and the resulting high noise.

That said, what I envisage could be done is to work on data structures that 
could _support_ sensible on-line faceting/clustering of search results, 
similar to what Google supposedly does behind the scenes to reorder search 
results (similar concept clustering). Building semantic relationships between 
terms or detecting frequently recurring phrases with significantly different 
meanings is definitely interesting and challenging (if not done naively), 
especially at large scale.

Dawid

Re: kMeans

Posted by Marko Novakovic <at...@yahoo.com>.
Is it an acceptable solution for Google Summer of Code?

--- Dawid Weiss <da...@cs.put.poznan.pl> wrote:

> 
> Carrot2 is for clustering web search results -- it's
> not exactly the same thing.
> 
> D.
> 
> shunkai.fu wrote:
> > There is one project called Carrot2 focusing on this topic already.
> > 
> > -----Original Message-----
> > From: Marko Novakovic [mailto:atisha34@yahoo.com] 
> > Sent: March 27, 2008 7:03
> > To: mahout-dev@lucene.apache.org
> > Subject: kMeans
> > 
> > Is it a good idea to propose a project that integrates the kMeans
> > algorithm for clustering web pages?
> 


