Posted to user@mahout.apache.org by sharath jagannath <sh...@gmail.com> on 2011/02/01 19:28:04 UTC

Incremental data stream clustering.

Hey All,

Another newbie to Mahout here.
I want to implement a system that clusters an incoming data stream.
I went through the Mahout clustering tutorials, but I am still not sure how to
handle the dynamic evolution of clusters in Mahout.
To be specific, I am trying to cluster the content from an RSS feed and am not
sure how I should use Mahout to achieve it. Are Mahout's clustering
algorithms incremental?

I was looking in Mahout for interfaces like Weka's incremental clusterers to
achieve this, and I am lost :D.
All help is much appreciated.


Thanks,
Sharath

Re: Incremental data stream clustering.

Posted by sharath jagannath <sh...@gmail.com>.
Thank you, Vineet.
I will try it out, but I am not sure how similar the articles will be in each
pass.
Anyway, I would love to see how it performs.

Thanks,
Sharath

On Tue, Feb 1, 2011 at 10:59 AM, vineet yadav
<vi...@gmail.com> wrote:

> Hi Sharath,
> In Mahout's k-means clustering, a sequence file of initial cluster centers is
> passed as an argument, so you can run the k-means algorithm
> incrementally.
> During each pass of k-means clustering, you can pass the clusters that were
> computed in the earlier pass as the initial cluster
> centers.
> But you need to make sure the documents/posts in each pass are related for
> better results.
> Thanks
> Vineet Yadav

Re: Incremental data stream clustering.

Posted by vineet yadav <vi...@gmail.com>.
Hi Sharath,
In Mahout's k-means clustering, a sequence file of initial cluster centers is
passed as an argument, so you can run the k-means algorithm incrementally.
During each pass of k-means clustering, you can pass the clusters that were
computed in the earlier pass as the initial cluster centers.
But you need to make sure the documents/posts in each pass are related for
better results.
Thanks
Vineet Yadav
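
[Illustration] The warm-start idea described above can be sketched in a few
lines. This is plain Python, not Mahout's API: the function names (nearest,
kmeans), the toy 2-D points, and the hand-picked starting centers are all
assumptions made for the example. It only shows the control flow, where the
centers produced by one pass become the initial centers of the next pass.

```python
def nearest(point, centers):
    """Return the index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans(points, initial_centers, max_iter=10):
    """Run Lloyd-style k-means iterations, starting from initial_centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: group each point under its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for old, members in zip(centers, groups):
            if members:
                new_centers.append(
                    [sum(m[d] for m in members) / len(members) for d in range(len(old))]
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:  # no center moved, so we have converged
            break
        centers = new_centers
    return centers


# Pass 1: cluster the first batch from hand-picked (hypothetical) starting centers.
batch1 = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers_pass1 = kmeans(batch1, [(0.0, 0.0), (10.0, 10.0)])

# Pass 2: a new batch arrives; seed it with the centers computed in pass 1.
batch2 = [(1.0, 0.0), (1.0, 1.0), (11.0, 10.0), (11.0, 11.0)]
centers_pass2 = kmeans(batch2, centers_pass1)

print(centers_pass1)
print(centers_pass2)
```

Note that in this sketch each pass sees only the new batch, so earlier data
influences the result only through the starting centers. If a new batch is
unrelated to the old one, the centers can drift away from the earlier
clusters, which is presumably why the batches should be related.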
