Posted to user@mahout.apache.org by Ahmet Ylmaz <ah...@yahoo.com> on 2013/04/08 23:16:37 UTC

In-memory kmeans clustering

Hi,

It seems that in-memory kmeans clustering was removed in Mahout 0.7.

Does this mean that it is no longer possible to do in-memory kmeans clustering with Mahout?
Is Hadoop-based kmeans clustering now the only option?


Thanks
Ahmet

Re: In-memory kmeans clustering

Posted by Ahmet Ylmaz <ah...@yahoo.com>.

Thanks, we will try the MapReduce version as you described.

________________________________
 From: Dan Filimon <da...@gmail.com>
To: user@mahout.apache.org 
Sent: Wednesday, April 10, 2013 1:19 PM
Subject: Re: In-memory kmeans clustering
 
Thanks! I actually didn't know you can do that. :)

Re: In-memory kmeans clustering

Posted by Dan Filimon <da...@gmail.com>.

Thanks! I actually didn't know you can do that. :)


On Tue, Apr 9, 2013 at 7:22 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:

> dataPoints can be in memory or from disk, and you can sample the dataPoints
> for initialClusters.

Re: In-memory kmeans clustering

Posted by Johannes Schulte <jo...@gmail.com>.

dataPoints can be in memory or from disk, and you can sample the dataPoints
for initialClusters.
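
For readers who want to see the idea without any Mahout or Hadoop dependency, here is a minimal plain-Java sketch: sample the starting centers from in-memory data points, then iterate. It is not Mahout code and is not meant to reproduce Mahout's behavior exactly. The class InMemoryKMeans and its methods sampleInitialCenters and cluster are hypothetical names made up for illustration, and the use of Euclidean distance is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Plain-Java k-means sketch; not Mahout code. All names here are hypothetical. */
final class InMemoryKMeans {

    /** Picks k data points (at distinct positions in the list) as starting centers. */
    static List<double[]> sampleInitialCenters(List<double[]> points, int k, long seed) {
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> centers = new ArrayList<>();
        for (double[] p : shuffled.subList(0, k)) {
            centers.add(p.clone()); // copy so the input points are never modified
        }
        return centers;
    }

    /** Iterates until no center moves more than delta, or maxIterations is reached. */
    static List<double[]> cluster(List<double[]> points, List<double[]> initial,
                                  double delta, int maxIterations) {
        List<double[]> centers = new ArrayList<>();
        for (double[] c : initial) {
            centers.add(c.clone());
        }
        int dim = points.get(0).length;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: accumulate each point into its nearest center.
            double[][] sums = new double[centers.size()][dim];
            int[] counts = new int[centers.size()];
            for (double[] p : points) {
                int nearest = nearest(centers, p);
                counts[nearest]++;
                for (int d = 0; d < dim; d++) {
                    sums[nearest][d] += p[d];
                }
            }
            // Update step: move each center to the mean of its assigned points.
            double maxShift = 0.0;
            for (int c = 0; c < centers.size(); c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = sums[c][d] / counts[c];
                }
                maxShift = Math.max(maxShift,
                        Math.sqrt(squaredDistance(centers.get(c), updated)));
                centers.set(c, updated);
            }
            if (maxShift <= delta) {
                break; // converged
            }
        }
        return centers;
    }

    /** Index of the center closest to p (squared Euclidean distance). */
    static int nearest(List<double[]> centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.size(); c++) {
            double dist = squaredDistance(centers.get(c), p);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum;
    }
}
```

Calling InMemoryKMeans.cluster(points, InMemoryKMeans.sampleInitialCenters(points, k, seed), 0.01, 10) uses the same 0.01 and 10 that appear in the Mahout snippet in this thread, though Mahout's own stopping rule may differ from this sketch.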


On Tue, Apr 9, 2013 at 6:16 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:

> Hi,
> this worked for me without having to fiddle with the MapReduce classes:
>
>         // Starting clusters; fill this list before iterating.
>         List<Cluster> initialClusters = new ArrayList<Cluster>();
>
>         // Vectors to cluster; fill this as well.
>         Iterable<Vector> dataPoints = Lists.newArrayList();
>
>         // k-means policy; 0.01 is the convergence delta.
>         ClusterClassifier prior =
>                 new ClusterClassifier(initialClusters,
>                         new KMeansClusteringPolicy(0.01));
>
>         // The third argument (10) is the iteration count.
>         ClusterClassifier clustered =
>                 new ClusterIterator().iterate(dataPoints, prior, 10);
>
>         List<Cluster> finalClusters = clustered.getModels();

Re: In-memory kmeans clustering

Posted by Johannes Schulte <jo...@gmail.com>.

Hi,
this worked for me without having to fiddle with the MapReduce classes:

        // Starting clusters; fill this list before iterating.
        List<Cluster> initialClusters = new ArrayList<Cluster>();

        // Vectors to cluster; fill this as well.
        Iterable<Vector> dataPoints = Lists.newArrayList();

        // k-means policy; 0.01 is the convergence delta.
        ClusterClassifier prior =
                new ClusterClassifier(initialClusters,
                        new KMeansClusteringPolicy(0.01));

        // The third argument (10) is the iteration count.
        ClusterClassifier clustered =
                new ClusterIterator().iterate(dataPoints, prior, 10);

        List<Cluster> finalClusters = clustered.getModels();


On Tue, Apr 9, 2013 at 4:29 PM, Dan Filimon <da...@gmail.com> wrote:

> Apologies for not getting back to you more quickly!
>
> You can use Mahout KMeansDriver and have it run locally (so not as a
> MapReduce, but locally).
> There's a static method KMeansDriver.run() whose last argument is
> runSequential. You need to set this to true.
>
> The thing is it will still read and write the vectors to disk. Is this
> okay?

Re: In-memory kmeans clustering

Posted by Dan Filimon <da...@gmail.com>.

Apologies for not getting back to you more quickly!

You can use Mahout's KMeansDriver and have it run locally (that is,
sequentially rather than as a MapReduce job).
There's a static method KMeansDriver.run() whose last argument is
runSequential. You need to set this to true.

Note that it will still read and write the vectors to disk. Is this okay?

On Tue, Apr 9, 2013 at 5:24 PM, Ted Dunning <te...@gmail.com> wrote:

> This seems surprising.
>
> I don't think we removed it.
>
> Does anybody know better than I?

Re: In-memory kmeans clustering

Posted by Ted Dunning <te...@gmail.com>.

This seems surprising.

I don't think we removed it.

Does anybody know better than I?

On Mon, Apr 8, 2013 at 2:16 PM, Ahmet Ylmaz <ah...@yahoo.com> wrote:

> Hi,
>
> It seems to be that in-memory kmeans clustering is removed from Mahout 0.7.
>
> Does this mean that it is no longer possible to do in-memory kmeans
> clustering with Mahout?
> Or, is Hadoop based kmeans clustering the only option?
>
>
> Thanks
> Ahmet
>