Posted to user@mahout.apache.org by FRANCISCO XAVIER SUMBA TORAL <xa...@ucuenca.ec> on 2016/05/22 03:46:24 UTC

Clustering options

Hi,

Since the clustering algorithms are deprecated in Mahout Samsara, how can I
use Mahout to run a clustering algorithm? Basically, I use Mahout to cluster
papers' keywords: I take a bunch of keywords and cluster them to find groups
of related keywords. Any suggestions on how to update my code to Mahout
Samsara?

Cheers

Re: Clustering options

Posted by FRANCISCO XAVIER SUMBA TORAL <xa...@ucuenca.ec>.
Actually, I had confused MLlib with Samsara; thanks, Pat. I was already reading the MLlib documentation.

Cheers.




Re: Clustering options

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Mahout Samsara is more about rolling your own algo, though it already implements several as examples. If you want to build your own clustering, you will find a lot of what you need in the R-like DSL.

But if you want something already built, you may want to look at Spark’s MLlib k-means.

People often ask: what is the difference between Mahout and MLlib? MLlib is a collection of algos; Mahout is an optimized tensor math engine with many extensions and several algos. You can’t do the matrix product A’B in MLlib because it’s not an algo, it’s a bit of math, and a very useful bit.
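
To make the A’B remark concrete, here is a minimal sketch in plain Python of what that product (A transposed, times B) computes. It is not the Mahout DSL and not MLlib. The function name at_b and the toy matrices are hypothetical, and reading the result as keyword co-occurrence counts is only one illustrative interpretation: rows stand for papers, columns of A and B for keywords.

```python
def at_b(a, b):
    """Return A'B (transpose of a, times b) for row-major lists of lists.

    a is n x p and b is n x q, so the result is p x q.
    """
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of rows")
    p = len(a[0]) if a else 0
    q = len(b[0]) if b else 0
    result = [[0] * q for _ in range(p)]
    # A'B is the sum, over rows, of the outer product of a_row and b_row.
    for a_row, b_row in zip(a, b):
        for i, a_val in enumerate(a_row):
            if a_val == 0:
                continue  # skip zeros; such matrices are usually sparse
            for j, b_val in enumerate(b_row):
                result[i][j] += a_val * b_val
    return result


# Hypothetical data: 3 papers (rows), 2 keywords in A, 2 keywords in B.
A = [[1, 0],
     [1, 1],
     [0, 1]]
B = [[1, 1],
     [0, 1],
     [1, 0]]

cooccurrence = at_b(A, B)
print(cooccurrence)  # [[1, 2], [1, 1]]
```

Entry (i, j) of the result counts the papers in which keyword i of A and keyword j of B both appear. A distributed engine would compute the same quantity without materializing the transpose; this sketch only shows the arithmetic.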





Re: Clustering options

Posted by FRANCISCO XAVIER SUMBA TORAL <xa...@ucuenca.ec>.
Hi Dmitriy,

Thanks for your clarification.

Cheers.




Re: Clustering options

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Xavier,
There are not yet exact equivalents in the public domain for the algorithms
that existed for MR (MapReduce) clustering. My understanding is that some of
them are on the roadmap, though.

Depending on the level of sophistication you require, some of them are very
easy to build, though.
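
As a rough illustration of "easy to build", here is a minimal sketch of basic k-means (Lloyd's algorithm) in plain Python. It is not Mahout code and not the old MapReduce implementation. The function names, the choice to seed centroids with the first k points, and the toy vectors are all assumptions made for the example; real keywords would first have to be turned into numeric vectors.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Basic Lloyd's k-means. Returns (centroids, assignments).

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic; real implementations usually seed more carefully.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Hypothetical 2-D vectors forming two well-separated groups.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
          [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]

centroids, assignments = kmeans(points, 2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The two steps inside the loop are the whole algorithm. In a distributed setting the assignment step runs per row and the update step is an aggregation, which is why this kind of clustering tends to be among the simpler ones to rebuild.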
