Posted to dev@mahout.apache.org by Ankit Sharma <an...@gmail.com> on 2014/09/10 19:14:58 UTC

Seeking advice: Contribute in Mahout ML algorithm implementation

Hello,

I have been a user of Mahout for quite some time now and got really excited
when I heard Mahout is moving to Spark. Today I played around with the Linear
Regression example and browsed some of the Spark Machine Learning (ML) code.
It was really interesting to see how intuitive the entire process is.

I have a background in data science model building and I would like to
contribute to the development process. So, I would like to get some advice
on what has already been completed on the ML side and where I can start.

I have a couple of ideas: I can start either with a classification
algorithm like SVM or by building (or enhancing) some simple building blocks.
You can throw in your suggestions and I'll see which ones fall into my domain
and try to work on them.

thanks & best regards,

Ankit Sharma
Data Science Professional
_______________________
Mobile: +91-9632383141
Email: ankitksharma11588@gmail.com
Skype: aksharma11588
LinkedIn <http://in.linkedin.com/in/aks11588/> | Digg Data
<http://www.diggdata.in/> | about.me <http://about.me/ankitksharma>

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
We are talking about binary operators, so these are combinations of types,
but for the most part, yes: this currently works poorly for combinations
that include at least one sparse operand type.
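
To illustrate what "combinations of types" means for an elementwise binary
operator, here is a minimal Python sketch. It is not mahout-math code: the
Dense and Sparse classes and the add function are hypothetical names, and
the only assumption is that each pair of operand types gets its own routine.

```python
class Dense:
    """Hypothetical dense vector: every element is stored."""
    def __init__(self, values):
        self.values = list(values)


class Sparse:
    """Hypothetical sparse vector: only non-zero entries are stored."""
    def __init__(self, size, entries):
        self.size = size
        self.entries = dict(entries)  # index -> non-zero value


def _dense_dense(a, b):
    # Walk both operands index by index.
    return Dense(x + y for x, y in zip(a.values, b.values))


def _sparse_sparse(a, b):
    # Merge the two sets of non-zeros; the result stays sparse.
    out = dict(a.entries)
    for i, v in b.entries.items():
        out[i] = out.get(i, 0.0) + v
    return Sparse(a.size, out)


def _sparse_dense(a, b):
    # Copy the dense operand, then touch only the sparse non-zeros.
    out = list(b.values)
    for i, v in a.entries.items():
        out[i] += v
    return Dense(out)


# One routine per (left type, right type) combination.
_ADD = {
    (Dense, Dense): _dense_dense,
    (Sparse, Sparse): _sparse_sparse,
    (Sparse, Dense): _sparse_dense,
    (Dense, Sparse): lambda a, b: _sparse_dense(b, a),  # addition commutes
}


def add(a, b):
    """Elementwise addition, dispatched on the pair of operand types."""
    return _ADD[(type(a), type(b))](a, b)
```

In this sketch the mixed sparse/dense case does work proportional to the
number of non-zeros plus one copy of the dense operand, rather than probing
the sparse operand at every index.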

On Wed, Sep 10, 2014 at 11:54 AM, peng <pc...@uowmail.edu.au> wrote:

> Is it mostly sparse or non-sparse? Both of them are single-node libraries,
> so it does not seem possible to use them directly.

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by peng <pc...@uowmail.edu.au>.
Is it mostly sparse or non-sparse? Both of them are single-node libraries,
so it does not seem possible to use them directly.

On 09/10/2014 01:29 PM, Dmitriy Lyubimov wrote:
> The biggest problem today (in my opinion) is mahout-math.
>
> (1) cost/type based optimization of matrix-matrix multiplication
> (2) cost/type based optimization of elementwise matrix-matrix operations
>
> There is already some work done there, especially in the realm of
> vector-vector operations, so matrix-matrix operations that work with
> matrices backed by a set of vectors should naturally benefit from that.
>
> Two other noble goals have been:
>
> (3) jBLAS-backed matrices, including as a part of (1) and (2)
> (4) JCuda-backed matrices, including as a part of (1) and (2)
>
> Otherwise, if you are interested in writing yet another quasi-algebraic
> solver methodology, it is a second priority but would be welcome, provided
> you supply references to a principled approach and its adaptation to a
> scaled operations strategy for review, and as long as, preferably, this
> method is not yet part of MLlib in Spark.
>
> -d


Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
No. This algorithm can be one of the algorithms to work with, but complexity
is also affected by the matrix structure, namely its ability to access a
random element and to produce non-zero elements on iteration. A more detailed
discussion is currently going on in PR 44: https://github.com/apache/mahout/pull/44
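
As a rough illustration of why the storage structure matters, the Python
sketch below computes the same sparse-dense dot product in two ways and
counts how many elements each one visits. It is not taken from mahout-math
or from PR 44: the dict-based sparse vector and both function names are
assumptions made for this example.

```python
def dot_by_random_access(sparse, dense):
    """Probe the sparse operand at every index of the dense operand.

    `sparse` is a hypothetical sparse vector stored as {index: value};
    `dense` is a plain list. Returns (dot product, elements visited).
    """
    total = 0.0
    visits = 0
    for i in range(len(dense)):
        total += sparse.get(i, 0.0) * dense[i]  # random access into sparse
        visits += 1
    return total, visits


def dot_by_nonzero_iteration(sparse, dense):
    """Iterate only over the stored non-zeros of the sparse operand."""
    total = 0.0
    visits = 0
    for i, value in sparse.items():
        total += value * dense[i]  # random access into dense is cheap
        visits += 1
    return total, visits
```

With two stored non-zeros in a vector of length 10, the first version visits
10 elements and the second visits 2, although both return the same value. A
structure that cannot iterate its non-zeros cheaply forces the first pattern.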

On Thu, Sep 11, 2014 at 12:24 AM, Ankit Sharma <an...@gmail.com>
wrote:

> Hi Dmitriy,
>
> Did you mean something like implementing the *"Strassen algorithm"* for
> matrix multiplication?
>
> thanks & best regards,
>
> Ankit

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by Ankit Sharma <an...@gmail.com>.
Sure, Udaykiran, why not! Let's finalize what we should implement through
this mail chain.

thanks & best regards,

Ankit Sharma

On Thu, Sep 11, 2014 at 7:31 PM, uday kiran <ma...@gmail.com>
wrote:

> Hi Ankit,
>
> I am also interested in contributing to open source projects.
>
> I have been waiting for an opportunity. I am a Java developer with 4 years
> of experience.
>
> If you wish to include me in any development-related issues and new
> requirements, I am ready to work.
>
> Regards,
> Udaykiran M

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by Ankit Sharma <an...@gmail.com>.
Hi Dmitriy,

Did you mean something like implementing the *"Strassen algorithm"* for
matrix multiplication?

thanks & best regards,

Ankit

On Wed, Sep 10, 2014 at 10:59 PM, Dmitriy Lyubimov <dl...@gmail.com>
wrote:

> The biggest problem today (in my opinion) is mahout-math.
>
> (1) cost/type based optimization of matrix-matrix multiplication
> (2) cost/type based optimization of elementwise matrix-matrix operations
>
> There is already some work done there, especially in the realm of
> vector-vector operations, so matrix-matrix operations that work with
> matrices backed by a set of vectors should naturally benefit from that.
>
> Two other noble goals have been:
>
> (3) jBLAS-backed matrices, including as a part of (1) and (2)
> (4) JCuda-backed matrices, including as a part of (1) and (2)
>
> Otherwise, if you are interested in writing yet another quasi-algebraic
> solver methodology, it is a second priority but would be welcome, provided
> you supply references to a principled approach and its adaptation to a
> scaled operations strategy for review, and as long as, preferably, this
> method is not yet part of MLlib in Spark.
>
> -d

Re: Seeking advice: Contribute in Mahout ML algorithm implementation

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
The biggest problem today (in my opinion) is mahout-math.

(1) cost/type based optimization of matrix-matrix multiplication
(2) cost/type based optimization of elementwise matrix-matrix operations

There is already some work done there, especially in the realm of
vector-vector operations, so matrix-matrix operations that work with
matrices backed by a set of vectors should naturally benefit from that.

Two other noble goals have been:

(3) jBLAS-backed matrices, including as a part of (1) and (2)
(4) JCuda-backed matrices, including as a part of (1) and (2)

Otherwise, if you are interested in writing yet another quasi-algebraic
solver methodology, it is a second priority but would be welcome, provided
you supply references to a principled approach and its adaptation to a
scaled operations strategy for review, and as long as, preferably, this
method is not yet part of MLlib in Spark.

-d
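
The idea behind a cost/type based choice in (1) can be sketched in a few
lines of Python. This is not mahout-math code: the Operand class, the three
strategy names, and the cost formulas are assumptions made for illustration,
using rough multiply-add counts that depend on operand type and on the
number of stored non-zeros.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """Hypothetical summary of a matrix, enough to estimate cost."""
    rows: int
    cols: int
    nnz: int       # number of stored non-zero entries
    sparse: bool   # True if only non-zeros are stored


def candidate_costs(a, b):
    """Rough multiply-add counts for computing a * b with each strategy."""
    # The plain triple loop is always applicable.
    costs = {"dense_triple_loop": a.rows * a.cols * b.cols}
    if a.sparse:
        # For each non-zero a[i, k], walk row k of b.
        per_row_b = b.nnz / b.rows if b.sparse else b.cols
        costs["iterate_a_nonzeros"] = a.nnz * per_row_b
    if b.sparse:
        # For each non-zero b[k, j], walk column k of a.
        per_col_a = a.nnz / a.cols if a.sparse else a.rows
        costs["iterate_b_nonzeros"] = b.nnz * per_col_a
    return costs


def choose_strategy(a, b):
    """Pick the applicable strategy with the lowest estimated cost."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions do not match")
    costs = candidate_costs(a, b)
    return min(costs, key=costs.get)
```

For a 1000 x 1000 sparse operand with 1000 non-zeros multiplied by a dense
1000 x 1000 operand, this estimate is about 10^6 multiply-adds when iterating
the sparse non-zeros, against 10^9 for the triple loop, so the sketch selects
the former. A real optimizer would also have to account for access costs of
the concrete matrix structures.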



On Wed, Sep 10, 2014 at 10:14 AM, Ankit Sharma <an...@gmail.com>
wrote:

> Hello,
>
> I have been a user of Mahout for quite some time now and got really excited
> when I heard Mahout is moving to Spark. Today I played around with the Linear
> Regression example and browsed some of the Spark Machine Learning (ML) code.
> It was really interesting to see how intuitive the entire process is.
>
> I have a background in data science model building and I would like to
> contribute to the development process. So, I would like to get some advice
> on what has already been completed on the ML side and where I can start.
>
> I have a couple of ideas: I can start either with a classification
> algorithm like SVM or by building (or enhancing) some simple building blocks.
> You can throw in your suggestions and I'll see which ones fall into my domain
> and try to work on them.
>
> thanks & best regards,
>
> Ankit Sharma