Posted to dev@mahout.apache.org by Grant Ingersoll <gs...@apache.org> on 2008/03/16 14:50:37 UTC

Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Just to clarify things a little bit:

The Lucene PMC is the appropriate PMC that makes decisions about
committers.  While there aren't any explicit rules on becoming a
committer, the criteria are generally:

1) Active participant in the community, both patch-wise and
discussion-wise, for some reasonable amount of time
2) Patches are high quality, unit tested and easy to apply/verify.  In
other words, the author works to minimize the demand on the limited
resources of the committer.  The author also stays on top of the patch
as issues come up.
3) The person is pleasant to work with and polite.

Now, since Mahout is young, the bar is somewhat lower, especially in  
terms of length of time being around.

So, I guess the takeaway is that if you view Hama as your project,
managed by you and your team, then I am not so sure it is the
right fit.  I am not sure if I am reading your intent here correctly,
so please clarify.  The way I understand the language of this Issue
and the subsequent comments is that you want Hama to be a fairly
standalone sub project, right?  Hence the list of committers, etc.
However, by committing this code, we are saying that we are willing
to maintain it as a community and that the community sets the
direction of where it goes.  Your committership doesn't necessarily
follow from this patch.  We generally don't elect committers on the
basis of one patch.  We do sometimes make contrib areas that
have separate committers for just that area, but I tend to favor
full committership.

One thing I am curious about is why Hama isn't proposed as a
subproject of Hadoop.  It seems like a better logical fit there,
since it has more uses than just ML and would thus receive a wider
audience and more opportunity to grow.  Mahout could then take
advantage of it from there.

Cheers,
Grant

On Mar 16, 2008, at 7:06 AM, Edward Yoon (JIRA) wrote:

>
>    [ https://issues.apache.org/jira/browse/MAHOUT-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579185 
> #action_12579185 ]
>
> Edward Yoon commented on MAHOUT-16:
> -----------------------------------
>
> I'm not sure whether Hama will become a contrib of Mahout. If the
> proposal goes through the Mahout PMC, I would ask for contrib
> committer privileges on the Mahout project so that I can manage our
> project, our members and our issues. I would appreciate any
> advice you could give me.
>
>
>> Hama contrib package for the mahout
>> -----------------------------------
>>
>>                Key: MAHOUT-16
>>                URL: https://issues.apache.org/jira/browse/MAHOUT-16
>>            Project: Mahout
>>         Issue Type: New Feature
>>        Environment: All environment
>>           Reporter: Edward Yoon
>>        Attachments: hama.tar.gz
>>
>>
>> *Introduction*
>> Hama will develop a high-performance, large-scale parallel
>> matrix computation package based on Hadoop Map/Reduce. It will be
>> useful for massively large-scale numerical analysis and data
>> mining tasks that need the intensive computation power of matrix
>> inversion, e.g. linear regression, PCA and SVM. It will
>> also be useful for many scientific applications, e.g. physics
>> computations, linear algebra, computational fluid dynamics,
>> statistics, graphics rendering and many more.
>> The Hama approach proposes using the 3-dimensional Row, Column
>> (Qualifier) and Time space and the multi-dimensional Columnfamilies
>> of Hbase (a BigTable clone), which can store large sparse matrices
>> and various types of matrices (e.g. triangular matrices, 3D
>> matrices, etc.). Their auto-partitioned sparsity sub-structure will
>> be efficiently managed and served by Hbase. Row and column
>> operations can be done in linear time, and several algorithms,
>> such as structured Gaussian elimination or iterative methods, run
>> in O(the number of non-zero elements in the matrix / number of
>> mappers) time on Hadoop Map/Reduce.
>> So, it has a strong relationship with the Mahout project, and it
>> would be great if Hama could become a contrib project of
>> Mahout.
>> *Current Status*
>> In its current state, Hama is buggy and needs filling out,
>> but a generalized matrix interface and basic linear algebra
>> operations have been implemented within a large prototype system.
>> In the future, we need new parallel algorithms based on Map/Reduce
>> so that heavy decompositions and factorizations perform well. It also
>> needs tools to compose an arbitrary matrix from only certain data
>> filtered from the Hbase array structure.
>> It would be great if we could collaborate with the Mahout members.
>> *Members*
>> The initial set of committers includes folks from the Hadoop and
>> Hbase communities, and we have master's (or Ph.D.) degrees in
>> mathematics and computer science.
>> - Edward Yoon (edward AT udanax DOT org)
>> - Chanwit Kaewkasi (chanwit AT gmail DOT com)
>> - Min Cha (minslovey AT gmail DOT com)
>> - Antonio Suh (bluesvm AT gmail DOT com)
>> At least Min Cha and I will be involved full-time with this work.
>
> -- 
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

I would still wait a bit until the code we have is actually put to use. Having 
some real-time applications and demos is the best way to convince people the 
project has a future.

D.

Grant Ingersoll wrote:
> What I would do is ask on Hadoop if there is interest in making it a 
> subproject.  In doing so, keep in mind that it still requires their 
> committers to be willing to take it on (although it should help greatly 
> that you are saying you have 3 or 4 people already who are willing to 
> maintain it.)  You might also benefit from looking at an incubation
> proposal.  Since some of the code already exists, you _may_ need to do a
> more full-fledged software grant and/or go through incubation.
> 
> -Grant
> 
> On Mar 17, 2008, at 3:53 AM, Min Cha wrote:
> 
>> Hello, Isabel Drost.
>>
>> I'm Min Cha, a colleague of Edward's. Glad to meet you.
>> I think Hama should aim to be general purpose rather than become a
>> piece of Mahout.
>>
>> So, I would rather Hama be a Hadoop sub project.
>> +1
>>
>> P.S.
>> However, if Mahout members want, we could provide Hama as a dependency
>> library and apply feedback or additional requirements from Mahout.
>> I think that is a good model for collaborating and improving
>> Hama and Mahout.
>>
>> 2008/3/17, Isabel Drost <ap...@isabel-drost.de>:
>>>
>>> On Monday 17 March 2008, edward yoon wrote:
>>>> However, basically, I'd like to make the Hama package a
>>>> general-purpose matrix package. So I prefer a place under Mahout as a
>>>> sub-sub-project.
>>>
>>> If you really prefer your own project over integrating your code with
>>> the Mahout codebase, I would also suggest going for a Hadoop sub
>>> project. Just as Grant already pointed out, the use cases of your
>>> matrix package are much more general than just machine learning.
>>>
>>>> After much discussion with the Hadoop PMC and mahout-dev, I looked
>>>> over the matter.
>>>
>>> What was your conclusion? Where do you and the other Hama developers
>>> see your package? Would you rather be general enough to be a Hadoop
>>> sub project, or would you rather put a special focus on the
>>> requirements of the machine learning community?
>>>
>>> Isabel
>>>
>>>
>>>
>>> -- 
>>> He who is known as an early riser need not get up until noon.
>>>  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
>>>  /,`.-'`'    -.  ;-;;,_
>>>  |,4-  ) )-,_..;\ (  `'-'
>>> '---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>
>>>
>>>
> 
> --------------------------
> Grant Ingersoll
> http://www.lucenebootcamp.com
> Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
> 
> 
> 
> 
>

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by Grant Ingersoll <gs...@apache.org>.

What I would do is ask on Hadoop if there is interest in making it a  
subproject.  In doing so, keep in mind that it still requires their  
committers to be willing to take it on (although it should help  
greatly that you are saying you have 3 or 4 people already who are  
willing to maintain it.)  You might also benefit from looking at an
incubation proposal.  Since some of the code already exists, you _may_
need to do a more full-fledged software grant and/or go through
incubation.

-Grant

On Mar 17, 2008, at 3:53 AM, Min Cha wrote:

> Hello, Isabel Drost.
>
> I'm Min Cha, a colleague of Edward's. Glad to meet you.
> I think Hama should aim to be general purpose rather than become a
> piece of Mahout.
>
> So, I would rather Hama be a Hadoop sub project.
> +1
>
> P.S.
> However, if Mahout members want, we could provide Hama as a dependency
> library and apply feedback or additional requirements from Mahout.
> I think that is a good model for collaborating and improving
> Hama and Mahout.
>
> 2008/3/17, Isabel Drost <ap...@isabel-drost.de>:
>>
>> On Monday 17 March 2008, edward yoon wrote:
>>> However, basically, I'd like to make the Hama package a
>>> general-purpose matrix package. So I prefer a place under Mahout as a
>>> sub-sub-project.
>>
>> If you really prefer your own project over integrating your code
>> with the Mahout codebase, I would also suggest going for a Hadoop sub
>> project. Just as Grant already pointed out, the use cases of your
>> matrix package are much more general than just machine learning.
>>
>>> After much discussion with the Hadoop PMC and mahout-dev, I looked
>>> over the matter.
>>
>> What was your conclusion? Where do you and the other Hama
>> developers see your package? Would you rather be general enough to
>> be a Hadoop sub project, or would you rather put a special focus on
>> the requirements of the machine learning community?
>>
>> Isabel
>>
>>
>>
>> --
>> He who is known as an early riser need not get up until noon.
>>  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
>>  /,`.-'`'    -.  ;-;;,_
>>  |,4-  ) )-,_..;\ (  `'-'
>> '---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>
>>
>>

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by Min Cha <mi...@gmail.com>.

Hello, Isabel Drost.

I'm Min Cha, a colleague of Edward's. Glad to meet you.
I think Hama should aim to be general purpose rather than become a piece
of Mahout.

So, I would rather Hama be a Hadoop sub project.
+1

P.S.
However, if Mahout members want, we could provide Hama as a dependency library
and apply feedback or additional requirements from Mahout.
I think that is a good model for collaborating and improving Hama and
Mahout.

2008/3/17, Isabel Drost <ap...@isabel-drost.de>:
>
> On Monday 17 March 2008, edward yoon wrote:
> > However, basically, I'd like to make the Hama package a
> > general-purpose matrix package. So I prefer a place under Mahout as a
> > sub-sub-project.
>
> If you really prefer your own project over integrating your code with the
> Mahout codebase, I would also suggest going for a Hadoop sub project. Just
> as Grant already pointed out, the use cases of your matrix package are much
> more general than just machine learning.
>
> > After much discussion with the Hadoop PMC and mahout-dev, I looked over
> > the matter.
>
> What was your conclusion? Where do you and the other Hama developers see
> your package? Would you rather be general enough to be a Hadoop sub
> project, or would you rather put a special focus on the requirements
> of the machine learning community?
>
> Isabel
>
>
>
> --
> He who is known as an early riser need not get up until noon.
>   |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
>   /,`.-'`'    -.  ;-;;,_
>   |,4-  ) )-,_..;\ (  `'-'
> '---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>
>
>

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by edward yoon <ed...@udanax.org>.

Well, I don't think the matrix needs of machine learning methods for data
mining will be any different from those of a general matrix package.

But I'm +1 for the Hadoop sub-project.
What do other members think?

Thanks,
Edward.

On 3/17/08, Isabel Drost <ap...@isabel-drost.de> wrote:
> On Monday 17 March 2008, edward yoon wrote:
> > However, basically, I'd like to make the Hama package a
> > general-purpose matrix package. So I prefer a place under Mahout as a
> > sub-sub-project.
>
> If you really prefer your own project over integrating your code with the
> Mahout codebase, I would also suggest going for a Hadoop sub project. Just as
> Grant already pointed out, the use cases of your matrix package are much more
> general than just machine learning.
>
> > After much discussion with the Hadoop PMC and mahout-dev, I looked over
> > the matter.
>
> What was your conclusion? Where do you and the other Hama developers see your
> package? Would you rather be general enough to be a Hadoop sub
> project, or would you rather put a special focus on the requirements of
> the machine learning community?
>
> Isabel
>
>
> --
> He who is known as an early riser need not get up until noon.
>  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
>  /,`.-'`'    -.  ;-;;,_
>  |,4-  ) )-,_..;\ (  `'-'
> '---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>
>
>


-- 
B. Regards,
Edward yoon @ NHN, corp.

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by Isabel Drost <ap...@isabel-drost.de>.

On Monday 17 March 2008, edward yoon wrote:
> However, basically, I'd like to make the Hama package a
> general-purpose matrix package. So I prefer a place under Mahout as a
> sub-sub-project.

If you really prefer your own project over integrating your code with the
Mahout codebase, I would also suggest going for a Hadoop sub project. Just as
Grant already pointed out, the use cases of your matrix package are much more
general than just machine learning.

> After much discussion with the Hadoop PMC and mahout-dev, I looked over the
> matter.

What was your conclusion? Where do you and the other Hama developers see your
package? Would you rather be general enough to be a Hadoop sub
project, or would you rather put a special focus on the requirements of
the machine learning community?

Isabel

-- 
He who is known as an early riser need not get up until noon.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Hama contribution, [was Re: [jira] Commented: (MAHOUT-16) Hama contrib package for the mahout]

Posted by edward yoon <ed...@udanax.org>.

Thanks for your advice.

I understand what you are saying. We can then try to manage the
contrib package through JIRA patches, with community discussion, for
collaboration. However, basically, I'd like to make the Hama package
a general-purpose matrix package. So I prefer a place under
Mahout as a sub-sub-project.

> One thing I am curious about is why Hama isn't proposed as a
> subproject of Hadoop.  It seems like a better logical fit there,
> since it has more uses than just ML and would thus receive a wider
> audience and more opportunity to grow.  Mahout could then take
> advantage of it from there.

After much discussion with the Hadoop PMC and mahout-dev, I looked over the matter.
I would appreciate any advice you could give me.

Thanks,
Edward.



-- 
B. Regards,
Edward yoon @ NHN, corp.