Posted to dev@mahout.apache.org by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/12/08 20:38:40 UTC

[jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165469#comment-13165469 ] 

Dmitriy Lyubimov edited comment on MAHOUT-880 at 12/8/11 7:36 PM:
------------------------------------------------------------------

Ideally, to optimize this, I guess DRM should have a notion that the dimensions (or whatever other parameters the solver uses) may not be known initially. When that is the case, the first operation in the pipeline (whatever it happens to be) can also employ standard strategies to work them out by the time it finishes.

Similarly, there's a "post-step" strategy: using the output and some additional parameters, you can reassemble the required knowledge (such as a mean, or the small result of a multiplication) in a post-step, by recombining the results of all reducers or the separate factors of the computation (if it comes down to a small product in the end).

This is a fundamental technique in SSVD (and it seems to be becoming even more prominent with the PCA efficiency tricks).
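
A minimal sketch of the post-step idea in the comment above, written in plain Java with no Mahout or Hadoop calls. Every name here (PostStepSketch, Partial, summarize, merge, columnMeans) is hypothetical and is not part of DistributedRowMatrix or SSVD. The assumption is that each reducer can emit a small summary of the rows it saw, so the row count, column count and column means can be recovered afterwards without a separate pass over the matrix.

```java
import java.util.Arrays;

public class PostStepSketch {

    /** Hypothetical side summary that one reducer emits next to its output rows. */
    public static final class Partial {
        public final long rows;        // number of rows this reducer processed
        public final double[] colSums; // per-column sums; length = widest row seen

        public Partial(long rows, double[] colSums) {
            this.rows = rows;
            this.colSums = colSums;
        }
    }

    /** Stand-in for the work a reducer does over its slice of the matrix. */
    public static Partial summarize(double[][] slice) {
        int width = 0;
        for (double[] row : slice) {
            width = Math.max(width, row.length);
        }
        double[] sums = new double[width];
        for (double[] row : slice) {
            for (int j = 0; j < row.length; j++) {
                sums[j] += row[j];
            }
        }
        return new Partial(slice.length, sums);
    }

    /** Post-step: recombine two reducer summaries into one. */
    public static Partial merge(Partial a, Partial b) {
        int width = Math.max(a.colSums.length, b.colSums.length);
        double[] sums = Arrays.copyOf(a.colSums, width); // pads with zeros if a is narrower
        for (int j = 0; j < b.colSums.length; j++) {
            sums[j] += b.colSums[j];
        }
        return new Partial(a.rows + b.rows, sums);
    }

    /** Column means derived from the merged summary (NaN if no rows were seen). */
    public static double[] columnMeans(Partial total) {
        double[] means = new double[total.colSums.length];
        for (int j = 0; j < means.length; j++) {
            means[j] = total.colSums[j] / total.rows;
        }
        return means;
    }

    public static void main(String[] args) {
        // Two "reducers", each seeing a different slice of the same matrix.
        Partial p1 = summarize(new double[][] {{1, 2, 3}, {3, 4, 5}});
        Partial p2 = summarize(new double[][] {{5, 6, 7}});
        Partial total = merge(p1, p2);

        // The dimensions were never supplied up front; they fall out of the post-step.
        System.out.println("rows=" + total.rows + " cols=" + total.colSums.length);
        System.out.println("column means=" + Arrays.toString(columnMeans(total)));
    }
}
```

Because merge is associative and commutative, the summaries can be combined in whatever order the reducers finish, which is what makes this kind of recombination cheap compared with a dedicated job.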
                
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch, MAHOUT-880.patch, MAHOUT-880.patch
>
>
> I'm new to Mahout and couldn't find some basic matrix functions. Without them, users cannot do many tasks through the CLI or API: if a user gets a result from an existing map-reduce matrix operation (like SVD), they cannot take further steps with it. Here is my list:
> 1) Addition, subtraction
> 2) Norms (such as norm-1, norm-2, Frobenius norm)
> 3) Matrix comparison
> 4) Getting the lower triangle, upper triangle and diagonal
> 5) Getting identity and zero matrices
> 6) Putting two or more matrices together: A = [A1, A2]
> 7) More linear equation solver methods, like Gaussian elimination (maybe hard to implement)
> 8) Import and export of CSV, ARFF ... (this would be very useful when a user wants to exchange results with other applications like MATLAB)
> I'd like to know whether there is any plan to do this; if so, I can put some effort into implementing these.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Thanks Dmitriy. That makes sense. Let me update the code and get back to you.


Re: [jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Dmitriy,

I've pulled this out as a separate issue under MAHOUT-923. Could you please take a look?

Thanks!
