Posted to dev@mahout.apache.org by "Wangda Tan (Created) (JIRA)" <ji...@apache.org> on 2011/11/11 09:40:51 UTC

[jira] [Created] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
----------------------------------------------------------------------------------------

                 Key: MAHOUT-880
                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
             Project: Mahout
          Issue Type: New Feature
          Components: Math
    Affects Versions: 0.6
            Reporter: Wangda Tan
            Priority: Minor


I'm new to Mahout and couldn't find some basic matrix functions. Without them, users cannot do many tasks through the CLI or API: if a user gets a result from an existing map-reduce matrix operation (like SVD), they cannot take any further steps. Here is a list:
1) Addition, subtraction
2) Norms (like norm-1, norm-2, Frobenius norm)
3) Matrix comparison
4) Get lower triangle, upper triangle and diagonal
5) Get identity and zero matrices
6) Put two or more matrices together: A = [A1, A2]
7) More solvers for linear equations, like Gaussian elimination (maybe hard to implement)
8) Import and export of CSV, ARFF ... (this will be very useful when users want to exchange results with other applications like MATLAB)
I'd like to know whether there is any plan to do this; if so, I can put some effort into implementing these.
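
As a rough illustration of item 2, the Frobenius norm and the matrix 1-norm can both be accumulated one row at a time, which is what makes them a natural fit for a row-partitioned matrix. The sketch below is plain in-memory Java, not Mahout code: the class RowWiseNorms and its methods are hypothetical names, and each row is modelled as a sparse map from column index to value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: norms accumulated row by row.
final class RowWiseNorms {

  // Frobenius norm: square root of the sum of squares of all stored entries.
  static double frobenius(List<Map<Integer, Double>> rows) {
    double sumOfSquares = 0.0;
    for (Map<Integer, Double> row : rows) {
      for (double v : row.values()) {
        sumOfSquares += v * v;
      }
    }
    return Math.sqrt(sumOfSquares);
  }

  // Matrix 1-norm: the largest absolute column sum.
  static double norm1(List<Map<Integer, Double>> rows) {
    Map<Integer, Double> columnSums = new HashMap<>();
    for (Map<Integer, Double> row : rows) {
      for (Map.Entry<Integer, Double> e : row.entrySet()) {
        columnSums.merge(e.getKey(), Math.abs(e.getValue()), Double::sum);
      }
    }
    double max = 0.0;
    for (double s : columnSums.values()) {
      max = Math.max(max, s);
    }
    return max;
  }
}
```

The spectral norm (norm-2) is deliberately left out: it needs the largest singular value, so it cannot be reduced to a single pass of per-row sums.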

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Jake,

If you have a chance could you take a look at the new version of the diff
at:

reviews.apache.org/r/2955/

Thanks!

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Dan Brickley <da...@danbri.org>.
On 2 December 2011 19:31, Raphael Cendrillon <ce...@gmail.com> wrote:
> Is this something people would find useful?
>
> How would you like to sparsify the matrix? Using a threshold, or something else like target number of elements per row?

I can't yet swear hand-on-heart that I need this (I was thinking
threshold btw), but here's the path that led me to think it might be
useful:

I first made some nice practical use of RowSimilarityJob with a sparse
matrix of book rows * subject code columns. Later I tried a similar
dataset, but first tried pre-processing it with dimension reduction
(Lanczos in this case). However the reduced form of my data as it came
out of Lanczos was a full matrix. From a quick poke into the data it
looked like it still had a lot of zeros in it, but I didn't yet do the
work to confirm that it could usefully be turned back into sparse
form. Or even count the zeros or near-zeros.

If the scenario makes sense to others, in terms of plugging together
pieces of Mahout, it might be worthwhile. But I don't want to request
it without more experience / experimentation. Does it sound plausible
/ useful?
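
A threshold-based conversion along these lines might look like the sketch below. It is plain Java on a single dense row, not a Mahout API: the class ThresholdSparsifier, its method names and the choice of threshold are all assumptions made for illustration. It also covers the "count the zeros or near-zeros" check mentioned above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Mahout API: threshold-based sparsification of one dense row.
final class ThresholdSparsifier {

  // Counts entries whose magnitude is at most the threshold.
  static int countNearZeros(double[] row, double threshold) {
    int count = 0;
    for (double v : row) {
      if (Math.abs(v) <= threshold) {
        count++;
      }
    }
    return count;
  }

  // Keeps only entries whose magnitude exceeds the threshold, keyed by column index.
  static Map<Integer, Double> sparsify(double[] row, double threshold) {
    Map<Integer, Double> sparse = new TreeMap<>();
    for (int col = 0; col < row.length; col++) {
      if (Math.abs(row[col]) > threshold) {
        sparse.put(col, row[col]);
      }
    }
    return sparse;
  }
}
```

Counting first gives a cheap way to decide whether the sparse form is worth it before rewriting any data.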

Dan

> On Dec 2, 2011, at 10:04 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> No.
>>
>> On Fri, Dec 2, 2011 at 4:03 AM, Dan Brickley (Commented) (JIRA) <
>> jira@apache.org> wrote:

>>> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562
>>>
>>> Dan Brickley commented on MAHOUT-880:
>>> -------------------------------------
>>>
>>> Does Mahout yet have a method to take a large full matrix, and convert it
>>> to sparse matrix format (losing zero values or perhaps, if it makes sense,
>>> near-zero values also...)?

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Is this something people would find useful?

How would you like to sparsify the matrix? Using a threshold, or something else like target number of elements per row?
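
The second option, a target number of elements per row, could be sketched as below. This is plain Java for a single dense row under stated assumptions: TopKPerRow and keepLargest are hypothetical names, "largest" is taken to mean largest in magnitude, and exact zeros are never kept even if k is larger than the number of non-zero entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Mahout API: keep the k largest-magnitude entries of a row.
final class TopKPerRow {

  static Map<Integer, Double> keepLargest(double[] row, int k) {
    List<Integer> columns = new ArrayList<>();
    for (int col = 0; col < row.length; col++) {
      columns.add(col);
    }
    // Sort column indices by descending magnitude; the sort is stable,
    // so ties keep their original column order.
    columns.sort(Comparator.comparingDouble((Integer col) -> -Math.abs(row[col])));

    Map<Integer, Double> sparse = new TreeMap<>();
    for (int i = 0; i < Math.min(k, columns.size()); i++) {
      int col = columns.get(i);
      if (row[col] != 0.0) {
        sparse.put(col, row[col]);
      }
    }
    return sparse;
  }
}
```

Compared with a fixed threshold, this bounds the storage per row regardless of how the values are scaled.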

On Dec 2, 2011, at 10:04 AM, Ted Dunning <te...@gmail.com> wrote:

> No.
> 
> On Fri, Dec 2, 2011 at 4:03 AM, Dan Brickley (Commented) (JIRA) <
> jira@apache.org> wrote:
> 
>> 
>>   [
>> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562]
>> 
>> Dan Brickley commented on MAHOUT-880:
>> -------------------------------------
>> 
>> Does Mahout yet have a method to take a large full matrix, and convert it
>> to sparse matrix format (losing zero values or perhaps, if it makes sense,
>> near-zero values also...)?

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Ted Dunning <te...@gmail.com>.
No.

On Fri, Dec 2, 2011 at 4:03 AM, Dan Brickley (Commented) (JIRA) <
jira@apache.org> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562]
>
> Dan Brickley commented on MAHOUT-880:
> -------------------------------------
>
> Does Mahout yet have a method to take a large full matrix, and convert it
> to sparse matrix format (losing zero values or perhaps, if it makes sense,
> near-zero values also...)?

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160739#comment-13160739 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/
-----------------------------------------------------------

(Updated 2011-12-01 08:39:37.868935)


Review request for Ted Dunning, Jake Mannix and Sebastian Schelter.


Changes
-------

A fair bit of refactoring. Added plus() and minus() methods for Matrix-Matrix and Matrix-Vector combinations. Renamed MatrixCovarianceJob() to TimesSelfJob() to improve clarity per Sebastian's suggestion. Moved vector argument to distributed cache and changed class to Vector per Jake's suggestion. Removed MatrixRowAverageJob.java for now.
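
To make the plus() and minus() combinations concrete, here is a minimal in-memory sketch in plain Java. It is not the code under review: ElementwiseOps and its methods are hypothetical names, rows are paired by index for the Matrix-Matrix case, and for the Matrix-Vector case it is assumed that the same vector is combined with every row.

```java
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper, not the reviewed patch: element-wise row operations.
final class ElementwiseOps {

  // Combines two equally sized rows entry by entry.
  static double[] combine(double[] left, double[] right, DoubleBinaryOperator op) {
    if (left.length != right.length) {
      throw new IllegalArgumentException("row lengths differ");
    }
    double[] out = new double[left.length];
    for (int i = 0; i < left.length; i++) {
      out[i] = op.applyAsDouble(left[i], right[i]);
    }
    return out;
  }

  // Matrix-Matrix: row r of a is combined with row r of b.
  static double[][] matrixMatrix(double[][] a, double[][] b, DoubleBinaryOperator op) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("row counts differ");
    }
    double[][] out = new double[a.length][];
    for (int r = 0; r < a.length; r++) {
      out[r] = combine(a[r], b[r], op);
    }
    return out;
  }

  // Matrix-Vector: the same vector is combined with every row (an assumption).
  static double[][] matrixVector(double[][] a, double[] v, DoubleBinaryOperator op) {
    double[][] out = new double[a.length][];
    for (int r = 0; r < a.length; r++) {
      out[r] = combine(a[r], v, op);
    }
    return out;
  }
}
```

Passing Double::sum gives addition and (x, y) -> x - y gives subtraction, so one routine serves both operations.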


Summary
-------

Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix
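
Assuming that "times self" means the product of the transpose of A with A itself, the result can be built as a sum of outer products of the rows, so each row contributes independently. The sketch below shows that idea in plain in-memory Java; TimesSelfSketch is a hypothetical name and this is not the TimesSelfJob implementation from the diff.

```java
// Hypothetical sketch, not the TimesSelfJob from the diff:
// A^T A accumulated as a sum of outer products of the rows of A.
final class TimesSelfSketch {

  static double[][] timesSelf(double[][] a, int numCols) {
    double[][] gram = new double[numCols][numCols];
    for (double[] row : a) {
      for (int i = 0; i < numCols; i++) {
        if (row[i] == 0.0) {
          continue; // a zero entry contributes nothing to gram row i
        }
        for (int j = 0; j < numCols; j++) {
          gram[i][j] += row[i] * row[j];
        }
      }
    }
    return gram;
  }
}
```

If the column mean is subtracted from every row first and the result is scaled by the row count, the same product gives a covariance matrix.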


This addresses bug MAHOUT-880.
    https://issues.apache.org/jira/browse/MAHOUT-880


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixMatrixElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/TimesSelfJob.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 

Diff: https://reviews.apache.org/r/2955/diff


Testing
-------

JUnit tests for each job


Thanks,

Raphael


                
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch
>

Re: [jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Thanks Dmitriy. That makes sense. Let me update the code and get back to you.

On Dec 8, 2011, at 11:38 AM, "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org> wrote:

> 
>    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165469#comment-13165469 ] 
> 
> Dmitriy Lyubimov edited comment on MAHOUT-880 at 12/8/11 7:36 PM:
> ------------------------------------------------------------------
> 
> Ideally, to optimize this, I guess DRM should have a notion that dimensions (or whatever other parameters inside the solver) may not be initially known. When this happens, the first operation in the pipeline (whatever it happens to be) may also employ standard strategies to come up with those in the end.
> 
> Similarly, there's a "post-step" strategy concept: using the output and some additional parameters, you can re-assemble the required knowledge (such as the mean, or a small result of a multiplication) in a post step by re-combining the results of all reducers or separate factors of the computation (if it happens to be a small product in the end).
> 
> This is a fundamental technique in SSVD (and seems to become even more prominent with PCA efficiency tricks).

Re: [jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Dmitriy,

I've pulled this out as a separate issue under MAHOUT-923. Could you please take a look?

Thanks!


[jira] [Issue Comment Edited] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Dmitriy Lyubimov (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165469#comment-13165469 ] 

Dmitriy Lyubimov edited comment on MAHOUT-880 at 12/8/11 7:36 PM:
------------------------------------------------------------------

Ideally, to optimize this, I guess DRM should have a notion that dimensions (or whatever other parameters inside the solver) may not be initially known. When this happens, the first operation in the pipeline (whatever it happens to be) may also employ standard strategies to come up with those in the end.

Similarly, there's a "post-step" strategy concept: using the output and some additional parameters, you can re-assemble the required knowledge (such as the mean, or a small result of a multiplication) in a post step by re-combining the results of all reducers or separate factors of the computation (if it happens to be a small product in the end).

This is a fundamental technique in SSVD (and seems to become even more prominent with PCA efficiency tricks).
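
One way to picture the "post-step" idea for the mean: each task emits only a small partial result (column sums and a row count), and a final step recombines them, discovering the total row count and the width along the way. The sketch below is plain Java under those assumptions; PostStepMean, Partial, summarize and columnMean are hypothetical names and do not come from SSVD or any Mahout class.

```java
import java.util.List;

// Hypothetical sketch of a post-step recombination, not Mahout or SSVD code.
final class PostStepMean {

  // Small partial result emitted by one task.
  static final class Partial {
    final double[] columnSums;
    final long rowCount;

    Partial(double[] columnSums, long rowCount) {
      this.columnSums = columnSums;
      this.rowCount = rowCount;
    }
  }

  // Work done by one task over its share of the rows; the width is discovered, not given.
  static Partial summarize(double[][] rows) {
    int width = 0;
    for (double[] row : rows) {
      width = Math.max(width, row.length);
    }
    double[] sums = new double[width];
    for (double[] row : rows) {
      for (int c = 0; c < row.length; c++) {
        sums[c] += row[c];
      }
    }
    return new Partial(sums, rows.length);
  }

  // Post step: recombine the partial results into the column mean.
  static double[] columnMean(List<Partial> partials) {
    int width = 0;
    long totalRows = 0;
    for (Partial p : partials) {
      width = Math.max(width, p.columnSums.length);
      totalRows += p.rowCount;
    }
    if (totalRows == 0) {
      throw new IllegalArgumentException("no rows");
    }
    double[] mean = new double[width];
    for (Partial p : partials) {
      for (int c = 0; c < p.columnSums.length; c++) {
        mean[c] += p.columnSums[c];
      }
    }
    for (int c = 0; c < width; c++) {
      mean[c] /= totalRows;
    }
    return mean;
  }
}
```

The partial results stay small no matter how many rows each task sees, which is what makes the final recombination cheap.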
                
      was (Author: dlyubimov):
    Ideally to optimize this i guess DRM better have a notion that dimensions (or whatever other parameters inside solver) may not be initially known. When this happens, first operation in pipeline (whatever it happens to be) may also employ standard strategies to come up with those in the end. 

Similarly, there's a "post-step" strategy concept: using output and some additional parameters you can re-assemble required knowledge (such as mean or small result of multiplication) in post step by re-combining result of all reducers or separate factors of computation (if it happens to be a small product in the end). 

this is a fundamental technique and SSVD (and seems to become even more prominent with PCA efficiency tricks).
                  
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch, MAHOUT-880.patch, MAHOUT-880.patch
>

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Wangda Tan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159110#comment-13159110 ] 

Wangda Tan commented on MAHOUT-880:
-----------------------------------

Great work!
I'm working on the norm job and will try to finish it ASAP.
                
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch
>

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161394#comment-13161394 ] 

Lance Norskog commented on MAHOUT-880:
--------------------------------------

Another problem I've seen in some places is to just pick one of the values when there is an overlap. Options would be to pick the left one, or randomly choose one.
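
The overlap options mentioned here can be expressed as a pluggable policy when two sparse rows are merged. The sketch below is plain Java with hypothetical names (OverlapMerge, pickLeft, pickRandom); it assumes "overlap" means that both inputs hold a value for the same column index.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

// Hypothetical helper, not a Mahout API: merge two sparse rows with an overlap policy.
final class OverlapMerge {

  // Entries present on one side only are copied; overlapping entries go through the policy.
  static Map<Integer, Double> merge(Map<Integer, Double> left,
                                    Map<Integer, Double> right,
                                    DoubleBinaryOperator onOverlap) {
    Map<Integer, Double> out = new TreeMap<>(left);
    for (Map.Entry<Integer, Double> e : right.entrySet()) {
      out.merge(e.getKey(), e.getValue(), (l, r) -> onOverlap.applyAsDouble(l, r));
    }
    return out;
  }

  // Policy: always keep the left value.
  static DoubleBinaryOperator pickLeft() {
    return (l, r) -> l;
  }

  // Policy: keep one of the two values at random.
  static DoubleBinaryOperator pickRandom(Random rng) {
    return (l, r) -> rng.nextBoolean() ? l : r;
  }
}
```

Passing Double::sum as the policy turns the same routine into ordinary sparse addition.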
                
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch
>

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162254#comment-13162254 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

I'm thinking of building this out a bit more, however first I'd be interested to hear people's thoughts on this, what methods you would find useful for DistributedRowMatrix, and your own use cases.

Personally I've found that the DistributedRowMatrix and MatrixMultiplicationJob classes provide a great foundation for writing MapReduce jobs involving matrices. I think adding a few basic matrix operations, as suggested by Wangda, could be very helpful so that it's not necessary to reinvent the wheel / write MapReduce jobs from scratch when doing common linear operations. I also find that being able to do things like matrixA.times(matrixB) makes it easy to quickly build a process by chaining together MR jobs in a very readable form.
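
The readability point about chaining can be shown with a tiny in-memory stand-in. ChainableMatrix below is a hypothetical class, not DistributedRowMatrix: every operation returns a new matrix, so calls compose left to right. In the distributed setting each call would presumably launch its own MapReduce job, which this sketch does not attempt to model.

```java
// Hypothetical in-memory stand-in, not Mahout's DistributedRowMatrix.
final class ChainableMatrix {
  private final double[][] values;

  ChainableMatrix(double[][] values) {
    this.values = values;
  }

  int numRows() {
    return values.length;
  }

  int numCols() {
    return values.length == 0 ? 0 : values[0].length;
  }

  double get(int row, int col) {
    return values[row][col];
  }

  // Matrix product; returns a new matrix so calls can be chained.
  ChainableMatrix times(ChainableMatrix other) {
    if (numCols() != other.numRows()) {
      throw new IllegalArgumentException("inner dimensions differ");
    }
    double[][] out = new double[numRows()][other.numCols()];
    for (int i = 0; i < numRows(); i++) {
      for (int k = 0; k < numCols(); k++) {
        for (int j = 0; j < other.numCols(); j++) {
          out[i][j] += values[i][k] * other.values[k][j];
        }
      }
    }
    return new ChainableMatrix(out);
  }

  // Element-wise sum; also returns a new matrix.
  ChainableMatrix plus(ChainableMatrix other) {
    if (numRows() != other.numRows() || numCols() != other.numCols()) {
      throw new IllegalArgumentException("dimensions differ");
    }
    double[][] out = new double[numRows()][numCols()];
    for (int i = 0; i < numRows(); i++) {
      for (int j = 0; j < numCols(); j++) {
        out[i][j] = values[i][j] + other.values[i][j];
      }
    }
    return new ChainableMatrix(out);
  }
}
```

With this shape, an expression such as a.times(b).plus(c) reads the same way as the formula it implements.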

I'd be very interested to hear other people's thoughts on this.
                
> Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-880
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-880
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Math
>    Affects Versions: 0.6
>            Reporter: Wangda Tan
>            Priority: Minor
>              Labels: DistributedRowMatrix
>         Attachments: MAHOUT-880.patch, MAHOUT-880.patch
>

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159090#comment-13159090 ] 

Jake Mannix commented on MAHOUT-880:
------------------------------------

Hi Raphael, 

  Can you create a reviewboard request for this ticket?  (See MAHOUT-888 for details on how)
                

        

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Lance Norskog <go...@gmail.com>.
I was recently looking through code (I think in text vectors) where the code
merged very sparse term vectors. If there was a collision, it always picked
the first one. The assumption was that collisions never happened, so it did not
matter what it did. For symbolic vectors, I can see the virtue of randomly
picking one rather than doing arithmetic.
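
The collision policies mentioned above (keep the first value, do arithmetic, or pick one at random) could be sketched as a pluggable operator. This is only an illustration: SparseMerge and its methods are hypothetical names, and sparse vectors are modelled as plain index-to-value maps rather than Mahout's Vector classes.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical helper: merges sparse vectors stored as index -> value maps. */
final class SparseMerge {
    private SparseMerge() {}

    /** Copies a, then folds in b, applying onCollision where both have an entry. */
    static Map<Integer, Double> merge(Map<Integer, Double> a,
                                      Map<Integer, Double> b,
                                      DoubleBinaryOperator onCollision) {
        Map<Integer, Double> out = new TreeMap<>(a);
        for (Map.Entry<Integer, Double> e : b.entrySet()) {
            out.merge(e.getKey(), e.getValue(),
                      (x, y) -> onCollision.applyAsDouble(x, y));
        }
        return out;
    }

    /** Always keeps the value from the first vector. */
    static DoubleBinaryOperator keepFirst() {
        return (x, y) -> x;
    }

    /** Adds the two colliding values. */
    static DoubleBinaryOperator sum() {
        return (x, y) -> x + y;
    }

    /** Picks one of the two colliding values with equal probability. */
    static DoubleBinaryOperator pickRandom(Random rng) {
        return (x, y) -> rng.nextBoolean() ? x : y;
    }

    public static void main(String[] args) {
        Map<Integer, Double> a = new TreeMap<>();
        a.put(1, 1.0);
        a.put(5, 2.0);
        Map<Integer, Double> b = new TreeMap<>();
        b.put(5, 10.0);
        b.put(7, 3.0);
        System.out.println(merge(a, b, keepFirst()));
        System.out.println(merge(a, b, sum()));
        System.out.println(merge(a, b, pickRandom(new Random())));
    }
}
```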

On Thu, Dec 1, 2011 at 7:41 PM, Raphael Cendrillon <cendrillon1978@gmail.com
> wrote:

> Thanks. That's interesting. In what kind of algorithms have you seen a need
> for this?
>
> If I understand correctly you'd like to randomly pick between the two
> elements, say with equal probability? I think this wouldn't be too
> difficult to implement within the current framework.
>
> By the way, if you're interested in doing a quick review of the code it
> would be really appreciated! It's up on the reviewboard at
> https://reviews.apache.org/r/2955/diff/2/
>
> On 1 Dec, 2011, at 7:30 PM, "Lance Norskog (Commented) (JIRA)" <
> jira@apache.org> wrote:
>
> >
> >    [
> https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161398#comment-13161398]
> >
> > Lance Norskog commented on MAHOUT-880:
> > --------------------------------------
> >
> > Oops sorry. This is about the set of pairwise operators available when
> you combine two or more matrices: plus, minus, mean, etc. Another use case
> is to just use one of the values.
> >



-- 
Lance Norskog
goksron@gmail.com

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Thanks. That's interesting. In what kind of algorithms have you seen a need for this?

If I understand correctly you'd like to randomly pick between the two elements, say with equal probability? I think this wouldn't be too difficult to implement within the current framework. 

By the way, if you're interested in doing a quick review of the code it would be really appreciated! It's up on the reviewboard at https://reviews.apache.org/r/2955/diff/2/

On 1 Dec, 2011, at 7:30 PM, "Lance Norskog (Commented) (JIRA)" <ji...@apache.org> wrote:

> 
>    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161398#comment-13161398 ] 
> 
> Lance Norskog commented on MAHOUT-880:
> --------------------------------------
> 
> Oops sorry. This is about the set of pairwise operators available when you combine two or more matrices: plus, minus, mean, etc. Another use case is to just use one of the values.
> 

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Lance Norskog (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161398#comment-13161398 ] 

Lance Norskog commented on MAHOUT-880:
--------------------------------------

Oops sorry. This is about the set of pairwise operators available when you combine two or more matrices: plus, minus, mean, etc. Another use case is to just use one of the values.
                

        

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael Cendrillon updated MAHOUT-880:
--------------------------------------

    Attachment: MAHOUT-880.patch
    

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159097#comment-13159097 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/
-----------------------------------------------------------

Review request for Jake Mannix.


Summary
-------

Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix


This addresses bug MAHOUT-880.
    https://issues.apache.org/jira/browse/MAHOUT-880


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorAdditionJob.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 

Diff: https://reviews.apache.org/r/2955/diff


Testing
-------

Junit tests for each job


Thanks,

Raphael


                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161854#comment-13161854 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/
-----------------------------------------------------------

(Updated 2011-12-02 21:04:46.828990)


Review request for mahout, Ted Dunning, Jake Mannix, and Sebastian Schelter.


Summary
-------

Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix


This addresses bug MAHOUT-880.
    https://issues.apache.org/jira/browse/MAHOUT-880


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixMatrixElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/TimesSelfJob.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 

Diff: https://reviews.apache.org/r/2955/diff


Testing
-------

Junit tests for each job


Thanks,

Raphael


                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Wangda Tan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164940#comment-13164940 ] 

Wangda Tan commented on MAHOUT-880:
-----------------------------------

Hi Ted,
Thanks for your reply, I'll take a look at it
                

        

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael Cendrillon updated MAHOUT-880:
--------------------------------------

    Attachment: MAHOUT-880.patch
    

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159155#comment-13159155 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/#review3552
-----------------------------------------------------------


I'm not seeing the centering of the rows for the covariance computation.


trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java
<https://reviews.apache.org/r/2955/#comment7923>

    Don't we have to center the rows for covariance? Am I missing something or do you assume that the data is already centered?
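
    To make the centering question concrete, here is a small in-memory sketch (not the code under review). It assumes each row is one observation, uses the population form (divide by n), and all names are hypothetical. It relies on the identity cov = (1/n) A^T A - mean mean^T, so the uncentered product only equals the covariance when the column means are already zero.

```java
/** Hypothetical in-memory sketch of covariance with and without centering. */
final class CovarianceSketch {
    private CovarianceSketch() {}

    /** Column means of a; each row of a is treated as one observation. */
    static double[] columnMeans(double[][] a) {
        int n = a.length;
        double[] mean = new double[a[0].length];
        for (double[] row : a) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= n;
        }
        return mean;
    }

    /** (1/n) * A^T A: second moments with no centering applied. */
    static double[][] scaledGram(double[][] a) {
        int n = a.length;
        int d = a[0].length;
        double[][] g = new double[d][d];
        for (double[] row : a) {
            for (int j = 0; j < d; j++) {
                for (int k = 0; k < d; k++) {
                    g[j][k] += row[j] * row[k];
                }
            }
        }
        for (double[] gRow : g) {
            for (int k = 0; k < d; k++) {
                gRow[k] /= n;
            }
        }
        return g;
    }

    /** Population covariance: (1/n) * A^T A minus the outer product of the means. */
    static double[][] covariance(double[][] a) {
        double[] mean = columnMeans(a);
        double[][] c = scaledGram(a);
        for (int j = 0; j < mean.length; j++) {
            for (int k = 0; k < mean.length; k++) {
                c[j][k] -= mean[j] * mean[k];
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 6}};
        System.out.println(java.util.Arrays.deepToString(scaledGram(a)));
        System.out.println(java.util.Arrays.deepToString(covariance(a)));
    }
}
```

    Subtracting the outer product afterwards avoids a separate centering pass over the data, though it can lose precision when the means are large relative to the spread.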


- Sebastian


On 2011-11-29 05:40:30, Raphael Cendrillon wrote:


                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165469#comment-13165469 ] 

Dmitriy Lyubimov commented on MAHOUT-880:
-----------------------------------------

Ideally, to optimize this, I guess DRM should have a notion that the dimensions (or whatever other parameters the solver uses) may not be known initially. When that happens, the first operation in the pipeline (whatever it happens to be) can also employ standard strategies to come up with them by the time it finishes.

Similarly, there's a "post-step" strategy concept: using the output and some additional parameters, you can re-assemble the required knowledge (such as the mean, or the small result of a multiplication) in a post-step by recombining the results of all reducers, or separate factors of the computation (if it happens to be a small product in the end).

This is a fundamental technique in SSVD (and it seems to become even more prominent with the PCA efficiency tricks).
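
A minimal sketch of the post-step idea, under the assumption that each reducer emits partial column sums plus the number of rows it saw. The names are hypothetical and the reducers are simulated in memory; this is not how SSVD is actually implemented. Recombining the partials yields both the mean and a dimension (the row count) that was not known up front.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: recombine per-reducer partial results in a post-step. */
final class PostStepMean {
    private PostStepMean() {}

    /** What one simulated reducer emits: partial column sums and a row count. */
    static final class Partial {
        final double[] sums;
        final long rows;

        Partial(double[] sums, long rows) {
            this.sums = sums;
            this.rows = rows;
        }
    }

    /** Simulates one reducer working over its share of the rows. */
    static Partial reduce(double[][] split) {
        double[] sums = new double[split[0].length];
        for (double[] row : split) {
            for (int j = 0; j < sums.length; j++) {
                sums[j] += row[j];
            }
        }
        return new Partial(sums, split.length);
    }

    /** Post-step: adds up the partials from all reducers. */
    static Partial combine(List<Partial> partials) {
        double[] sums = new double[partials.get(0).sums.length];
        long rows = 0;
        for (Partial p : partials) {
            for (int j = 0; j < sums.length; j++) {
                sums[j] += p.sums[j];
            }
            rows += p.rows;
        }
        return new Partial(sums, rows);
    }

    /** Column means derived from the combined partial. */
    static double[] mean(Partial total) {
        double[] mean = new double[total.sums.length];
        for (int j = 0; j < mean.length; j++) {
            mean[j] = total.sums[j] / total.rows;
        }
        return mean;
    }

    public static void main(String[] args) {
        Partial p1 = reduce(new double[][] {{1, 2}, {3, 4}});
        Partial p2 = reduce(new double[][] {{5, 6}});
        Partial total = combine(Arrays.asList(p1, p2));
        System.out.println(total.rows + " rows, mean " + Arrays.toString(mean(total)));
    }
}
```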
                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161396#comment-13161396 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

Hi Lance. Sorry, I don't follow you. Could you expand a bit on this? Is this in response to the issue regarding heavily loading the reducer or something else?
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Jake Mannix (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159075#comment-13159075 ] 

Jake Mannix commented on MAHOUT-880:
------------------------------------

Many of these sound great, yes!

I'd have one suggestion, however: DistributedRowMatrix implements the interface VectorIterable, which the interface Matrix extends.  The methods you mention which are already in Matrix should just get pulled up into VectorIterable.  

Of course, this requires some careful checking that a call such as DistributedRowMatrix.minus(DenseMatrix) behaves sensibly.  I would imagine this case is covered by the fact that there is no sensible reason why you would have a DistributedRowMatrix and a DenseMatrix of the exact same cardinalities (one fits in RAM, but the other needs to live on HDFS?).

Regarding some of these methods: I'm not sure about 4). Do we have uses for these?  If you have a DistributedRowMatrix, i.e. a humongous HDFS SequenceFile of Vectors, what exactly are you going to do with the upper triangle of it?  The diagonal I can see, I guess: extracting a vector of the diagonal from the whole distributed matrix, sure.

6) is actually being looked at in MAHOUT-884

7) We like solvers, yes, but those methods don't go in our matrix classes; they go in separate solver classes and take a Matrix (or DistributedRowMatrix) as input.

8) is also good, and we'd always like more I/O hooks, but again, these should be in other classes, and in some ways this already exists: VectorDumper allows the option of dumping a DistributedRowMatrix from SequenceFile to CSV, and I think we have some support for ARFF as well, somewhere.
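
The diagonal extraction mentioned under 4) needs only a single pass over the rows, since row i contributes just its i-th entry. The following is a minimal single-machine sketch in plain Java, not Mahout code: the `Map<Integer, double[]>` is a hypothetical stand-in for the (row index, vector) pairs stored in a DistributedRowMatrix, and the class and method names are made up for illustration.

```java
import java.util.Map;

class DiagonalSketch {
    /**
     * Builds the diagonal from (row index, row) pairs.
     * Rows may arrive in any order, as they would from a distributed scan.
     */
    public static double[] diagonal(Map<Integer, double[]> rows, int size) {
        double[] diag = new double[size];
        for (Map.Entry<Integer, double[]> entry : rows.entrySet()) {
            int i = entry.getKey();
            double[] row = entry.getValue();
            // Only the i-th entry of row i matters; everything else is skipped.
            if (i >= 0 && i < size && i < row.length) {
                diag[i] = row[i];
            }
        }
        return diag;
    }
}
```

In a map-reduce setting the same idea would presumably be a map-only pass that emits one number per row, so the result is a single vector that fits in memory even when the matrix does not.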
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163214#comment-13163214 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/
-----------------------------------------------------------

(Updated 2011-12-06 00:26:13.113561)


Review request for mahout, Ted Dunning, Jake Mannix, and Sebastian Schelter.


Changes
-------

Added jobs for calculating column-wise row average of a DistributedRowMatrix


Summary
-------

Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix


This addresses bug MAHOUT-880.
    https://issues.apache.org/jira/browse/MAHOUT-880


Diffs (updated)
-----

  trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1210678 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixMatrixElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowMRJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorElementwiseJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/TimesSelfJob.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1210678 

Diff: https://reviews.apache.org/r/2955/diff


Testing
-------

Junit tests for each job


Thanks,

Raphael


                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159439#comment-13159439 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------



bq.  On 2011-11-29 08:41:06, Sebastian Schelter wrote:
bq.  > trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java, line 119
bq.  > <https://reviews.apache.org/r/2955/diff/1/?file=60410#file60410line119>
bq.  >
bq.  >     Don't we have to center the rows for covariance? Am I missing something or do you assume that the data is already centered?

Thank you for the feedback Sebastian. 

You're right, we first need to center the rows. I should rename this Job to remove confusion. In general it is just meant to compute x.transpose().times(x).
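
To illustrate the distinction discussed here, the sketch below contrasts the plain product x.transpose().times(x) with a sample covariance that first subtracts the mean row. It uses plain Java arrays rather than Mahout classes; the class name, method names and the choice of the (rows - 1) normalization are assumptions made for the example, not what the patch does.

```java
class GramVsCovariance {
    /** Mean row: the column-wise average over all rows of x. */
    public static double[] meanRow(double[][] x) {
        double[] mean = new double[x[0].length];
        for (double[] row : x) {
            for (int j = 0; j < mean.length; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < mean.length; j++) {
            mean[j] /= x.length;
        }
        return mean;
    }

    /** Plain x^T x, accumulated row by row, with no centering. */
    public static double[][] gram(double[][] x) {
        int n = x[0].length;
        double[][] g = new double[n][n];
        for (double[] row : x) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    /** Sample covariance: subtract the mean row, then x^T x / (rows - 1). Needs at least two rows. */
    public static double[][] covariance(double[][] x) {
        double[] mean = meanRow(x);
        double[][] centered = new double[x.length][mean.length];
        for (int r = 0; r < x.length; r++) {
            for (int j = 0; j < mean.length; j++) {
                centered[r][j] = x[r][j] - mean[j];
            }
        }
        double[][] c = gram(centered);
        for (double[] row : c) {
            for (int j = 0; j < row.length; j++) {
                row[j] /= (x.length - 1);
            }
        }
        return c;
    }
}
```

For rows (1, 2), (3, 4), (5, 6) the uncentered product has 35 in its top-left entry while the covariance has 4 there, which is why the job name matters.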


- Raphael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/#review3552
-----------------------------------------------------------


On 2011-11-29 05:40:30, Raphael Cendrillon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2955/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-29 05:40:30)
bq.  
bq.  
bq.  Review request for Jake Mannix.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix
bq.  
bq.  
bq.  This addresses bug MAHOUT-880.
bq.      https://issues.apache.org/jira/browse/MAHOUT-880
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorAdditionJob.java PRE-CREATION 
bq.    trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 
bq.  
bq.  Diff: https://reviews.apache.org/r/2955/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Junit tests for each job
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Raphael
bq.  
bq.


                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Dan Brickley (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161562#comment-13161562 ] 

Dan Brickley commented on MAHOUT-880:
-------------------------------------

Does Mahout yet have a method to take a large full matrix and convert it to sparse matrix format (dropping zero values or, if it makes sense, near-zero values as well)?
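
As a rough illustration of the conversion being asked about, here is a sketch of thresholding one dense row into an index-to-value map. It is plain Java and does not use or describe any existing Mahout method; the class name, the `epsilon` parameter and the `TreeMap` representation are assumptions chosen for the example.

```java
import java.util.Map;
import java.util.TreeMap;

class SparsifySketch {
    /**
     * Keeps only entries whose magnitude exceeds epsilon.
     * With epsilon = 0 only exact zeros are dropped.
     */
    public static Map<Integer, Double> sparsify(double[] denseRow, double epsilon) {
        Map<Integer, Double> sparse = new TreeMap<>();
        for (int j = 0; j < denseRow.length; j++) {
            if (Math.abs(denseRow[j]) > epsilon) {
                sparse.put(j, denseRow[j]);
            }
        }
        return sparse;
    }
}
```

Applied row by row, this would work on a matrix of any size, because each row is converted independently of the others.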
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159063#comment-13159063 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

I also think it could be useful to add support for a few more standard matrix operations to DistributedRowMatrix. Here's a patch with a few operations to start with. Is there broader interest in this?
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159440#comment-13159440 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/
-----------------------------------------------------------

(Updated 2011-11-29 18:44:49.585493)


Review request for Ted Dunning, Jake Mannix and Sebastian Schelter.


Summary
-------

Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix


This addresses bug MAHOUT-880.
    https://issues.apache.org/jira/browse/MAHOUT-880


Diffs
-----

  trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java PRE-CREATION 
  trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorAdditionJob.java PRE-CREATION 
  trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 

Diff: https://reviews.apache.org/r/2955/diff


Testing
-------

Junit tests for each job


Thanks,

Raphael


                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159493#comment-13159493 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/#review3562
-----------------------------------------------------------



trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java
<https://reviews.apache.org/r/2955/#comment7976>

    I'm not sure about this method: you take in a DistributedRowMatrix, which by design is a big, huge SequenceFile<IntWritable,VectorWritable>.  Why don't you just take in a Vector, put that in the DistributedCache (or even serialize it into the Configuration, if it's small enough), and use that?  
    
    Passing in a DistributedRowMatrix makes people assume you can put in a real full matrix.
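
The "serialize it into the Configuration" idea can be sketched as follows. This is an illustration only: a plain `Map<String, String>` would stand in for the job configuration, and the class name, the binary layout and the Base64 encoding are all assumptions made for the example rather than anything taken from the patch or from Mahout.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

class VectorInConfigSketch {
    /** Packs a small vector into a string that fits in a string-valued config entry. */
    public static String encode(double[] v) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + v.length * Double.BYTES);
        buf.putInt(v.length);
        for (double d : v) {
            buf.putDouble(d);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    /** Reverses encode(); a mapper would do this once during setup. */
    public static double[] decode(String s) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(s));
        double[] v = new double[buf.getInt()];
        for (int i = 0; i < v.length; i++) {
            v[i] = buf.getDouble();
        }
        return v;
    }

    /** Per-row work: add the decoded vector to one matrix row. */
    public static double[] addToRow(double[] row, double[] v) {
        if (row.length != v.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = row[i] + v[i];
        }
        return out;
    }
}
```

The driver would store `encode(v)` under some key before submitting the job, and each mapper would call `decode` once and then `addToRow` per row, so the vector never has to be written out as a one-row matrix.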



trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java
<https://reviews.apache.org/r/2955/#comment7977>

    This will force a huge bottleneck of one reducer, will it not?



trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java
<https://reviews.apache.org/r/2955/#comment7978>

    I think we already have a VectorSummingReducer somewhere, we should re-use that.


- Jake


On 2011-11-29 18:44:49, Raphael Cendrillon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2955/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-29 18:44:49)
bq.  
bq.  
bq.  Review request for Ted Dunning, Jake Mannix and Sebastian Schelter.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix
bq.  
bq.  
bq.  This addresses bug MAHOUT-880.
bq.      https://issues.apache.org/jira/browse/MAHOUT-880
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorAdditionJob.java PRE-CREATION 
bq.    trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 
bq.  
bq.  Diff: https://reviews.apache.org/r/2955/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Junit tests for each job
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Raphael
bq.  
bq.


                


        

Re: [jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by Raphael Cendrillon <ce...@gmail.com>.
Hi Ted,

Would the SSVD be appropriate for this?

On Dec 7, 2011, at 5:58 PM, "Ted Dunning (Commented) (JIRA)" <ji...@apache.org> wrote:

> 
>    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164925#comment-13164925 ] 
> 
> Ted Dunning commented on MAHOUT-880:
> ------------------------------------
> 
> There are the beginnings of single machine out-of-core SVD operations in MAHOUT-792
> 

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Ted Dunning (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164925#comment-13164925 ] 

Ted Dunning commented on MAHOUT-880:
------------------------------------

There are the beginnings of single machine out-of-core SVD operations in MAHOUT-792
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159094#comment-13159094 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

I'll be glad to. Thanks.
                


        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Dmitriy Lyubimov (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165462#comment-13165462 ] 

Dmitriy Lyubimov commented on MAHOUT-880:
-----------------------------------------

I think the rowMeans approach is still suboptimal for my use case (MAHOUT-817). It is possible I don't understand something about DRM, though.

Setting up a DRM as a solver requires knowing the number of rows and the number of columns. This is technically never required for any operation in PCA (including colMeans()), and in many cases it is also impractical, because previous pipeline jobs don't necessarily calculate those.

Nor does SSVD require preliminary knowledge of matrix dimensions.

Ideally, in the PCA flow we want to compute pairs (numRows, sumRows) for each reducer output and then have a front-end routine finish reducing those to a single mean row.
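
A minimal in-memory sketch of the (numRows, sumRows) idea above, not the actual patch. The class and method names (MeanRowSketch, Partial, summarize, finish) are hypothetical, and rows are assumed to be dense double[] arrays of equal length. Each call to summarize stands in for one reducer's output; finish plays the front-end routine. Note that neither step needs the matrix dimensions up front.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MeanRowSketch {

  /** Hypothetical holder for one reducer's output: row count and column-wise sum. */
  static final class Partial {
    final long numRows;
    final double[] sumRows;

    Partial(long numRows, double[] sumRows) {
      this.numRows = numRows;
      this.sumRows = sumRows;
    }
  }

  /** Stand-in for a single reducer: counts the rows it sees and sums them. */
  static Partial summarize(List<double[]> rows) {
    double[] sum = null;
    long n = 0;
    for (double[] row : rows) {
      if (sum == null) {
        sum = new double[row.length]; // width is discovered from the data
      }
      for (int j = 0; j < row.length; j++) {
        sum[j] += row[j];
      }
      n++;
    }
    return new Partial(n, sum == null ? new double[0] : sum);
  }

  /** Stand-in for the front-end routine: merges all partials into one mean row. */
  static double[] finish(List<Partial> partials) {
    double[] total = null;
    long n = 0;
    for (Partial p : partials) {
      if (p.numRows == 0) {
        continue; // an empty reducer contributes nothing
      }
      if (total == null) {
        total = new double[p.sumRows.length];
      }
      for (int j = 0; j < total.length; j++) {
        total[j] += p.sumRows[j];
      }
      n += p.numRows;
    }
    if (total == null) {
      throw new IllegalArgumentException("no rows in any partial");
    }
    for (int j = 0; j < total.length; j++) {
      total[j] /= n;
    }
    return total;
  }

  public static void main(String[] args) {
    Partial a = summarize(Arrays.asList(new double[] {1, 2}, new double[] {3, 4}));
    Partial b = summarize(Collections.singletonList(new double[] {5, 6}));
    System.out.println(Arrays.toString(finish(Arrays.asList(a, b)))); // [3.0, 4.0]
  }
}
```

Because the partial sums are associative, the same summarize step could also run as a combiner, so the final merge only sees one small pair per reducer.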
                

        

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael Cendrillon updated MAHOUT-880:
--------------------------------------

    Comment: was deleted

(was: A few matrix-vector operations for DistributedRowMatrix)
    

        

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael Cendrillon updated MAHOUT-880:
--------------------------------------

    Labels: DistributedRowMatrix  (was: )
    Status: Patch Available  (was: Open)

A few matrix-vector operations for DistributedRowMatrix
                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160733#comment-13160733 ] 

jiraposter@reviews.apache.org commented on MAHOUT-880:
------------------------------------------------------



> On 2011-11-29 19:56:51, Jake Mannix wrote:
> > trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java, line 116
> > <https://reviews.apache.org/r/2955/diff/1/?file=60411#file60411line116>
> >
> >     This will force a huge bottleneck of one reducer, will it not?

Thanks for the feedback Jake, it's really appreciated!  I think the load will be distributed somewhat by the combiner at each node. Do you still think this will cause too much of a bottleneck?

Do you have any suggestions for a better way to implement this?


- Raphael


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2955/#review3562
-----------------------------------------------------------


On 2011-11-29 18:44:49, Raphael Cendrillon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2955/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-29 18:44:49)
bq.  
bq.  
bq.  Review request for Ted Dunning, Jake Mannix and Sebastian Schelter.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Jobs for matrix-vector addition, covariance matrix calculation and row average calculation in DistributedRowMatrix
bq.  
bq.  
bq.  This addresses bug MAHOUT-880.
bq.      https://issues.apache.org/jira/browse/MAHOUT-880
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/DistributedRowMatrix.java 1206431 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixCovarianceJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixRowAverageJob.java PRE-CREATION 
bq.    trunk/core/src/main/java/org/apache/mahout/math/hadoop/MatrixVectorAdditionJob.java PRE-CREATION 
bq.    trunk/core/src/test/java/org/apache/mahout/math/hadoop/TestDistributedRowMatrix.java 1206431 
bq.  
bq.  Diff: https://reviews.apache.org/r/2955/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Junit tests for each job
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Raphael
bq.  
bq.


                

        

[jira] [Updated] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raphael Cendrillon updated MAHOUT-880:
--------------------------------------

    Attachment: MAHOUT-880.patch
    

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167309#comment-13167309 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

Thanks, Dmitriy. I've pulled the row mean job out as a separate issue under MAHOUT-923. Could you please take a look?
                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Raphael Cendrillon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159082#comment-13159082 ] 

Raphael Cendrillon commented on MAHOUT-880:
-------------------------------------------

Hi Jake. If you get a chance could you take a look through the attached patch? Your feedback would be great.
                

        

[jira] [Commented] (MAHOUT-880) Add some matrix method(like addition, subtraction, norm ... etc) to DistributedRowMatrix

Posted by "Wangda Tan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164918#comment-13164918 ] 

Wangda Tan commented on MAHOUT-880:
-----------------------------------

Hi Raphael,
I agree with you: DistributedRowMatrix is a very useful abstraction for us, and we can add many useful operations to it. The matrix multiplication and matrix transpose jobs are good examples.
I'm now working on the matrix norms. Norm-2 needs an SVD, which is really expensive. Is there any lightweight method that gives us just the largest singular value?
Thanks,
Wangda
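
One commonly used lightweight option for the question above is power iteration on A'A, which only needs repeated products of the form A'(Ax) and never a full SVD. The following is a small in-memory sketch under stated assumptions, not Mahout code: the class PowerIterationSketch and its methods are hypothetical, the matrix is a dense double[][], and the start vector is a fixed uniform vector (an unlucky start that is orthogonal to the dominant singular direction would not converge to the right value, so a random start is safer in practice). Convergence is slow when the two largest singular values are close.

```java
import java.util.Arrays;

public class PowerIterationSketch {

  /** Computes y = A'(A x) one row at a time, without ever forming A'A. */
  static double[] timesSquared(double[][] a, double[] x) {
    double[] y = new double[x.length];
    for (double[] row : a) {
      double dot = 0.0;
      for (int j = 0; j < x.length; j++) {
        dot += row[j] * x[j];
      }
      for (int j = 0; j < x.length; j++) {
        y[j] += dot * row[j];
      }
    }
    return y;
  }

  /** Euclidean length of a vector. */
  static double norm(double[] v) {
    double s = 0.0;
    for (double d : v) {
      s += d * d;
    }
    return Math.sqrt(s);
  }

  /** Estimates the largest singular value of a (that is, its norm-2). */
  static double largestSingularValue(double[][] a, int maxIter, double tol) {
    int n = a[0].length;
    double[] x = new double[n];
    Arrays.fill(x, 1.0 / Math.sqrt(n)); // unit-length start vector (assumption)
    double sigma = 0.0;
    for (int it = 0; it < maxIter; it++) {
      double[] y = timesSquared(a, x);
      double lambda = norm(y); // with |x| = 1 this tends to the top eigenvalue of A'A
      if (lambda == 0.0) {
        return 0.0; // x is in the null space; for a zero matrix this is exact
      }
      for (int j = 0; j < n; j++) {
        x[j] = y[j] / lambda;
      }
      double next = Math.sqrt(lambda);
      if (Math.abs(next - sigma) <= tol * next) {
        return next;
      }
      sigma = next;
    }
    return sigma;
  }

  public static void main(String[] args) {
    double[][] a = {{3, 0}, {0, 1}};
    System.out.println(largestSingularValue(a, 1000, 1e-12));
  }
}
```

On a distributed matrix each iteration would cost one pass over the rows, since the row-wise loop in timesSquared is exactly the kind of sum a map-reduce job can compute.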
                