You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Jake Mannix (JIRA)" <ji...@apache.org> on 2009/12/02 00:29:20 UTC

[jira] Created: (MAHOUT-211) DenseMatrix.getRow() and getColumn() return deep copies of the data

DenseMatrix.getRow() and getColumn() return deep copies of the data
-------------------------------------------------------------------

                 Key: MAHOUT-211
                 URL: https://issues.apache.org/jira/browse/MAHOUT-211
             Project: Mahout
          Issue Type: Bug
          Components: Matrix
    Affects Versions: 0.1
         Environment: all
            Reporter: Jake Mannix
            Priority: Minor
             Fix For: 0.3


As mentioned in the summary, DenseMatrix.getRow() returns a new DenseVector(values[row]) instead of just a shallow reference.  This is a) wasteful, and b) inconsistent with what SparseRowMatrix / SparseColumnMatrix do, which means that users who think they can mutate rows by calling getRow() are wrong for DenseMatrix, or conversely think they can modify the row at will are wrong for SparseRowMatrix.  

Either way, we need to be consistent.  I would suggest the contract be that you get a shallow view reference (meaning, it may be an actual reference to a real Vector, or it may be a virtual Vector, dynamically constructed, which is still backed by entries in the matrix), so that calling Matrix.getRow(row).set(col, value) has the same affect as Matrix.set(row, col, value).  If we don't do it this way, we should at least provide an api to get a shallow reference to rows.

Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (MAHOUT-211) DenseMatrix.getRow() and getColumn() return deep copies of the data

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix reassigned MAHOUT-211:
----------------------------------

    Assignee: Jake Mannix

> DenseMatrix.getRow() and getColumn() return deep copies of the data
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-211
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-211
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.3
>
>
> As mentioned in the summary, DenseMatrix.getRow() returns a new DenseVector(values[row]) instead of just a shallow reference.  This is a) wasteful, and b) inconsistent with what SparseRowMatrix / SparseColumnMatrix do, which means that users who think they can mutate rows by calling getRow() are wrong for DenseMatrix, or conversely think they can modify the row at will are wrong for SparseRowMatrix.  
> Either way, we need to be consistent.  I would suggest the contract be that you get a shallow view reference (meaning, it may be an actual reference to a real Vector, or it may be a virtual Vector, dynamically constructed, which is still backed by entries in the matrix), so that calling Matrix.getRow(row).set(col, value) has the same affect as Matrix.set(row, col, value).  If we don't do it this way, we should at least provide an api to get a shallow reference to rows.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-211) DenseMatrix.getRow() and getColumn() return deep copies of the data

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784599#action_12784599 ] 

Jake Mannix commented on MAHOUT-211:
------------------------------------

I guess some thought should also be put into how much should be done to our current implementations, and how much should be done on MAHOUT-204 in the way of replacing our implementations with colt impls.

> DenseMatrix.getRow() and getColumn() return deep copies of the data
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-211
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-211
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Priority: Minor
>             Fix For: 0.3
>
>
> As mentioned in the summary, DenseMatrix.getRow() returns a new DenseVector(values[row]) instead of just a shallow reference.  This is a) wasteful, and b) inconsistent with what SparseRowMatrix / SparseColumnMatrix do, which means that users who think they can mutate rows by calling getRow() are wrong for DenseMatrix, or conversely think they can modify the row at will are wrong for SparseRowMatrix.  
> Either way, we need to be consistent.  I would suggest the contract be that you get a shallow view reference (meaning, it may be an actual reference to a real Vector, or it may be a virtual Vector, dynamically constructed, which is still backed by entries in the matrix), so that calling Matrix.getRow(row).set(col, value) has the same affect as Matrix.set(row, col, value).  If we don't do it this way, we should at least provide an api to get a shallow reference to rows.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-211) DenseMatrix.getRow() and getColumn() return deep copies of the data

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784557#action_12784557 ] 

Ted Dunning commented on MAHOUT-211:
------------------------------------


The Colt approach is definitely to go full-bore on the shallow reference semantics.  In fact, things like sub-matrix and transpose primarily return views which allows all kinds of interesting compositional coding practices.  As a trivial example, there is no primitive in Colt for setting the diagonal of a matrix to zero.  There is a primitive for getting a view of the diagonal as a vector and there is a primitive for setting all elements of a vector to zero.  These can easily be composed to get the desired result.

In some areas, this style is extremely effective, but it can also be very confusing (there is Matrix.plus, for instance).  I have found that some users need a more traditional API for getting started, but that is easily done.

> DenseMatrix.getRow() and getColumn() return deep copies of the data
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-211
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-211
>             Project: Mahout
>          Issue Type: Bug
>          Components: Matrix
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Priority: Minor
>             Fix For: 0.3
>
>
> As mentioned in the summary, DenseMatrix.getRow() returns a new DenseVector(values[row]) instead of just a shallow reference.  This is a) wasteful, and b) inconsistent with what SparseRowMatrix / SparseColumnMatrix do, which means that users who think they can mutate rows by calling getRow() are wrong for DenseMatrix, or conversely think they can modify the row at will are wrong for SparseRowMatrix.  
> Either way, we need to be consistent.  I would suggest the contract be that you get a shallow view reference (meaning, it may be an actual reference to a real Vector, or it may be a virtual Vector, dynamically constructed, which is still backed by entries in the matrix), so that calling Matrix.getRow(row).set(col, value) has the same affect as Matrix.set(row, col, value).  If we don't do it this way, we should at least provide an api to get a shallow reference to rows.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAHOUT-211) DenseMatrix.getRow() and getColumn() return deep copies of the data

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jake Mannix resolved MAHOUT-211.
--------------------------------

    Resolution: Fixed

committed an initial pass of this in r901692

> DenseMatrix.getRow() and getColumn() return deep copies of the data
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-211
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-211
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.1
>         Environment: all
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>            Priority: Minor
>             Fix For: 0.3
>
>
> As mentioned in the summary, DenseMatrix.getRow() returns a new DenseVector(values[row]) instead of just a shallow reference.  This is a) wasteful, and b) inconsistent with what SparseRowMatrix / SparseColumnMatrix do, which means that users who think they can mutate rows by calling getRow() are wrong for DenseMatrix, or conversely think they can modify the row at will are wrong for SparseRowMatrix.  
> Either way, we need to be consistent.  I would suggest the contract be that you get a shallow view reference (meaning, it may be an actual reference to a real Vector, or it may be a virtual Vector, dynamically constructed, which is still backed by entries in the matrix), so that calling Matrix.getRow(row).set(col, value) has the same affect as Matrix.set(row, col, value).  If we don't do it this way, we should at least provide an api to get a shallow reference to rows.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.