Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2012/10/08 22:52:04 UTC

[jira] [Created] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Ted Dunning created MAHOUT-1091:
-----------------------------------

             Summary: Bug in SequentialAccessSparseVector full iteration
                 Key: MAHOUT-1091
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1091
             Project: Mahout
          Issue Type: Bug
          Components: Math
            Reporter: Ted Dunning
             Fix For: 0.8


The iterator for SequentialAccessSparseVector doesn't return any elements beyond the last non-zero element. This breaks some things pretty massively, but hopefully doesn't break much user code, since iterating through all elements of a sparse vector is a relatively rare thing to do.
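
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Mahout's code: the class ToySequentialVector and the methods iterateAllBuggy and iterateAll are hypothetical names, and the storage layout (a sorted index array with a parallel value array) is only assumed to resemble what a sequential-access sparse vector uses. The buggy loop ends as soon as the stored entries run out, so trailing zeros are never visited; the corrected loop is bounded by the logical length alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a sequential-access sparse vector; NOT Mahout's implementation. */
final class ToySequentialVector {
  private final int cardinality;  // logical length, zeros included
  private final int[] indices;    // sorted positions of the stored (non-zero) entries
  private final double[] values;  // values[k] lives at position indices[k]

  ToySequentialVector(int cardinality, int[] indices, double[] values) {
    this.cardinality = cardinality;
    this.indices = indices.clone();
    this.values = values.clone();
  }

  /** Buggy full iteration: stops once the stored entries are exhausted. */
  List<Double> iterateAllBuggy() {
    List<Double> out = new ArrayList<>();
    int next = 0; // cursor into indices/values
    for (int i = 0; i < cardinality && next < indices.length; i++) {
      if (indices[next] == i) {
        out.add(values[next++]);
      } else {
        out.add(0.0);
      }
    }
    return out;
  }

  /** Corrected full iteration: bounded only by the logical length. */
  List<Double> iterateAll() {
    List<Double> out = new ArrayList<>();
    int next = 0; // cursor into indices/values
    for (int i = 0; i < cardinality; i++) {
      if (next < indices.length && indices[next] == i) {
        out.add(values[next++]);
      } else {
        out.add(0.0); // implicit zero, including any trailing zeros
      }
    }
    return out;
  }

  public static void main(String[] args) {
    ToySequentialVector v =
        new ToySequentialVector(8, new int[] {1, 3}, new double[] {2.0, 5.0});
    System.out.println("buggy visits " + v.iterateAllBuggy().size() + " elements");
    System.out.println("fixed visits " + v.iterateAll().size() + " elements");
  }
}
```

For a length-8 vector with stored entries at positions 1 and 3, the buggy loop in this sketch yields 4 elements and the corrected loop yields all 8.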

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paritosh Ranjan updated MAHOUT-1091:
------------------------------------

    Attachment: MAHOUT-1091.patch

Just tried out of curiosity. The patch contains the fix and a junit test demonstrating the fix. 

The math module's test cases pass after this fix. Have not checked the complete build yet. 
                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472584#comment-13472584 ] 

Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------

There is a test case in the patch which shows it can catch elements after the last non-zero. 
I hope the test case is correct. If not, can you point me to the error?
                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473863#comment-13473863 ] 

Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------

Yes, go ahead.



                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472593#comment-13472593 ] 

Ted Dunning commented on MAHOUT-1091:
-------------------------------------

The test that I saw in your patch creates a sparse vector that is Integer.MAX_VALUE in size, but it only verifies that the iterator goes through the first few elements. It is better to create a smaller vector and verify that the iterator scans through all of the elements, both zero and non-zero.

Take a look at the test in my patch.
                

[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-1091:
--------------------------------

    Attachment: 0001-MAHOUT-1091-Add-test-to-demonstrate-broken-iterator-.patch

Here is a patch relative to the latest MAHOUT-1086 patch. It includes a test that demonstrates the problem as well as code to repair it.
                

[jira] [Resolved] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning resolved MAHOUT-1091.
---------------------------------

    Resolution: Fixed

Committed this change.
                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472167#comment-13472167 ] 

Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------

The patch I submitted would fail if there are no non-zero elements. Still, it was interesting to look into it.
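
The all-zero case is worth covering explicitly. The sketch below is not taken from either patch on this issue: AllElementsIterator and its count helper are hypothetical names, and the sorted index/value arrays are an assumed storage layout. It shows one way to write a full iterator whose hasNext() depends only on the logical length, so a vector with no stored entries still yields every element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical dense-order iterator over sorted sparse storage; not from either patch. */
final class AllElementsIterator implements Iterator<Double> {
  private final int cardinality;  // logical length, zeros included
  private final int[] indices;    // sorted positions of the stored entries (may be empty)
  private final double[] values;  // values[k] lives at position indices[k]
  private int position = 0;       // next logical position to return
  private int nextStored = 0;     // cursor into indices/values

  AllElementsIterator(int cardinality, int[] indices, double[] values) {
    this.cardinality = cardinality;
    this.indices = indices.clone();
    this.values = values.clone();
  }

  @Override
  public boolean hasNext() {
    // Depends only on the logical length, never on how many entries are stored.
    return position < cardinality;
  }

  @Override
  public Double next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    double value = 0.0; // implicit zero unless a stored entry sits at this position
    if (nextStored < indices.length && indices[nextStored] == position) {
      value = values[nextStored++];
    }
    position++;
    return value;
  }

  /** Drains an iterator and returns how many elements it produced. */
  static int count(Iterator<Double> it) {
    int n = 0;
    while (it.hasNext()) {
      it.next();
      n++;
    }
    return n;
  }
}
```

With no stored entries, `AllElementsIterator.count(new AllElementsIterator(5, new int[0], new double[0]))` returns 5 in this sketch, which is the behavior a regression test for the empty case would want to pin down.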
                

[jira] [Comment Edited] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471891#comment-13471891 ] 

Paritosh Ranjan edited comment on MAHOUT-1091 at 10/8/12 11:01 PM:
-------------------------------------------------------------------

Just tried out of curiosity. The patch contains the fix and a junit test demonstrating the fix. 

All Mahout test cases pass with this fix.
                
                  

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473946#comment-13473946 ] 

Hudson commented on MAHOUT-1091:
--------------------------------

Integrated in Mahout-Quality #1697 (See [https://builds.apache.org/job/Mahout-Quality/1697/])
    MAHOUT-1091 - Add test to demonstrate broken iterator in SequentialAccessSparseVector (and add fix) (Revision 1396920)

     Result = SUCCESS
tdunning : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396920
Files : 
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/AbstractVectorTest.java

                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472614#comment-13472614 ] 

Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------

Yes, got the error. The test case is failing now. The way I was creating the SequentialAccessSparseVector was faulty, as I was not using the cardinality and size of the vector properly.

Thanks for clearing up my doubts.
                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472599#comment-13472599 ] 

Ted Dunning commented on MAHOUT-1091:
-------------------------------------

For instance, this test:
{code}
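    public void testIterators() {
      final T v0 = vectorToTest(20);

      // Full iteration: count every element, zero or not, and sum the values.
      double sum = 0;
      int elements = 0;
      int nonZero = 0;
      for (Vector.Element element : v0) {
        elements++;
        sum += element.get();
        if (element.get() != 0) {
          nonZero++;
        }
      }

      // Non-zero iteration: count only what the sparse iterator returns.
      int nonZeroIterated = 0;
      final Iterator<Vector.Element> i = v0.iterateNonZero();
      while (i.hasNext()) {
        i.next();
        nonZeroIterated++;
      }
      // Full iteration must cover the whole vector, including trailing zeros.
      assertEquals(20, elements);
      assertEquals(v0.size(), elements);
      // The non-zero counts must agree, and the summed values must match zSum().
      assertEquals(nonZeroIterated, nonZero);
      assertEquals(v0.zSum(), sum, 0);
    }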
{code}
Note how the test verifies that the full iteration sees all the elements, while the non-zero iteration sees only the non-zeros.
                

[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Dunning updated MAHOUT-1091:
--------------------------------

    Attachment: 0001-MAHOUT-1091-Add-test-to-demonstrate-broken-iterator-.patch

Oops. The correct patch is here.
                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472554#comment-13472554 ] 

Ted Dunning commented on MAHOUT-1091:
-------------------------------------

The patch you submitted also didn't catch all the elements after the last non-zero.

                

[jira] [Commented] (MAHOUT-1091) Bug in SequentialAccessSparseVector full iteration

Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473755#comment-13473755 ] 

Ted Dunning commented on MAHOUT-1091:
-------------------------------------

Paritosh,

I don't know your timezone, but if I don't hear anything else on this, I will commit shortly.
                