Posted to dev@mahout.apache.org by "Ted Dunning (JIRA)" <ji...@apache.org> on 2012/10/08 22:52:04 UTC
[jira] [Created] (MAHOUT-1091) Bug in SequentialAccessSparseVector
full iteration
Ted Dunning created MAHOUT-1091:
-----------------------------------
Summary: Bug in SequentialAccessSparseVector full iteration
Key: MAHOUT-1091
URL: https://issues.apache.org/jira/browse/MAHOUT-1091
Project: Mahout
Issue Type: Bug
Components: Math
Reporter: Ted Dunning
Fix For: 0.8
The iterator for the SequentialAccessSparseVector doesn't return any items beyond the last non-zero. This breaks some stuff pretty massively, but hopefully doesn't break much user code since iterating through all elements of a sparse vector is a relatively rare thing to do.
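As an illustration of the intended behavior, here is a minimal sketch of a full iterator over sorted sparse storage. It is not Mahout's code: the class `SparseSketch` and its fields `cardinality`, `indices` and `values` are hypothetical names chosen for this example. The point it shows is that `hasNext()` is bounded by the logical length of the vector rather than by the number of stored entries, so zeros after the last non-zero are still returned.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a sequential-access sparse vector; not Mahout's class. */
final class SparseSketch implements Iterable<Double> {
  private final int cardinality;  // logical length, zeros included
  private final int[] indices;    // sorted positions of the stored non-zeros
  private final double[] values;  // values[k] is the element at indices[k]

  SparseSketch(int cardinality, int[] indices, double[] values) {
    this.cardinality = cardinality;
    this.indices = indices;
    this.values = values;
  }

  @Override
  public Iterator<Double> iterator() {
    return new Iterator<Double>() {
      private int position = 0;  // next logical index to return
      private int stored = 0;    // next unread slot in indices/values

      @Override
      public boolean hasNext() {
        // Bounded by the logical length, not by the last stored entry.
        return position < cardinality;
      }

      @Override
      public Double next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        double result = 0.0;
        if (stored < indices.length && indices[stored] == position) {
          result = values[stored++];
        }
        position++;
        return result;
      }
    };
  }

  public static void main(String[] args) {
    SparseSketch v = new SparseSketch(6, new int[] {1, 3}, new double[] {2.0, 5.0});
    int count = 0;
    double sum = 0;
    for (double x : v) {
      count++;
      sum += x;
    }
    System.out.println(count + " elements, sum " + sum);  // prints "6 elements, sum 7.0"
  }
}
```

In this sketch the two trailing zeros at positions 4 and 5 are visited, which is exactly what the broken iterator fails to do.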
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector
full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan updated MAHOUT-1091:
------------------------------------
Attachment: MAHOUT-1091.patch
Just tried out of curiosity. The patch contains the fix and a junit test demonstrating the fix.
The math module's test cases pass after this fix. Have not checked the complete build yet.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472584#comment-13472584 ]
Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------
There is a test case in the patch which shows it can catch elements after the last non-zero.
I hope the test case is correct. If not, can you point me to the error?
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473863#comment-13473863 ]
Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------
Yes, go ahead.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472593#comment-13472593 ]
Ted Dunning commented on MAHOUT-1091:
-------------------------------------
The test that I saw in your patch creates a sparse vector of size Integer.MAX_VALUE, but it only verifies that the iterator goes through the first few elements. It is better to create a smaller vector and verify that the iterator scans through all of the elements, both zero and non-zero.
Take a look at the test in my patch.
[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector
full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-1091:
--------------------------------
Attachment: 0001-MAHOUT-1091-Add-test-to-demonstrate-broken-iterator-.patch
Here is a patch relative to MAHOUT-1086 latest patch. It includes a test that demonstrates the problem as well as code to repair it.
[jira] [Resolved] (MAHOUT-1091) Bug in SequentialAccessSparseVector
full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning resolved MAHOUT-1091.
---------------------------------
Resolution: Fixed
Committed this change.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472167#comment-13472167 ]
Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------
The patch I submitted would fail if there are 0 non-zero elements. Still, it was interesting to look into it.
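The patch itself is not reproduced in this thread, so the following is only a guess at the kind of edge case involved, not code from either patch. An iterator that derives its end point from the last stored index has nothing to read when no entries are stored, and even when entries exist it stops right after the last non-zero. A minimal Java sketch with hypothetical helper names (`endFromLastStored`, `endFromCardinality`):

```java
/** Hypothetical illustration of the all-zero edge case; not taken from any Mahout patch. */
final class EmptyVectorEdgeCase {
  /** Fragile: reads the last stored index, which does not exist for an all-zero vector. */
  static int endFromLastStored(int[] storedIndices) {
    return storedIndices[storedIndices.length - 1] + 1;
  }

  /** Robust: the logical length does not depend on what happens to be stored. */
  static int endFromCardinality(int cardinality) {
    return cardinality;
  }

  public static void main(String[] args) {
    // With entries at 1 and 3 in a vector of length 20, the fragile bound stops at 4.
    System.out.println(endFromLastStored(new int[] {1, 3}));
    System.out.println(endFromCardinality(20));
    try {
      // No stored entries at all: the fragile bound cannot even be computed.
      endFromLastStored(new int[0]);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("no stored entries: " + e.getClass().getSimpleName());
    }
  }
}
```

A test that covers an all-zero vector as well as a vector with trailing zeros would catch both problems.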
[jira] [Comment Edited] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471891#comment-13471891 ]
Paritosh Ranjan edited comment on MAHOUT-1091 at 10/8/12 11:01 PM:
-------------------------------------------------------------------
Just tried out of curiosity. The patch contains the fix and a junit test demonstrating the fix.
All mahout test cases pass with this fix.
was (Author: paritoshranjan):
Just tried out of curiosity. The patch contains the fix and a junit test demonstrating the fix.
The math module's test cases pass after this fix. Have not checked the complete build yet.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473946#comment-13473946 ]
Hudson commented on MAHOUT-1091:
--------------------------------
Integrated in Mahout-Quality #1697 (See [https://builds.apache.org/job/Mahout-Quality/1697/])
MAHOUT-1091 - Add test to demonstrate broken iterator in SequentialAccessSparseVector (and add fix) (Revision 1396920)
Result = SUCCESS
tdunning : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1396920
Files :
* /mahout/trunk/math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java
* /mahout/trunk/math/src/test/java/org/apache/mahout/math/AbstractVectorTest.java
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Paritosh Ranjan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472614#comment-13472614 ]
Paritosh Ranjan commented on MAHOUT-1091:
-----------------------------------------
Yes, got the error. The test case is failing now. The way I was creating the SequentialAccessSparseVector was faulty, as I was not using the cardinality and size of the vector properly.
Thanks for clearing the doubts.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472599#comment-13472599 ]
Ted Dunning commented on MAHOUT-1091:
-------------------------------------
For instance, this test:
{code}
public void testIterators() {
  final T v0 = vectorToTest(20);

  // Full iteration must visit every element, zero or not.
  double sum = 0;
  int elements = 0;
  int nonZero = 0;
  for (Vector.Element element : v0) {
    elements++;
    sum += element.get();
    if (element.get() != 0) {
      nonZero++;
    }
  }

  // Non-zero iteration must visit only the non-zero elements.
  int nonZeroIterated = 0;
  final Iterator<Vector.Element> i = v0.iterateNonZero();
  while (i.hasNext()) {
    i.next();
    nonZeroIterated++;
  }

  assertEquals(20, elements);
  assertEquals(v0.size(), elements);
  assertEquals(nonZeroIterated, nonZero);
  assertEquals(v0.zSum(), sum, 0);
}
{code}
Note how the test verifies that the full iteration sees all of the elements, while the non-zero iteration sees only the non-zeros.
[jira] [Updated] (MAHOUT-1091) Bug in SequentialAccessSparseVector
full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-1091:
--------------------------------
Attachment: 0001-MAHOUT-1091-Add-test-to-demonstrate-broken-iterator-.patch
Oops. The correct patch is attached here.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472554#comment-13472554 ]
Ted Dunning commented on MAHOUT-1091:
-------------------------------------
The patch you submitted also didn't catch all the elements after the last non-zero.
[jira] [Commented] (MAHOUT-1091) Bug in
SequentialAccessSparseVector full iteration
Posted by "Ted Dunning (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAHOUT-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473755#comment-13473755 ]
Ted Dunning commented on MAHOUT-1091:
-------------------------------------
Paritosh,
I don't know your timezone, but if I don't hear anything else on this, I will commit shortly.