Posted to dev@hbase.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/01/19 20:02:01 UTC
[jira] Created: (HBASE-1136) HashFunction inadvertently destroys some randomness
HashFunction inadvertently destroys some randomness
---------------------------------------------------
Key: HBASE-1136
URL: https://issues.apache.org/jira/browse/HBASE-1136
Project: Hadoop HBase
Issue Type: Bug
Reporter: Jonathan Ellis
the code

    for (int i = 0, initval = 0; i < nbHash; i++) {
      initval = result[i] = Math.abs(hashFunction.hash(b, initval) % maxValue);
    }

restricts initval for the next hash to the [0, maxValue) range of the hash indexes returned. This is suboptimal, particularly for larger nbHash and smaller maxValue. Instead, use:

    for (int i = 0, initval = 0; i < nbHash; i++) {
      initval = hashFunction.hash(b, initval);
      result[i] = Math.abs(initval) % maxValue;
    }
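To see the effect, the two loops can be compared side by side with a stand-in mixer (the method below is a hypothetical substitute, not Hadoop's real Jenkins-based HashFunction): chaining the already-reduced value caps the seed for every subsequent hash at maxValue distinct possibilities, while chaining the raw hash keeps the full 32-bit range.

```java
import java.util.HashSet;
import java.util.Set;

public class SeedRangeDemo {
    // Stand-in for hashFunction.hash(b, initval): any deterministic mix of
    // the key bytes with the seed. Illustrative only, not the Hadoop hash.
    static int hash(byte[] b, int initval) {
        int h = initval ^ 0x9E3779B9;
        for (byte x : b) h = h * 31 + x;
        return h;
    }

    // Returns {distinct seeds seen by the buggy loop, by the fixed loop}.
    static int[] seedCounts(int nbHash, int maxValue, int keys) {
        Set<Integer> buggy = new HashSet<>();
        Set<Integer> fixed = new HashSet<>();
        for (int key = 0; key < keys; key++) {
            byte[] b = Integer.toString(key).getBytes();
            int[] result = new int[nbHash];

            // Buggy: the next seed is already reduced to [0, maxValue).
            for (int i = 0, initval = 0; i < nbHash; i++) {
                initval = result[i] = Math.abs(hash(b, initval) % maxValue);
                buggy.add(initval);
            }
            // Fixed: the full hash value is chained as the next seed.
            for (int i = 0, initval = 0; i < nbHash; i++) {
                initval = hash(b, initval);
                result[i] = Math.abs(initval) % maxValue;
                fixed.add(initval);
            }
        }
        return new int[] { buggy.size(), fixed.size() };
    }

    public static void main(String[] args) {
        int[] c = seedCounts(8, 100, 10_000);
        System.out.println("buggy distinct seeds: " + c[0]); // at most maxValue
        System.out.println("fixed distinct seeds: " + c[1]); // far larger
    }
}
```

With maxValue = 100 the buggy loop can never draw its seeds from more than 100 values no matter how many keys are hashed, which is exactly the entropy loss the issue describes.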
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665245#action_12665245 ]
Jonathan Ellis commented on HBASE-1136:
---------------------------------------
done: https://issues.apache.org/jira/browse/HADOOP-5079
> HashFunction inadvertently destroys some randomness
> ---------------------------------------------------
>
> Key: HBASE-1136
> URL: https://issues.apache.org/jira/browse/HBASE-1136
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Fix For: 0.20.0
>
> Attachments: hash.patch
>
>
> the code
> for (int i = 0, initval = 0; i < nbHash; i++) {
> initval = result[i] = Math.abs(hashFunction.hash(b, initval) % maxValue);
> }
> restricts initval for the next hash to the [0, maxValue) range of the hash indexes returned. This is suboptimal, particularly for larger nbHash and smaller maxValue. Instead, use:
> for (int i = 0, initval = 0; i < nbHash; i++) {
> initval = hashFunction.hash(b, initval);
> result[i] = Math.abs(initval) % maxValue;
> }
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated HBASE-1136:
----------------------------------
Status: Open (was: Patch Available)
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated HBASE-1136:
----------------------------------
Attachment: hbase-testfilter.patch
[jira] Reopened: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack reopened HBASE-1136:
--------------------------
Reopening. TestFilter fails. Reverted patch. Can you fix, Jonathan?
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1136:
-------------------------
Resolution: Fixed
Fix Version/s: 0.20.0
Status: Resolved (was: Patch Available)
Committed to TRUNK. Thanks for the patch Jonathan.
Please open a new issue against Hadoop with a patch for src/core/org/apache/hadoop/util/bloom/HashFunction.java. HashFunction was moved there by HADOOP-3063 (we'll move to using the Hadoop versions in the next HBase). I can commit it after letting it air for a while in case of objections.
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1136:
-------------------------
Status: Patch Available (was: Open)
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665293#action_12665293 ]
Jonathan Ellis commented on HBASE-1136:
---------------------------------------
How do I run just TestFilter, and how do I get it to tell me what the failure details are?
[junit] Running org.onelab.test.TestFilter
[junit] Tests run: 3, Failures: 2, Errors: 0, Time elapsed: 0.019 sec
[junit] Test org.onelab.test.TestFilter FAILED
not super helpful. :)
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665307#action_12665307 ]
Jonathan Ellis commented on HBASE-1136:
---------------------------------------
Here is a patch that fixes the test.
I also took out the 'or' test because or-ing two CBFs isn't valid... if you take 0x1|0x1 you get 0x1, but really the combined count should be 2.
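A tiny sketch of why bitwise OR is the wrong way to combine counting Bloom filters (bucket counters are shown as a plain int[] purely for illustration; the real CountingBloomFilter packs 4-bit counters into longs):

```java
public class CbfOrDemo {
    // Combine bucket 0 of two counting filters two ways: bitwise OR (wrong,
    // loses counts) and addition (what a counting filter's union needs).
    static int[] combineBucket0(int[] a, int[] b) {
        return new int[] { a[0] | b[0], a[0] + b[0] };
    }

    public static void main(String[] args) {
        // Each filter recorded one insertion hitting bucket 0.
        int[] c = combineBucket0(new int[] {1, 0}, new int[] {1, 0});
        System.out.println("or = " + c[0] + ", sum = " + c[1]); // or = 1, sum = 2
    }
}
```

OR-ing yields 1 even though two insertions were seen, so a later delete would underflow the combined filter's count.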
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665295#action_12665295 ]
stack commented on HBASE-1136:
------------------------------
Sure, np. If you look in the build/test dir, there is a dump of the test output. Or "ant -Dtestcase=TestFilter -Dtest.output=yes test" will run just that test and output to the console.
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665298#action_12665298 ]
Jonathan Ellis commented on HBASE-1136:
---------------------------------------
oh, I see what is going on. it's a bad test.

    Key key = new StringKey("toto");
    Key k2 = new StringKey("lulu");
    Key k3 = new StringKey("mama");
    bf.add(key);
    bf.add(k2);
    bf.add(k3);
    assertTrue(bf.membershipTest(key));
    assertTrue(bf.membershipTest(new StringKey("graknyl")));

graknyl was never added; the assertion only passes because of a false positive, so it's relying on the implementation details of the old (broken) HashFunction.
do you want me to just rip stuff like that out?
[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665299#action_12665299 ]
stack commented on HBASE-1136:
------------------------------
Do please. I did that myself earlier but then, after changing the two instances, it was failing later where a key was reported not present when it should have been. Thanks Jonathan.
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated HBASE-1136:
----------------------------------
Status: Patch Available (was: Open)
[jira] Resolved: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-1136.
--------------------------
Resolution: Fixed
Committed (again). Thanks for the fixup Jonathan.
There is no unit test up in Hadoop.
[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated HBASE-1136:
----------------------------------
Attachment: hash.patch
patch against r735771