Posted to dev@hbase.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2009/01/19 20:02:01 UTC

[jira] Created: (HBASE-1136) HashFunction inadvertently destroys some randomness

HashFunction inadvertently destroys some randomness
---------------------------------------------------

                 Key: HBASE-1136
                 URL: https://issues.apache.org/jira/browse/HBASE-1136
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Jonathan Ellis


the code

      for (int i = 0, initval = 0; i < nbHash; i++) {
        initval = result[i] = Math.abs(hashFunction.hash(b, initval) % maxValue);
      }

restricts initval for the next hash to the [0, maxValue) range of the hash indexes returned.  This is suboptimal, particularly for larger nbHash and smaller maxValue.  Instead, use:

      for (int i = 0, initval = 0; i < nbHash; i++) {
        initval = hashFunction.hash(b, initval);
        result[i] = Math.abs(initval) % maxValue;
      }
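To see the effect concretely, here is a minimal, self-contained sketch contrasting the two chaining schemes. The toy `hash` below is an illustrative stand-in for `hashFunction.hash(b, initval)`, not the real Hadoop HashFunction:

```java
import java.util.ArrayList;
import java.util.List;

public class SeedChainDemo {

    // Toy stand-in for hashFunction.hash(b, initval): a simple seeded
    // polynomial hash over the key bytes.
    static int hash(byte[] b, int initval) {
        int h = initval;
        for (byte x : b) {
            h = h * 31 + (x & 0xff);
        }
        return h;
    }

    // Old scheme: the next seed is the *modded* index, so every seed
    // after the first is confined to [0, maxValue).
    static List<Integer> oldSeeds(byte[] b, int nbHash, int maxValue) {
        List<Integer> seeds = new ArrayList<>();
        for (int i = 0, initval = 0; i < nbHash; i++) {
            initval = Math.abs(hash(b, initval) % maxValue);
            seeds.add(initval);
        }
        return seeds;
    }

    // Proposed scheme: the next seed is the full 32-bit hash; only the
    // stored index would be reduced mod maxValue.
    static List<Integer> newSeeds(byte[] b, int nbHash) {
        List<Integer> seeds = new ArrayList<>();
        for (int i = 0, initval = 0; i < nbHash; i++) {
            initval = hash(b, initval);
            seeds.add(initval);
        }
        return seeds;
    }
}
```

With the old scheme, distinct keys quickly collapse onto the same short seed sequences drawn from only maxValue possible values; the proposed scheme keeps the full hash as the next seed, so only the stored indexes are reduced.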


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665245#action_12665245 ] 

Jonathan Ellis commented on HBASE-1136:
---------------------------------------

done: https://issues.apache.org/jira/browse/HADOOP-5079



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated HBASE-1136:
----------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated HBASE-1136:
----------------------------------

    Attachment: hbase-testfilter.patch



[jira] Reopened: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reopened HBASE-1136:
--------------------------


Reopening.  TestFilter fails.  Reverted patch.  Can you fix it, Jonathan?



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1136:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.20.0
           Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch, Jonathan.

Please open a new issue against Hadoop with a patch for src/core/org/apache/hadoop/util/bloom/HashFunction.java.  HashFunction was moved there by HADOOP-3063 (we'll move to using the Hadoop versions in the next HBase).  I can commit it after letting it air for a while in case of objection.



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1136:
-------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665293#action_12665293 ] 

Jonathan Ellis commented on HBASE-1136:
---------------------------------------

How do I run just TestFilter, and how do I get it to tell me what the failure details are?

    [junit] Running org.onelab.test.TestFilter
    [junit] Tests run: 3, Failures: 2, Errors: 0, Time elapsed: 0.019 sec
    [junit] Test org.onelab.test.TestFilter FAILED

not super helpful. :)



[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665307#action_12665307 ] 

Jonathan Ellis commented on HBASE-1136:
---------------------------------------

Here is a patch that fixes the test.

I also took out the 'or' test, because or-ing two CBFs isn't valid: if you take 0x1|0x1 you get 0x1, but the combined count should really be 2.
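A minimal sketch of the distinction, using plain int arrays as hypothetical counting-Bloom-filter buckets (not the org.onelab CBF API):

```java
public class CbfUnionDemo {

    // Bitwise OR of two bucket arrays: 0x1 | 0x1 stays 0x1, so an element
    // inserted once into each filter looks inserted only once overall.
    static int[] orUnion(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] | b[i];
        }
        return out;
    }

    // Elementwise addition is the union a *counting* filter needs:
    // counts accumulate, so later deletions decrement correctly.
    static int[] addUnion(int[] a, int[] b) {
        int[] out = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }
}
```

OR is the right union for a plain Bloom filter, where a bucket is a single presence bit, but it silently discards multiplicity once buckets carry counts.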





[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665295#action_12665295 ] 

stack commented on HBASE-1136:
------------------------------

Sure, np.  If you look in the build/test dir, there is a dump of the test output.  Or, "ant -Dtestcase=TestFilter -Dtest.output=yes test" will run just that test and send its output to the console.



[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665298#action_12665298 ] 

Jonathan Ellis commented on HBASE-1136:
---------------------------------------

Oh, I see what is going on: it's a bad test.

    Key key = new StringKey("toto");
    Key k2 = new StringKey("lulu");
    Key k3 = new StringKey("mama");
    bf.add(key);
    bf.add(k2);
    bf.add(k3);
    assertTrue(bf.membershipTest(key));
    assertTrue(bf.membershipTest(new StringKey("graknyl")));

graknyl was never added; the test is relying on the implementation details of the old (broken) HashFunction.

do you want me to just rip stuff like that out?
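A test that does not depend on which keys happen to collide can only assert the guarantee a Bloom filter actually makes: added keys always test present (no false negatives), while absent keys may or may not.  A self-contained sketch with a toy filter standing in for the org.onelab classes (the names and sizes here are illustrative assumptions):

```java
import java.util.BitSet;

public class ToyBloomTest {
    static final int SIZE = 1024;  // bit-vector size (maxValue)
    static final int NB_HASH = 4;  // hashes per key

    // Toy seeded hash, mirroring the proposed chaining scheme above.
    static int hash(byte[] b, int initval) {
        int h = initval;
        for (byte x : b) {
            h = h * 31 + (x & 0xff);
        }
        return h;
    }

    // Set all NB_HASH index bits for the key.
    static void add(BitSet bits, String key) {
        byte[] b = key.getBytes();
        for (int i = 0, initval = 0; i < NB_HASH; i++) {
            initval = hash(b, initval);
            bits.set(Math.abs(initval) % SIZE);
        }
    }

    // Present only if every index bit for the key is set.
    static boolean membershipTest(BitSet bits, String key) {
        byte[] b = key.getBytes();
        for (int i = 0, initval = 0; i < NB_HASH; i++) {
            initval = hash(b, initval);
            if (!bits.get(Math.abs(initval) % SIZE)) {
                return false;
            }
        }
        return true;
    }
}
```

The assertion to keep is "every added key is present"; asserting that a specific never-added key like graknyl tests present (or absent) bakes one hash implementation's collisions into the test.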



[jira] Commented: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665299#action_12665299 ] 

stack commented on HBASE-1136:
------------------------------

Do, please.  I did that myself earlier, but after changing the two instances it was failing later, with a key reported not present when it should have been.  Thanks, Jonathan.



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated HBASE-1136:
----------------------------------

    Status: Patch Available  (was: Open)



[jira] Resolved: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1136.
--------------------------

    Resolution: Fixed

Committed (again).  Thanks for the fixup, Jonathan.

There is no unit test up in Hadoop.



[jira] Updated: (HBASE-1136) HashFunction inadvertently destroys some randomness

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated HBASE-1136:
----------------------------------

    Attachment: hash.patch

patch against r735771
