Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/05/11 07:31:29 UTC

[jira] Created: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
----------------------------------------------------------------------------

                 Key: HBASE-2531
                 URL: https://issues.apache.org/jira/browse/HBASE-2531
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
            Priority: Blocker
             Fix For: 0.20.5, 0.21.0


Kannan tripped over two region names that hash to the same value.

Here is code demonstrating that his two names produce the same encoded name:

{code}
package org;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.JenkinsHash;


public class Testing {
  public static void main(final String [] args) {
    System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
    System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
  }

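  /**
   * Reproduces the 32-bit region-name encoding that this issue replaces.
   * @param regionName full region name bytes (table,startKey,regionId)
   * @return the encoded name: the absolute value of the 32-bit Jenkins hash
   *   (note that Math.abs(Integer.MIN_VALUE) is still negative)
   */
  public static int encodeRegionName(final byte [] regionName) {
    return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
  }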
}
{code}

Need a new encoding mechanism.  Will also need to migrate existing regions to the new scheme.
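Why 32 bits is too few: by the birthday bound, if n region names are hashed uniformly into d values, the chance that at least two share a hash is about 1 - exp(-n(n-1)/(2d)). Because Math.abs folds the signed int, d is roughly 2^31. The sketch below evaluates that bound; it assumes the hash behaves like a uniform random function, which is an idealization and not a property claimed here for JenkinsHash.

```java
public class BirthdayBound {
    /**
     * Birthday-bound approximation: probability that at least two of n
     * values drawn uniformly at random from a space of size d coincide,
     * 1 - exp(-n(n-1) / (2d)).
     */
    public static double collisionProbability(long n, double d) {
        double pairs = (double) n * (n - 1) / 2.0;
        // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
        return -Math.expm1(-pairs / d);
    }

    public static void main(String[] args) {
        // Assumption: Math.abs folds the signed 32-bit hash into about 2^31 values.
        double space = Math.pow(2, 31);
        for (long n : new long[] {1_000, 10_000, 55_000, 100_000}) {
            System.out.printf("%,7d regions -> P(clash) ~ %.3f%n",
                    n, collisionProbability(n, space));
        }
    }
}
```

Under that idealization the odds of a clash reach about even at roughly 55,000 regions and about 90% at 100,000, which is well within reach of a large cluster over its lifetime.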

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2531.
--------------------------

    Hadoop Flags: [Reviewed]
    Release Note: Changes the format of the region name by adding an MD5 suffix.  The suffix is now used as the region's directory name in the filesystem.
      Resolution: Fixed

Committed.  Thanks for the sweet patch, Kannan.
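The release note does not spell out the exact suffix layout. As a minimal sketch, assuming the suffix is the lowercase hexadecimal MD5 digest of the full region-name bytes, the helper below computes it with the JDK's java.security.MessageDigest. The class and method names (Md5RegionName, md5Hex) are hypothetical and are not the HBase API; see the committed patch for the actual format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5RegionName {
    /** Hypothetical helper: lowercase hex MD5 (128 bits, 32 chars) of the region name bytes. */
    public static String md5Hex(byte[] regionName) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(regionName);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to treat the byte as unsigned before formatting.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The two names from the report that clashed under the 32-bit encoding.
        String a = "test1,6838000000,1273541236167";
        String b = "test1,0520100000,1273541610201";
        System.out.println(md5Hex(a.getBytes(StandardCharsets.UTF_8)));
        System.out.println(md5Hex(b.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Widening the encoded name from 32 to 128 bits moves the birthday bound from tens of thousands of names to roughly 2^64, which is why a digest suffix is a reasonable directory name.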

> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2531
>                 URL: https://issues.apache.org/jira/browse/HBASE-2531
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Kannan Muthukkaruppan
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2531_v2.patch
>