Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/05/11 07:31:29 UTC
[jira] Created: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
----------------------------------------------------------------------------
Key: HBASE-2531
URL: https://issues.apache.org/jira/browse/HBASE-2531
Project: Hadoop HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
Fix For: 0.20.5, 0.21.0
Kannan tripped over two regionnames that hashed to the same value.
Here is code demonstrating that his two names hash the same:
{code}
package org;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.JenkinsHash;

public class Testing {
  public static void main(final String [] args) {
    // Two distinct region names; both lines print the same encoded value.
    System.out.println(encodeRegionName(Bytes.toBytes("test1,6838000000,1273541236167")));
    System.out.println(encodeRegionName(Bytes.toBytes("test1,0520100000,1273541610201")));
  }

  /**
   * @param regionName full region name as bytes
   * @return the encodedName: absolute value of the 32-bit Jenkins hash
   */
  public static int encodeRegionName(final byte [] regionName) {
    return Math.abs(JenkinsHash.getInstance().hash(regionName, regionName.length, 0));
  }
}
{code}
Need a new encoding mechanism. Will need to migrate old regions to the new scheme.
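To put a rough number on "too susceptible", here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes the hash is uniformly distributed and that Math.abs leaves about 31 usable bits; the class and method names are made up for illustration and are not part of HBase.

```java
public class CollisionOdds {
    /**
     * Approximate probability that at least two of n names share a hash,
     * assuming hashes are uniform over 2^bits values (birthday approximation:
     * p ~ 1 - exp(-n(n-1) / (2 * 2^bits))).
     */
    public static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        double pairs = n * (n - 1) / 2.0;
        // -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        // Assumption: Math.abs on a 32-bit hash leaves roughly 31 usable bits.
        for (long n : new long[] {1_000L, 10_000L, 100_000L}) {
            System.out.printf("%d regions: p ~ %.4f%n", n, collisionProbability(n, 31));
        }
    }
}
```

Under these assumptions the chance of at least one clash is roughly 2% at 10,000 regions and roughly 90% at 100,000, so a clash on a large cluster is expected rather than a freak event.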
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-2531) 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-2531.
--------------------------
Hadoop Flags: [Reviewed]
Release Note: Changes format of region name. Adds an md5 suffix. Suffix is now the name used as directory name in filesystem.
Resolution: Fixed
Committed. Thanks for the sweet patch, Kannan.
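For readers wondering what an md5-based name looks like, here is a minimal sketch using only the JDK. It is not the code from HBASE-2531_v2.patch: the class name, the md5Hex helper, and the choice to hex-encode the digest of the whole region name are assumptions for illustration, and the real suffix format may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5RegionName {
    /** Hypothetical helper: hex-encoded MD5 (128 bits) of the region name bytes. */
    public static String md5Hex(byte[] regionName) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(regionName);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so each byte prints as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The two names from the issue that clash under the 32-bit encoding.
        String a = "test1,6838000000,1273541236167";
        String b = "test1,0520100000,1273541610201";
        System.out.println(md5Hex(a.getBytes(StandardCharsets.UTF_8)));
        System.out.println(md5Hex(b.getBytes(StandardCharsets.UTF_8)));
    }
}
```

With a 128-bit digest, the birthday-style clash that bit the 32-bit encoding becomes vanishingly unlikely for any realistic number of regions.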
> 32-bit encoding of regionnames waaaaaaayyyyy too susceptible to hash clashes
> ----------------------------------------------------------------------------
>
> Key: HBASE-2531
> URL: https://issues.apache.org/jira/browse/HBASE-2531
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Kannan Muthukkaruppan
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: HBASE-2531_v2.patch