Posted to user@hbase.apache.org by Li Li <fa...@gmail.com> on 2014/05/08 14:58:01 UTC

how to pre split a table whose row key is MD5(url)?

Say I have 4 region servers. How do I pre-split a table that uses MD5 as the row key?

Re: how to pre split a table whose row key is MD5(url)?

Posted by Michael Segel <mi...@hotmail.com>.
Meh.
Did this before my morning coffee. 
You get the idea. ;-) 

On May 12, 2014, at 2:37 AM, Li Li <fa...@gmail.com> wrote:

> thanks. I will try this.
> by the way, byte range is -128 - 127


Re: how to pre split a table whose row key is MD5(url)?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Or even easier, just use this:
create 't1', 'f1', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}


2014-05-14 10:50 GMT-04:00 Jean-Daniel Cryans <jd...@apache.org>:

> To make this easier to type, you don't even need the 0 padding. Just '1',
> '2', '3', ... 'f' is enough :)

Re: how to pre split a table whose row key is MD5(url)?

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Tue, May 13, 2014 at 9:58 AM, Liam Slusser <ls...@gmail.com> wrote:

> You can also create a table via the hbase shell with pre-split tables like
> this...
>
> Here is a 32-byte split into 16 different regions, using base16 (ie a md5
> hash) for the key-type.
>
> create 't1', {NAME => 'f1'},
> {SPLITS=> ['10000000000000000000000000000000',
> '20000000000000000000000000000000',
> '30000000000000000000000000000000',
> '40000000000000000000000000000000',
> '50000000000000000000000000000000',
> '60000000000000000000000000000000',
> '70000000000000000000000000000000',
> '80000000000000000000000000000000',
> '90000000000000000000000000000000',
> 'a0000000000000000000000000000000',
> 'b0000000000000000000000000000000',
> 'c0000000000000000000000000000000',
> 'd0000000000000000000000000000000',
> 'e0000000000000000000000000000000',
> 'f0000000000000000000000000000000']}
>

To make this easier to type, you don't even need the 0 padding. Just '1',
'2', '3', ... 'f' is enough :)
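The short form works because row keys are compared lexicographically, byte by byte, so a one-character split key such as '1' sorts at or before every 32-character key that starts with '1'. The sketch below checks that the short and zero-padded split keys place a key in the same region. It uses plain JDK string comparison (which matches byte order for ASCII hex), and the class and method names are hypothetical, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; plain JDK only, no HBase dependency.
final class ShortHexSplits {

    // The 15 one-character split keys "1".."f" for 16 regions.
    static List<String> shortSplits() {
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < 16; i++) {
            splits.add(Integer.toHexString(i));
        }
        return splits;
    }

    // Zero-pads a split key to 32 characters (the long form).
    static String padded(String split) {
        StringBuilder sb = new StringBuilder(split);
        while (sb.length() < 32) {
            sb.append('0');
        }
        return sb.toString();
    }

    // Region index = number of split keys that are <= rowKey.
    static int regionIndex(List<String> splits, String rowKey) {
        int index = 0;
        for (String split : splits) {
            if (rowKey.compareTo(split) >= 0) {
                index++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> shortKeys = shortSplits();
        List<String> longKeys = new ArrayList<>();
        for (String s : shortKeys) {
            longKeys.add(padded(s));
        }
        String rowKey = "d41d8cd98f00b204e9800998ecf8427e"; // MD5 of the empty string
        System.out.println(regionIndex(shortKeys, rowKey)); // 13
        System.out.println(regionIndex(longKeys, rowKey));  // 13
    }
}
```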



Re: how to pre split a table whose row key is MD5(url)?

Posted by Liam Slusser <ls...@gmail.com>.
You can also create a pre-split table from the HBase shell, like this.

Here is a 32-byte key space split into 16 regions, using base16 (i.e. an MD5
hash in hex) as the key type.

create 't1', {NAME => 'f1'},
{SPLITS=> ['10000000000000000000000000000000',
'20000000000000000000000000000000',
'30000000000000000000000000000000',
'40000000000000000000000000000000',
'50000000000000000000000000000000',
'60000000000000000000000000000000',
'70000000000000000000000000000000',
'80000000000000000000000000000000',
'90000000000000000000000000000000',
'a0000000000000000000000000000000',
'b0000000000000000000000000000000',
'c0000000000000000000000000000000',
'd0000000000000000000000000000000',
'e0000000000000000000000000000000',
'f0000000000000000000000000000000']}
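These split points assume the row key is the 32-character lowercase hex form of the MD5 digest, not the raw 16 bytes. A minimal sketch of producing such a key with the JDK's `MessageDigest` follows; the class name, method name and the example URL are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class; plain JDK only, no HBase dependency.
final class Md5RowKey {

    // Returns the lowercase 32-character hex MD5 of the URL.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(32);
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        String rowKey = md5Hex("http://example.com/"); // example URL, an assumption
        System.out.println(rowKey + " (" + rowKey.length() + " chars)");
    }
}
```

Case matters here: 'A'..'F' sort before 'a'..'f' in byte order, so uppercase hex keys would not line up with the lowercase split points above.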

thanks,
liam




Re: how to pre split a table whose row key is MD5(url)?

Posted by sudhakara st <su...@gmail.com>.
You can pre-split the table by passing a hex string for the start key, a hex
string for the end key, and the number of regions to split into:

**************************************************************************************************************
HTableDescriptor tableDes = new HTableDescriptor(tableName);
tableDes.setValue(HTableDescriptor.SPLIT_POLICY,
        KeyPrefixRegionSplitPolicy.class.getName());

byte[][] splits = getHexSplits(SPLIT_START_KEY, SPLIT_END_KEY, NUM_OF_REGION_SPLIT);
admin.createTable(tableDes, splits);

**************************************************************************************************************
// Needs: import java.math.BigInteger;
// startKey and endKey are hex strings. Only the first 8 bytes (16 hex
// characters) of the key are considered when splitting.
private byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
    byte[][] splits = new byte[numRegions - 1][];
    // Radix 16, because the keys are hexadecimal; radix 8 rejects the digits 8-f.
    BigInteger lowestKey = new BigInteger(startKey, 16);
    BigInteger highestKey = new BigInteger(endKey, 16);
    BigInteger range = highestKey.subtract(lowestKey);
    BigInteger regionIncrement = range.divide(BigInteger.valueOf(numRegions));
    // The first split point sits one increment above the start key.
    lowestKey = lowestKey.add(regionIncrement);
    for (int i = 0; i < numRegions - 1; i++) {
        BigInteger key = lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i)));
        // Zero-padded, 16-character lowercase hex split key.
        byte[] b = String.format("%016x", key).getBytes();
        splits[i] = b;
    }
    return splits;
}

**************************************************************************************************************
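For reference, here is a self-contained sketch of the same hex-range splitting idea that runs without HBase. It assumes the start and end keys are 16-character hexadecimal strings and therefore parses them with radix 16. The class name and the sample key range are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; plain JDK only, no HBase dependency.
final class HexSplitDemo {

    // Returns numRegions - 1 evenly spaced, zero-padded hex split keys.
    static byte[][] getHexSplits(String startKey, String endKey, int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        BigInteger lowestKey = new BigInteger(startKey, 16);
        BigInteger highestKey = new BigInteger(endKey, 16);
        BigInteger regionIncrement =
                highestKey.subtract(lowestKey).divide(BigInteger.valueOf(numRegions));
        for (int i = 0; i < numRegions - 1; i++) {
            // Split point i sits (i + 1) increments above the start key.
            BigInteger key =
                    lowestKey.add(regionIncrement.multiply(BigInteger.valueOf(i + 1)));
            splits[i] = String.format("%016x", key).getBytes(StandardCharsets.US_ASCII);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = getHexSplits("0000000000000000", "ffffffffffffffff", 4);
        for (byte[] split : splits) {
            System.out.println(new String(split, StandardCharsets.US_ASCII));
        }
        // Prints:
        // 3fffffffffffffff
        // 7ffffffffffffffe
        // bffffffffffffffd
    }
}
```

Integer division rounds the increment down, which is why the split points fall slightly below the exact quarter marks; for load distribution the difference is negligible.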





-- 

Regards,
...sudhakara

Re: how to pre split a table whose row key is MD5(url)?

Posted by Li Li <fa...@gmail.com>.
Thanks, I will try this.
By the way, the byte range is -128 to 127.
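Both views are consistent: a Java `byte` is signed (-128 to 127), while HBase orders row keys by comparing each byte as an unsigned value (0 to 255). The sketch below illustrates the difference with plain JDK code. The class and the `compareUnsigned` helper are hypothetical stand-ins for illustration, not HBase APIs.

```java
// Hypothetical demo class; plain JDK only, no HBase dependency.
final class UnsignedByteOrder {

    // Lexicographic comparison that treats each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a shorter prefix sorts first
    }

    public static void main(String[] args) {
        byte high = (byte) 0xC0; // stored as -64 in a signed Java byte
        byte low = (byte) 0x40;  // stored as 64
        System.out.println(high);        // -64
        System.out.println(high & 0xFF); // 192
        // As unsigned row-key bytes, 0xC0 sorts after 0x40.
        System.out.println(compareUnsigned(new byte[] {high}, new byte[] {low}) > 0); // true
    }
}
```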


Re: how to pre split a table whose row key is MD5(url)?

Posted by Michael Segel <mi...@hotmail.com>.
Simple answer… you really can’t.
The best thing you can do is to pre-split the table into 4 regions by splitting the first byte into 4 equal ranges (0-63, 64-127, 128-191, 192-255).

And hope that you’ll have an even split. 

In theory, over time you will.  
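As a rough sketch of that first-byte split, assuming the row key is the raw 16-byte MD5 digest, the code below computes the one-byte split keys for a given number of regions. For 4 regions the split points are 64, 128 and 192. The class and method names are hypothetical; the resulting array is the kind of value that a `createTable(descriptor, splits)` call would take.

```java
// Hypothetical demo class; plain JDK only, no HBase dependency.
final class FirstByteSplits {

    // Returns numRegions - 1 one-byte split keys that divide 0..255 evenly.
    static byte[][] firstByteSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in 2..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            int boundary = (i * 256) / numRegions; // first byte value of region i
            splits[i - 1] = new byte[] {(byte) boundary};
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : firstByteSplits(4)) {
            // Mask with 0xFF to print the unsigned value of the signed Java byte.
            System.out.println(split[0] & 0xFF); // 64, then 128, then 192
        }
    }
}
```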




Re: how to pre split a table whose row key is MD5(url)?

Posted by Ted Yu <yu...@gmail.com>.
Please see http://hbase.apache.org/book.html#rowkey.regionsplits

