Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/02/03 23:07:59 UTC

[jira] Created: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

New MR splitting algorithm and other new features need a way to split a key range in N chunks
---------------------------------------------------------------------------------------------

                 Key: HBASE-1183
                 URL: https://issues.apache.org/jira/browse/HBASE-1183
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: util
            Reporter: Jonathan Gray
            Assignee: Jonathan Gray
            Priority: Minor
             Fix For: 0.19.1, 0.20.0


For HBASE-1172 and other functionality coming soon, we need to be able to take a [start,stop) range and divide it into chunks.

For example, we have 10 regions but want to run 30 maps.  We need to divide each region into three key ranges for the start/stop of each scanner.

The implementation will use java.math.BigInteger.

It will also include a couple of additional helpers in Bytes to make life easy.
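
To make the idea concrete, here is a minimal sketch of dividing a [start,stop) range into N chunks with java.math.BigInteger. It is not the code from the attached patch: the class and method names (RangeSplitSketch, boundaries, toFixedLength) are hypothetical, it assumes start and stop already have the same length, and it uses the sign-magnitude BigInteger constructor rather than whatever byte handling the patch uses.

```java
import java.math.BigInteger;

/** Hypothetical helper for illustration; not the code from the patch. */
final class RangeSplitSketch {

  /**
   * Returns chunks + 1 boundary keys; chunk i covers [keys[i], keys[i + 1]).
   * Assumes start and stop have the same length and start sorts before stop.
   */
  static byte[][] boundaries(byte[] start, byte[] stop, int chunks) {
    if (start.length != stop.length || chunks < 1) {
      throw new IllegalArgumentException("need equal-length keys and chunks >= 1");
    }
    // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
    BigInteger lo = new BigInteger(1, start);
    BigInteger hi = new BigInteger(1, stop);
    if (lo.compareTo(hi) >= 0) {
      throw new IllegalArgumentException("start must sort before stop");
    }
    BigInteger width = hi.subtract(lo);
    BigInteger n = BigInteger.valueOf(chunks);
    byte[][] keys = new byte[chunks + 1][];
    keys[0] = start.clone();
    keys[chunks] = stop.clone();
    for (int i = 1; i < chunks; i++) {
      // Interior boundary i sits at lo + width * i / chunks (integer division).
      BigInteger offset = width.multiply(BigInteger.valueOf(i)).divide(n);
      keys[i] = toFixedLength(lo.add(offset), start.length);
    }
    return keys;
  }

  /** Writes value right-aligned into a zero-filled array of the given length. */
  static byte[] toFixedLength(BigInteger value, int length) {
    byte[] raw = value.toByteArray(); // may carry a sign byte or drop leading zeros
    byte[] out = new byte[length];
    int n = Math.min(raw.length, length);
    System.arraycopy(raw, raw.length - n, out, length - n, n);
    return out;
  }
}
```

For the 10 regions / 30 maps case, each region's start and stop row would be passed with chunks = 3, giving four boundary keys and therefore three scanner ranges per region. Note that boundaries can repeat when the range is narrower than the requested number of chunks.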

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1183:
---------------------------------

    Attachment: hbase-1183-v1.patch

Introduces 5 new functions to Bytes util class:

- public static byte [][] split(final byte [] a, final byte [] b, final int num)
- public static byte [] head(final byte [] a, final int length)
- public static byte [] tail(final byte [] a, final int length)
- public static byte [] padHead(final byte [] a, final int length)
- public static byte [] padTail(final byte [] a, final int length)

head/tail are certainly useful and generic.  I'm not sure we need the padHead/padTail functions elsewhere, but they are used for splitting: start/stop need to be the same length for BigInteger, and I am also prepending a 0 to both to ensure they are not interpreted as negative numbers.
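
As a rough illustration of the four helpers, the sketch below follows the signatures listed above. The semantics are an assumption, since the message does not spell them out: head/tail are taken to return the first/last `length` bytes, and padHead/padTail are taken to add `length` zero bytes at the front/back. The class name BytesHelpersSketch is hypothetical, and the error handling is a choice made for this example.

```java
import java.util.Arrays;

/** Hypothetical stand-ins for the helpers; semantics are assumed, not confirmed. */
final class BytesHelpersSketch {

  /** Assumed: returns the first length bytes of a. */
  static byte[] head(final byte[] a, final int length) {
    checkLength(a, length);
    return Arrays.copyOfRange(a, 0, length);
  }

  /** Assumed: returns the last length bytes of a. */
  static byte[] tail(final byte[] a, final int length) {
    checkLength(a, length);
    return Arrays.copyOfRange(a, a.length - length, a.length);
  }

  /** Assumed: returns a with length zero bytes prepended. */
  static byte[] padHead(final byte[] a, final int length) {
    byte[] out = new byte[a.length + length]; // new arrays are zero-filled
    System.arraycopy(a, 0, out, length, a.length);
    return out;
  }

  /** Assumed: returns a with length zero bytes appended. */
  static byte[] padTail(final byte[] a, final int length) {
    return Arrays.copyOf(a, a.length + length); // copyOf pads with zeros
  }

  private static void checkLength(byte[] a, int length) {
    if (length < 0 || length > a.length) {
      throw new IllegalArgumentException("length out of range: " + length);
    }
  }
}
```

Under these assumptions, a shorter start key can be brought to the length of the stop key with padTail(start, stop.length - start.length) before both are turned into BigIntegers.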


[jira] Commented: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702650#action_12702650 ] 

stack commented on HBASE-1183:
------------------------------

Jon'ey, patch looks good, but how about a unit test if only to demo how useful your fancy new BigInteger doohickey is?  I can help... or, since we now have a TestBytes class, it should be super easy: testSplit....


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1183:
---------------------------------

    Attachment: hbase-1183-v3.patch

Fixed a bug related to prepending and then stripping a 0 byte from the beginning of keys.  We do this because BigInteger is signed, and the extra byte ensures all values are positive when turned into BigIntegers.  The fix just needs to check whether there is still an extra 0 in front of the midkeys.
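
The sketch below shows one reading of why that check is needed; it is an illustration, not the patch itself, and the names SignByteSketch, toPositive and fromPositive are hypothetical. BigInteger's byte[] constructor reads two's complement, so a key whose first byte has the high bit set would come out negative. Prepending a 0 byte avoids that, but toByteArray() returns the minimal representation, so the leading 0 is only present on the way back when the next byte has its high bit set.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical illustration of the sign-byte handling described above. */
final class SignByteSketch {

  /** Prepends a single 0 byte so the key is always read as a positive number. */
  static BigInteger toPositive(byte[] key) {
    byte[] padded = new byte[key.length + 1]; // padded[0] stays 0
    System.arraycopy(key, 0, padded, 1, key.length);
    return new BigInteger(padded);
  }

  /** Strips the leading 0 only if toByteArray() actually emitted one. */
  static byte[] fromPositive(BigInteger value) {
    byte[] raw = value.toByteArray();
    if (raw.length > 1 && raw[0] == 0) {
      return Arrays.copyOfRange(raw, 1, raw.length);
    }
    return raw;
  }
}
```

For example, the key {0x80} round-trips through {0x00, 0x80}, while {0x7F} comes back from toByteArray() as the single byte {0x7F}; stripping the first byte unconditionally would lose it. This version still drops leading zero bytes that belonged to the original key.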


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1183:
---------------------------------

    Attachment: hbase-1183-v2.patch

v1 was missing one line of javadoc; this version adds it.  The patch applies against the 0.19 branch and 0.20 trunk.


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1183:
-------------------------

    Status: Patch Available  (was: Open)

Marking patch available so attached patch gets some attention.


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1183:
---------------------------------

    Fix Version/s:     (was: 0.19.1)

Fix for 0.20.0


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-1183:
---------------------------------

    Attachment: hbase-1183-v4.patch

Another fix, for the case where the original start or stop rows have leading 0s.  Rather than prepending a 0, we prepend byte[]{1,0} and then remove at least the 1 at the end.  If the second byte is a zero, we drop it; otherwise we keep it.

This definitely needs a solid unit test.  Will post a unit test tomorrow.
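
A minimal sketch of the byte[]{1,0} prefix idea follows, under the assumption that the point of the leading 1 is to keep the value's byte length fixed so that zero bytes at the front of the original key survive the round trip. The names MarkerPrefixSketch, encode and decode are hypothetical, and this is not the patch code.

```java
import java.math.BigInteger;
import java.util.Arrays;

/** Hypothetical illustration of the {1,0} prefix described above. */
final class MarkerPrefixSketch {

  /** Prepends {1, 0}: the value is positive and keeps the key's leading zeros. */
  static BigInteger encode(byte[] key) {
    byte[] prefixed = new byte[key.length + 2];
    prefixed[0] = 1; // marker byte; prefixed[1] stays 0
    System.arraycopy(key, 0, prefixed, 2, key.length);
    return new BigInteger(prefixed);
  }

  /** Drops the marker 1, and the second byte as well if it is 0. */
  static byte[] decode(BigInteger value) {
    byte[] raw = value.toByteArray();
    if (raw.length < 2 || raw[0] != 1) {
      throw new IllegalArgumentException("value was not produced by encode");
    }
    int skip = (raw[1] == 0) ? 2 : 1;
    return Arrays.copyOfRange(raw, skip, raw.length);
  }
}
```

With a single 0 prefix, the key {0x00, 0x05} becomes the number 5 and comes back as {0x05}. With the {1,0} prefix it becomes {0x01, 0x00, 0x00, 0x05}, and decoding returns {0x00, 0x05}. The midpoint of encode({0,0}) and encode({0,4}) likewise decodes to {0,2}.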


[jira] Updated: (HBASE-1183) New MR splitting algorithm and other new features need a way to split a key range in N chunks

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1183:
-------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed (after adding unit tests).  Added a link in the javadoc from the util.Keying class, since fellas interested in Keying will be happy to learn of split.  Thanks for the patch, Jon.
