Posted to issues@hbase.apache.org by "David Koch (JIRA)" <ji...@apache.org> on 2014/06/10 14:00:09 UTC

[jira] [Commented] (HBASE-5140) TableInputFormat subclass to allow N number of splits per region during MR jobs

    [ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026369#comment-14026369 ] 

David Koch commented on HBASE-5140:
-----------------------------------

{quote}
Stale issue. Reopen if still relevant.
{quote}

Why is this deemed irrelevant? Is there new functionality in recent HBase versions which supersedes this class? By the way, in method {{getMaxByteArrayValue}} the array value assignment should read:

{code}
bytes[i] = (byte) 0xff;
{code}
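
For context, here is a minimal sketch of what such a helper might look like. The body of {{getMaxByteArrayValue}} is not shown in this thread, so the signature (an {{int length}} parameter) and the loop are assumptions made for illustration, not the patch's actual code. The cast itself is standard Java: {{0xff}} is an int literal with value 255, which is outside the byte range, so assigning it to a byte without the cast does not compile.

```java
import java.util.Arrays;

public class MaxKeySketch {

    // Hypothetical reconstruction: the real method in the patch may differ.
    // Returns an array of the given length with every byte set to 0xff.
    static byte[] getMaxByteArrayValue(int length) {
        byte[] bytes = new byte[length];
        for (int i = 0; i < bytes.length; i++) {
            // 0xff is an int literal (255); the explicit narrowing cast
            // is required and yields the byte value -1 (all bits set).
            bytes[i] = (byte) 0xff;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] max = getMaxByteArrayValue(4);
        System.out.println(Arrays.toString(max)); // prints [-1, -1, -1, -1]
    }
}
```

Java bytes are signed, so the stored value prints as -1. Masking with {{& 0xff}} recovers the unsigned value 255.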

> TableInputFormat subclass to allow N number of splits per region during MR jobs
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>    Affects Versions: 0.90.4
>            Reporter: Josh Wymer
>            Priority: Trivial
>              Labels: mapreduce, split
>         Attachments: Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch, Added_functionality_to_TableInputFormat_that_allows_splitting_of_regions.patch.1, Added_functionality_to_split_n_times_per_region_on_mapreduce_jobs.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> With regard to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138], I am working on a patch for the TableInputFormat class that overrides getSplits to generate N splits per region and/or N splits per job. The idea is to convert the startKey and endKey of each region from byte[] to BigDecimal, take the difference, divide by N, convert back to byte[], and generate splits at the resulting values. Assuming your keys are evenly distributed, this should produce splits with nearly the same number of rows each. Any suggestions on this issue are welcome.
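
The key arithmetic described in the issue can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the attached patches. It uses BigInteger rather than the BigDecimal mentioned in the description, since the keys are treated as unsigned integers. The class and method names ({{RegionSplitSketch}}, {{splitBoundaries}}, {{pad}}, {{toFixedLength}}) are hypothetical. It also assumes both keys are non-empty. In HBase the first region's start key and the last region's end key are empty, and the sketch does not handle that case.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RegionSplitSketch {

    // Right-pad a key with zero bytes so both keys have the same length.
    static byte[] pad(byte[] key, int length) {
        return Arrays.copyOf(key, length);
    }

    // Convert a non-negative BigInteger back to exactly `length` bytes.
    static byte[] toFixedLength(BigInteger value, int length) {
        // toByteArray() may add a leading sign byte or return fewer bytes.
        byte[] raw = value.toByteArray();
        byte[] out = new byte[length];
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    // Returns n + 1 boundaries: the start key, n - 1 interior split
    // points, and the end key, all padded to a common length.
    static List<byte[]> splitBoundaries(byte[] startKey, byte[] endKey, int n) {
        int length = Math.max(startKey.length, endKey.length);
        // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
        BigInteger start = new BigInteger(1, pad(startKey, length));
        BigInteger end = new BigInteger(1, pad(endKey, length));
        BigInteger range = end.subtract(start);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> boundaries = new ArrayList<>();
        for (int i = 0; i <= n; i++) {
            // Multiply before dividing so rounding error does not accumulate.
            BigInteger offset = range.multiply(BigInteger.valueOf(i)).divide(parts);
            boundaries.add(toFixedLength(start.add(offset), length));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        List<byte[]> b = splitBoundaries(new byte[] {0x00}, new byte[] {0x10}, 4);
        for (byte[] key : b) {
            System.out.println(Arrays.toString(key)); // [0], [4], [8], [12], [16]
        }
    }
}
```

Consecutive pairs of boundaries would then serve as the start and end rows of the N splits for a region. As the description notes, the splits only contain similar numbers of rows when the keys are spread evenly over the byte range.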



--
This message was sent by Atlassian JIRA
(v6.2#6252)