Posted to issues@hbase.apache.org by "Anoop Sam John (JIRA)" <ji...@apache.org> on 2014/05/23 11:22:02 UTC

[jira] [Commented] (HBASE-9556) Provide key range support to bulkload to avoid too many reducers even the data belongs to few regions

    [ https://issues.apache.org/jira/browse/HBASE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006999#comment-14006999 ] 

Anoop Sam John commented on HBASE-9556:
---------------------------------------

Related issue: HBASE-4063

> Provide key range support to bulkload to avoid too many reducers even the data belongs to few regions
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9556
>                 URL: https://issues.apache.org/jira/browse/HBASE-9556
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: rajeshbabu
>            Assignee: rajeshbabu
>            Priority: Minor
>
> Presently, the number of reducers in a bulk load is equal to the number of regions.
> Suppose a table has 500 regions and the import data belongs to only 10 of them; we still start 500 reducers (one per region) instead of 10, which consumes more time and resources.
> If the user knows the row key range of the import data, we can accept a startkey and/or endkey as input and use that key range to define the partitions and the number of reducers (one per region to which the data belongs). This avoids starting many reducers that do nothing and also avoids contention in shuffling.
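
To make the proposal above concrete, here is a minimal sketch of the region selection it describes. It is not HBase code and not the patch for this issue: the class KeyRangePartitions, the method selectRegions, and the conventions that both bounds are inclusive and that an empty key means "unbounded" are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of HBase.
final class KeyRangePartitions {

    /**
     * Returns the start keys of the regions that overlap [firstRow, lastRow].
     * Assumptions: regionStartKeys is sorted; region i spans
     * [regionStartKeys[i], regionStartKeys[i + 1]); the last region is
     * open-ended; an empty firstRow or lastRow means "unbounded" on that side.
     */
    static List<byte[]> selectRegions(List<byte[]> regionStartKeys,
                                      byte[] firstRow, byte[] lastRow) {
        List<byte[]> selected = new ArrayList<>();
        for (int i = 0; i < regionStartKeys.size(); i++) {
            byte[] regionStart = regionStartKeys.get(i);
            boolean isLastRegion = i == regionStartKeys.size() - 1;
            // The region ends after firstRow if its end key (the next start key) is greater.
            boolean endsAfterFirstRow = isLastRegion || firstRow.length == 0
                    || Arrays.compareUnsigned(regionStartKeys.get(i + 1), firstRow) > 0;
            // The region starts at or before lastRow.
            boolean startsAtOrBeforeLastRow = lastRow.length == 0
                    || Arrays.compareUnsigned(regionStart, lastRow) <= 0;
            if (endsAfterFirstRow && startsAtOrBeforeLastRow) {
                selected.add(regionStart);
            }
        }
        return selected;
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Five regions with start keys "", "b", "d", "f", "h" (made-up values).
        List<byte[]> startKeys =
                Arrays.asList(key(""), key("b"), key("d"), key("f"), key("h"));
        // Import data whose rows lie between "c" and "e".
        List<byte[]> partitions = selectRegions(startKeys, key("c"), key("e"));
        System.out.println("reducers needed: " + partitions.size());
    }
}
```

In this sketch the size of the returned list would be the number of reducers, and the returned start keys would serve as the partition boundaries, so regions outside the key range get no reducer at all.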



--
This message was sent by Atlassian JIRA
(v6.2#6252)