
Posted to dev@hbase.apache.org by beeshma r <be...@gmail.com> on 2016/03/05 18:42:37 UTC

Re: A proposal for Provide key range support to bulkload to avoid too many reducers (HBASE-9556)

Hi Ted,

Regarding the fix for HBASE-9556: I have been testing with a pre-split
table, i.e.

create 'test', 'cf', SPLITS => ['a', 'b', 'c']

I expected this to create 3 regions.

For this case I wrote some code to find the start keys of the regions.

HTable ht = new HTable(con, "test"); // Table object
NavigableMap<HRegionInfo, ServerName> np = ht.getRegionLocations();
// Print the region ID and start key of every region in the table.
for (HRegionInfo h : np.keySet()) {
    System.out.println(h.getRegionId() + "getRegionId");
    String s = new String(h.getStartKey());
    System.out.println(s + "-------start key");
}

With the code above I got 4 regions (4 region IDs). One has an empty
start key and end key; the remaining regions have start keys a, b and c
respectively.

My questions are:

1. How many regions will the command below create?
create 'test', 'cf', SPLITS => ['a', 'b', 'c']

2. To find the exact number of regions, can I count the region IDs?


cheers

Beeshma



On Thu, Jul 30, 2015 at 9:57 AM, Ted Yu <yu...@gmail.com> wrote:

> The following API doesn't contain start / end keys:
> List<InputSplit> getSplits(JobContext context)
>
> You need to pass key range information.
>
> I suggest continuing the discussion on the JIRA.
>
> Cheers
>
> On Thu, Jul 30, 2015 at 9:50 AM, beeshma r <be...@gmail.com> wrote:
>
> > Hi,
> >
> > I'd like to work on key range support for bulkload to avoid too many
> > reducers, as described in these issues (HBASE-9556, HBASE-4063).
> >
> > Description and high-level design for the proposed solution
> >
> > Currently, when we bulk load data into HBase through MapReduce using
> > TableInputFormatBase, the number of splits matches the number of regions
> > in the table. I propose changing this so that TableInputFormatBase decides
> > the key range for the splits.
> > For example, if the input data is going to be loaded into 50 regions
> > (while the RS actually has 400 regions):
> >
> >    - List<InputSplit> getSplits(JobContext context) will return a list of
> >    exactly 50 splits (currently it returns 400)
> >
> >
> > Am I understanding this correctly? Please let me know if I am on the
> > wrong track. Is anyone willing to mentor me? I am new to the ASF.
> >
> > Thanks
> > Beeshma
> >
>
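
The key-range idea in the quoted proposal can be sketched without any HBase
classes. The following is only an illustration under stated assumptions, not
the HBASE-9556 patch: KeyRangeFilter, overlaps and select are hypothetical
names, keys are modelled as plain strings (HBase compares raw bytes), each
region is a half-open range [start, end), and an empty string means
"unbounded" on that side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: keeps only the regions that overlap a requested key range.
final class KeyRangeFilter {

    // True if region [regionStart, regionEnd) overlaps range [lo, hi).
    // An empty string stands for an open (unbounded) boundary.
    static boolean overlaps(String regionStart, String regionEnd, String lo, String hi) {
        boolean rangeStartsBeforeRegionEnds = regionEnd.isEmpty() || lo.compareTo(regionEnd) < 0;
        boolean regionStartsBeforeRangeEnds = hi.isEmpty() || regionStart.compareTo(hi) < 0;
        return rangeStartsBeforeRegionEnds && regionStartsBeforeRangeEnds;
    }

    // Returns the {start, end} pairs of the regions that overlap [lo, hi).
    static List<String[]> select(List<String[]> regions, String lo, String hi) {
        List<String[]> selected = new ArrayList<>();
        for (String[] region : regions) {
            if (overlaps(region[0], region[1], lo, hi)) {
                selected.add(region);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Four regions, as produced by split keys a, b, c.
        List<String[]> regions = Arrays.asList(
            new String[] {"", "a"},
            new String[] {"a", "b"},
            new String[] {"b", "c"},
            new String[] {"c", ""});
        for (String[] r : select(regions, "a1", "b5")) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

A job restricted this way would get one split per selected region instead of
one per region in the table, which is the reduction the proposal is after.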



--

Re: A proposal for Provide key range support to bulkload to avoid too many reducers (HBASE-9556)

Posted by Ted Yu <yu...@gmail.com>.

I issued the same command in hbase shell and got 4 regions (see tail of
this email).
The values given for the SPLITS parameter designate the start keys of regions.
The first region has an empty start key, so three split keys yield four regions.
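
As a rough illustration of how split keys turn into region boundaries, here
is a minimal Java sketch that uses no HBase classes. SplitKeyRegions and
regionsFor are names made up for this example, keys are modelled as plain
strings rather than byte arrays, and the split keys are assumed to be sorted
and distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: derives [start, end) region ranges from split keys.
final class SplitKeyRegions {

    // Each split key closes the previous region and opens the next one.
    // An empty string stands for an open (unbounded) boundary.
    static List<String[]> regionsFor(List<String> splitKeys) {
        List<String[]> regions = new ArrayList<>();
        String start = "";
        for (String split : splitKeys) {
            regions.add(new String[] {start, split});
            start = split;
        }
        // The last region runs from the final split key to the end of the table.
        regions.add(new String[] {start, ""});
        return regions;
    }

    public static void main(String[] args) {
        for (String[] r : regionsFor(Arrays.asList("a", "b", "c"))) {
            System.out.println("['" + r[0] + "', '" + r[1] + "')");
        }
    }
}
```

With n split keys the loop adds n ranges and the final statement adds one
more, so the result always has n + 1 entries.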

For #2, getRegionLocations() returns a Map. You can use the following
method to retrieve the number of regions:

https://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()

test,,1457200138927.3daccd0d6f9eb42b25625ea09b5e0e35.
test,a,1457200138927.5714dd320e470add17a566c2154b47eb.
test,b,1457200138927.01d01fa4592521d195ac4a7182e7b059.
test,c,1457200138927.334346d5892afc859833dad734353d9b.
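
The region names above appear to follow the pattern
<table>,<start key>,<region id>.<encoded name>. and all four carry the same
region ID (1457200138927), so counting distinct region IDs would not give
the number of regions here. The sketch below parses the four names listed
above and compares the two counts. RegionNameCount and its methods are
hypothetical names for this example, and the parsing assumes the table name
contains no comma.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares "number of regions" with "number of distinct region IDs".
final class RegionNameCount {

    // The four region names from the shell output above.
    static final List<String> REGION_NAMES = Arrays.asList(
        "test,,1457200138927.3daccd0d6f9eb42b25625ea09b5e0e35.",
        "test,a,1457200138927.5714dd320e470add17a566c2154b47eb.",
        "test,b,1457200138927.01d01fa4592521d195ac4a7182e7b059.",
        "test,c,1457200138927.334346d5892afc859833dad734353d9b.");

    // The start key sits between the first and the last comma.
    static String startKeyOf(String regionName) {
        return regionName.substring(regionName.indexOf(',') + 1, regionName.lastIndexOf(','));
    }

    // The region ID sits between the last comma and the following dot.
    static String regionIdOf(String regionName) {
        String tail = regionName.substring(regionName.lastIndexOf(',') + 1);
        return tail.substring(0, tail.indexOf('.'));
    }

    static int distinctRegionIds(List<String> regionNames) {
        Set<String> ids = new HashSet<>();
        for (String name : regionNames) {
            ids.add(regionIdOf(name));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println("regions: " + REGION_NAMES.size());
        System.out.println("distinct region IDs: " + distinctRegionIds(REGION_NAMES));
    }
}
```

Counting the entries themselves, as Map.size() does, is therefore the safer
way to get the number of regions.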
