Posted to dev@hbase.apache.org by beeshma r <be...@gmail.com> on 2016/03/05 18:42:37 UTC
Re: A proposal for Provide key range support to bulkload to avoid too
many reducers (HBASE-9556)
Hi Ted,
Regarding the fix for HBASE-9556: I have been testing with a pre-split table,
i.e.
create 'test', 'cf', SPLITS => ['a', 'b', 'c']
I expected this to create 3 regions.
So for this case I wrote some logic to find the start keys of the regions.
HTable ht = new HTable(con, "test"); // Table object
NavigableMap<HRegionInfo, ServerName> np = ht.getRegionLocations();
// Iterate over the region infos directly; no intermediate list is needed.
for (HRegionInfo h : np.keySet()) {
    System.out.println(h.getRegionId() + " getRegionId");
    String s = new String(h.getStartKey());
    System.out.println(s + " -------start key");
}
With the code above I got 4 regions (4 region IDs). One has an empty
start key, and the remaining regions have start keys a, b and c
respectively.
My questions are:
1. How many regions will the command below create?
create 'test', 'cf', SPLITS => ['a', 'b', 'c']
2. To find the exact number of regions, can I use the count of region IDs?
cheers
Beeshma
On Thu, Jul 30, 2015 at 9:57 AM, Ted Yu <yu...@gmail.com> wrote:
> The following API doesn't contain start / end keys:
> List<InputSplit> getSplits(JobContext context)
>
> You need to pass key range information.
>
> I suggest continuing the discussion on the JIRA.
>
> Cheers
>
> On Thu, Jul 30, 2015 at 9:50 AM, beeshma r <be...@gmail.com> wrote:
>
> > Hi,
> >
> > I'd like to work on key range support for bulkload to avoid too many
> > reducers, as described in these issues (HBASE-9556, HBASE-4063).
> >
> > Description and high-level design of the proposed solution
> >
> > Currently, when we bulk load data into HBase through MapReduce using
> > TableInputFormatBase, the number of splits matches the number of regions
> > in the table.
> > So I am going to change the process so that TableInputFormatBase decides
> > the key range for the splits.
> > For example, if the input data is going to be loaded into 50 regions
> > (the RS actually has 400 regions):
> >
> > - List<InputSplit> getSplits(JobContext context) will return exactly
> > those 50 splits (currently it returns 400).
> >
> >
> > Do I understand this correctly? Please let me know if I am on the wrong
> > track. Is anyone willing to mentor me? I am new to ASF.
> >
> > Thanks
> > Beeshma
> >
>
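To make the quoted idea a bit more concrete, here is a rough sketch of how a key range could narrow 400 region splits down to the few that matter. This is not the HBASE-9556 patch and it does not use any HBase classes: Range, overlaps and selectRegions are hypothetical names, and plain Java Strings stand in for HBase's byte[] row keys (so ordering is String ordering, which is an assumption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeyRangeSplitSketch {

    // Hypothetical stand-in for a region: covers [startKey, endKey).
    // An empty endKey means "up to the end of the table".
    static final class Range {
        final String startKey;
        final String endKey;

        Range(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // True when the region overlaps the half-open key range [rangeStart, rangeEnd).
    // An empty rangeEnd means the range is unbounded on the right.
    static boolean overlaps(Range region, String rangeStart, String rangeEnd) {
        boolean startsBeforeRangeEnd =
                rangeEnd.isEmpty() || region.startKey.compareTo(rangeEnd) < 0;
        boolean endsAfterRangeStart =
                region.endKey.isEmpty() || region.endKey.compareTo(rangeStart) > 0;
        return startsBeforeRangeEnd && endsAfterRangeStart;
    }

    // Keep only the regions that the key range touches.
    static List<Range> selectRegions(List<Range> regions, String rangeStart, String rangeEnd) {
        List<Range> selected = new ArrayList<>();
        for (Range region : regions) {
            if (overlaps(region, rangeStart, rangeEnd)) {
                selected.add(region);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
                new Range("", "a"), new Range("a", "b"),
                new Range("b", "c"), new Range("c", ""));
        for (Range r : selectRegions(regions, "a", "c")) {
            System.out.println("[" + r.startKey + ", " + r.endKey + ")");
        }
    }
}
```

The point of the sketch is only that the caller has to supply rangeStart and rangeEnd somehow, since getSplits(JobContext context) carries no key range of its own.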
--
Re: A proposal for Provide key range support to bulkload to avoid too
many reducers (HBASE-9556)
Posted by Ted Yu <yu...@gmail.com>.
I issued the same command in the hbase shell and got 4 regions (see the
tail of this email).
The values given for the SPLITS parameter designate the start keys of
regions. The first region has an empty start key, so 3 split points give
4 regions.
For #2, getRegionLocations() returns a Map. You can use the following
method to retrieve the number of regions:
https://docs.oracle.com/javase/7/docs/api/java/util/Map.html#size()
test,,1457200138927.3daccd0d6f9eb42b25625ea09b5e0e35.
test,a,1457200138927.5714dd320e470add17a566c2154b47eb.
test,b,1457200138927.01d01fa4592521d195ac4a7182e7b059.
test,c,1457200138927.334346d5892afc859833dad734353d9b.
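As a small illustration of the counting above, the sketch below rebuilds the region start and end keys from the split points and uses Map.size() for the count. It is plain Java with no HBase dependency; RegionCountSketch and regionRanges are made-up names, and Strings stand in for byte[] keys, so treat it as an assumption-laden sketch rather than what HBase does internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

public class RegionCountSketch {

    // Maps each region's start key to its end key.
    // The first region starts at the empty key and the last region
    // ends at the empty key (meaning "end of table").
    static NavigableMap<String, String> regionRanges(List<String> splitPoints) {
        NavigableMap<String, String> ranges = new TreeMap<>();
        String start = "";
        // TreeSet sorts the split points and drops duplicates.
        for (String split : new TreeSet<>(splitPoints)) {
            ranges.put(start, split);
            start = split;
        }
        ranges.put(start, "");
        return ranges;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> ranges = regionRanges(Arrays.asList("a", "b", "c"));
        for (Map.Entry<String, String> e : ranges.entrySet()) {
            System.out.println("start='" + e.getKey() + "' end='" + e.getValue() + "'");
        }
        // Number of regions = number of split points + 1
        System.out.println("regions: " + ranges.size());
    }
}
```

With the split points a, b and c this gives four entries, matching the four region names listed above.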