Posted to user@hbase.apache.org by rajesh balamohan <rb...@gmail.com> on 2011/09/15 05:21:54 UTC

HBase ImportTSV

Hi All,

ImportTSV is a great tool for bulk loading data into HBase.

I have 500+ GB of raw data which I would like to import into a newly
created HTable. If I go ahead with ImportTSV, it creates only one reducer,
which is a bottleneck in terms of sorting and shuffling.

Is there any other way to increase the number of reducers while doing a
bulk load into a new table?

~Rajesh.B

Re: HBase ImportTSV

Posted by rajesh balamohan <rb...@gmail.com>.

Thanks a lot for the quick response. It worked like a charm.

Re: HBase ImportTSV

Posted by Stack <st...@duboce.net>.

Do you know your keyspace roughly?  Try creating a pre-split table
with as many regions as you want reducers.
St.Ack
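
[Editor's sketch, not part of the original message.] One way to act on this
suggestion is to compute N-1 evenly spaced split keys for N regions and pass
them when the table is created. The class and method names below are
hypothetical, and the sketch assumes fixed-width row keys that are spread
fairly uniformly between a known start and end key. Real keyspaces are often
skewed, in which case split points are better taken from a sample of the data.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): evenly spaced split keys. */
class SplitKeySketch {

    /**
     * Returns numRegions - 1 split keys evenly spaced between startKey and
     * endKey, treating both as unsigned big-endian numbers of equal length.
     * A table created with these split keys would have numRegions regions.
     */
    static List<byte[]> evenSplits(byte[] startKey, byte[] endKey, int numRegions) {
        if (startKey.length != endKey.length) {
            throw new IllegalArgumentException("keys must have the same length");
        }
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to split");
        }
        BigInteger lo = new BigInteger(1, startKey);
        BigInteger hi = new BigInteger(1, endKey);
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger range = hi.subtract(lo);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Split point i of numRegions: lo + range * i / numRegions
            BigInteger point = lo.add(
                range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(numRegions)));
            splits.add(toFixedWidth(point, startKey.length));
        }
        return splits;
    }

    /** Right-aligns the unsigned value into exactly width bytes. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

The resulting keys could then be converted to a byte[][] and handed to the
HBaseAdmin.createTable(HTableDescriptor, byte[][]) overload before running
ImportTSV. Check the Javadoc of the HBase version in use, since the admin API
has changed across releases.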

Re: HBase ImportTSV

Posted by rajesh balamohan <rb...@gmail.com>.

ImportTSV internally uses HFileOutputFormat.configureIncrementalLoad(job,
table);

However, a newly created table does not have any split keys yet, so the
job launches only 1 reducer by default.

Is there a way to increase the number of reducers for high-volume imports
like 500+ GB?
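
[Editor's sketch, not part of the original message.] The behaviour described
above can be modelled in a few lines. To my understanding,
configureIncrementalLoad sets the number of reduce tasks to the number of
regions in the table and routes each row to the reducer whose region would
hold it. The class below is a simplified stand-in with hypothetical names,
not HBase code. It only shows why a table with a single region (one empty
start key) ends up with one reducer.

```java
/** Simplified model (hypothetical names) of one reducer per region. */
class ReducerPerRegionSketch {

    /** Lexicographic comparison of byte arrays as unsigned bytes. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** One reducer per region: the count equals the number of start keys. */
    static int reducerCount(byte[][] regionStartKeys) {
        return regionStartKeys.length;
    }

    /**
     * Index of the reducer (region) for a row key. regionStartKeys must be
     * sorted, with the first entry being the empty start key.
     */
    static int reducerFor(byte[] rowKey, byte[][] regionStartKeys) {
        int idx = 0;
        for (int i = 1; i < regionStartKeys.length; i++) {
            if (compareUnsigned(rowKey, regionStartKeys[i]) >= 0) {
                idx = i; // row key sorts at or after this region's start key
            } else {
                break;
            }
        }
        return idx;
    }
}
```

With a fresh table the start-key array is just one empty key, so
reducerCount is 1 and every row maps to reducer 0. With start keys
"", "g", "n", "t" the count is 4 and rows are spread across four reducers.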

~Rajesh.B
