Posted to user@hbase.apache.org by "Taylor, Ronald C" <ro...@pnl.gov> on 2009/04/02 23:37:47 UTC

RE: Bulk import - is the error general to both MapReduce and non-MapReduce programs?

 
Hello,

I have been following this thread and have a question. I am new to HBase coding, and within the past few days I have written a standalone (not MapReduce-based) Java program to do a bulk upload into one HBase table. I believe I hit the same error you folks have been talking about: the program works fine on small uploads, but fails with the error message you mention once the import grows to tens of thousands of rows. So I wanted to ask: has this import error been reported only for MapReduce-based programs, or is it more general? If it is general, I would assume it also affects my current import program, and that I should try the doCommit() code shown below as a fix.
  Cheers,
  Ron Taylor
___________________________________________
Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory
902 Battelle Boulevard
P.O. Box 999, MSIN K7-90
Richland, WA  99352 USA
Office:  509-372-6568
Email: ronald.taylor@pnl.gov
www.pnl.gov

-----Original Message-----
From: Stuart White [mailto:stuart.white1@gmail.com] 
Sent: Thursday, April 02, 2009 1:37 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Bulk import - does sort order of input data affect success rate?

On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson <ry...@gmail.com> wrote:
> The last thing - success should not be a function of sort order.
>
> However, speed will be related.

How?  Sorted = faster, or Sorted = slower?

>
> One thing I found I had to do was:
>    private void doCommit(HTable t, BatchUpdate update) throws IOException {
>      boolean committed = false;
>      while (!committed) {
>        try {
>          t.commit(update);
>          committed = true;
>        } catch (RetriesExhaustedException e) {
>          // retries exhausted; ignore and try the commit again
>        }
>      }
>    }
>

I'm running a mapred job, using TableOutputFormat to write the results to HBase.  For the code you've provided, was that for a custom output format?  Or a standalone (non-mapred) application?  I see the point you're making; I just don't understand where I'd put that code.
Thanks!
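
One possible placement, sketched below under the assumption that the map task opens the HTable itself and writes directly to it instead of going through TableOutputFormat. The table name my_table, the column data:value, and the tab-separated input format are all placeholders, and the code uses the 0.19-era mapred and BatchUpdate APIs; it is an illustration, not an approach confirmed in this thread:

    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.RetriesExhaustedException;
    import org.apache.hadoop.hbase.io.BatchUpdate;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class ImportMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

      private HTable table;

      public void configure(JobConf job) {
        try {
          // one HTable per task; "my_table" is a placeholder name
          table = new HTable(new HBaseConfiguration(), "my_table");
        } catch (IOException e) {
          throw new RuntimeException("could not open table", e);
        }
      }

      public void map(LongWritable key, Text value,
          OutputCollector<NullWritable, NullWritable> output, Reporter reporter)
          throws IOException {
        // placeholder input format: row key, tab, value
        String[] fields = value.toString().split("\t", 2);
        BatchUpdate update = new BatchUpdate(fields[0]);
        update.put("data:value", fields[1].getBytes());
        doCommit(table, update);
      }

      // Ryan's retry helper: keep trying until the commit sticks
      private void doCommit(HTable t, BatchUpdate update) throws IOException {
        boolean committed = false;
        while (!committed) {
          try {
            t.commit(update);
            committed = true;
          } catch (RetriesExhaustedException e) {
            // region is probably briefly offline (e.g. splitting); retry
          }
        }
      }
    }

A reduce-side writer would look the same; the point is simply that every commit goes through the helper, so a transient RetriesExhaustedException does not fail the task.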

Re: Bulk import - is the error general to both MapReduce and non-MapReduce programs?

Posted by Stuart White <st...@gmail.com>.
To my understanding, the problem I am facing is not specific to
MapReduce, so I would expect that Ryan's code is equally applicable
to your case.
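
For a standalone loader, that just means routing every commit of the import loop through the same helper. A minimal sketch, assuming the 0.19-era HTable/BatchUpdate API from Ryan's snippet and a tab-separated input file; the table name my_table and the column data:value are placeholders:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.RetriesExhaustedException;
    import org.apache.hadoop.hbase.io.BatchUpdate;

    public class StandaloneBulkLoader {

      public static void main(String[] args) throws IOException {
        HTable table = new HTable(new HBaseConfiguration(), "my_table");
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
          // placeholder input format: row key, tab, value
          String[] fields = line.split("\t", 2);
          BatchUpdate update = new BatchUpdate(fields[0]);
          update.put("data:value", fields[1].getBytes());
          doCommit(table, update);   // retry instead of dying mid-import
        }
        in.close();
      }

      // same retry-until-committed helper as in Ryan's snippet
      private static void doCommit(HTable table, BatchUpdate update) throws IOException {
        boolean committed = false;
        while (!committed) {
          try {
            table.commit(update);
            committed = true;
          } catch (RetriesExhaustedException e) {
            // region briefly unavailable (e.g. mid-split); try again
          }
        }
      }
    }

The idea is that a RetriesExhaustedException partway through a large import is usually transient, so retrying the single failed commit keeps the load going instead of aborting tens of thousands of rows in.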
