
Posted to user@hbase.apache.org by Flavio Pompermaier <po...@okkam.it> on 2014/10/29 09:08:25 UTC

Region split during mapreduce

Hi all,
I was reading
http://www.abcn.net/2014/07/spark-hbase-result-keyvalue-bytearray.html?m=1
and they say: "still using
org.apache.hadoop.hbase.mapreduce.TableInputFormat is a big problem, your
job will fail when one of HBase Region for target HBase table is splitting!
because the original region will be offline by splitting".

Is that true?
Is there a solution to that?

Best,
Flavio

Re: Region split during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

OK, thanks for the explanation!
On Nov 1, 2014 8:20 AM, "lars hofhansl" <la...@apache.org> wrote:

Re: Region split during mapreduce

Posted by lars hofhansl <la...@apache.org>.

I do not believe that to be true.
HBase only uses region boundaries to identify useful scan ranges during the setup of the job. These ranges remain valid regardless of whether the number of regions increases later. The worst case is that a single mapper might scan multiple regions (the daughters produced by a split of the region it was supposed to scan).
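The setup-time argument can be made concrete with a toy model. This is plain Java, not HBase code. It assumes String row keys compared lexicographically (HBase actually compares byte[] keys) and uses "" for an unbounded start or end key, as HBase does with empty keys. The names KeyRange, split and regionsFor are hypothetical. A mapper's range is fixed at job setup from one region's boundaries. After that region splits, the same range is simply covered by the two daughters.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not HBase code: keys are Strings compared lexicographically
// (HBase compares byte[] row keys), and "" means an unbounded start or end key.
public class SplitCoverage {

    /** A half-open key range [start, end); "" at either end means unbounded. */
    record KeyRange(String start, String end) {
        boolean overlaps(KeyRange other) {
            boolean beginsBeforeOtherEnds = other.end.isEmpty() || start.compareTo(other.end) < 0;
            boolean endsAfterOtherBegins = end.isEmpty() || end.compareTo(other.start) > 0;
            return beginsBeforeOtherEnds && endsAfterOtherBegins;
        }
    }

    /** Returns a new region list in which regions.get(i) is replaced by its two daughters. */
    static List<KeyRange> split(List<KeyRange> regions, int i, String splitKey) {
        List<KeyRange> result = new ArrayList<>(regions);
        KeyRange parent = result.remove(i);
        result.add(i, new KeyRange(splitKey, parent.end()));   // upper daughter
        result.add(i, new KeyRange(parent.start(), splitKey)); // lower daughter, placed before it
        return result;
    }

    /** The regions that a scan over the given key range has to visit. */
    static List<KeyRange> regionsFor(KeyRange scan, List<KeyRange> regions) {
        List<KeyRange> visited = new ArrayList<>();
        for (KeyRange region : regions) {
            if (region.overlaps(scan)) {
                visited.add(region);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        List<KeyRange> atSetup = List.of(
                new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        // At job setup each mapper gets one region's boundaries; this one gets ["g", "p").
        KeyRange mapperRange = atSetup.get(1);
        // While the job runs, that region splits at "k".
        List<KeyRange> afterSplit = split(atSetup, 1, "k");
        // The mapper's range is unchanged and is fully covered by the two daughters.
        System.out.println(regionsFor(mapperRange, afterSplit));
    }
}
```

Running main prints the two daughter ranges ["g", "k") and ["k", "p"). In the toy model, the mapper's input split does not need to change when the region splits. It just spans one more region.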
Regions are unavailable for a short time during a split, but the mappers are normal HBase clients and so they wait out the splits by retrying.
-- Lars
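Lars's point about mappers waiting out a split can be pictured with a generic retry loop. This is a hypothetical sketch, not the HBase client's implementation. TemporarilyOfflineException, callWithRetries and the linear pause are assumptions made for illustration. In the real client, the attempt budget and base pause come from settings such as hbase.client.retries.number and hbase.client.pause, and the back-off schedule is not simply linear.

```java
import java.util.concurrent.Callable;

// Generic retry loop for illustration only; this is not the HBase client's code.
public class RetryingCall {

    /** Hypothetical stand-in for the error a client sees while a region is briefly offline. */
    static class TemporarilyOfflineException extends Exception {
        TemporarilyOfflineException(String message) {
            super(message);
        }
    }

    /** Runs call, retrying on TemporarilyOfflineException with a linearly growing pause. */
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long basePauseMs)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (TemporarilyOfflineException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retry budget exhausted: only now does the caller (the mapper) fail
                }
                Thread.sleep(basePauseMs * attempt); // give the split time to finish
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated scan call whose region is offline for the first two attempts.
        Callable<String> nextRow = () -> {
            attempts[0]++;
            if (attempts[0] <= 2) {
                throw new TemporarilyOfflineException("region is splitting");
            }
            return "row-42";
        };
        String row = callWithRetries(nextRow, 5, 10);
        System.out.println(row + " after " + attempts[0] + " attempts"); // row-42 after 3 attempts
    }
}
```

Under this model, a short unavailability during a split costs a few retries. A job fails only if the region stays offline longer than the whole retry budget.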

From: Flavio Pompermaier <po...@okkam.it>
To: user@hbase.apache.org
Sent: Friday, October 31, 2014 10:23 AM
Subject: Re: Region split during mapreduce

Re: Region split during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

The problem is that I don't know whether what they say at that link is true.
In the past I experienced several problems running MapReduce jobs on a
"live" HBase table, but I didn't know that MapReduce jobs crash if a region
is splitting.
Do I have to create a snapshot myself if I want to use
TableSnapshotInputFormat, or does it automatically handle creating and
deleting the snapshot?
Is there any detailed reference about how to deal with such events during
MapReduce jobs?

Thanks for the support,
Flavio

On Fri, Oct 31, 2014 at 6:12 PM, Ted Yu <yu...@gmail.com> wrote:

Re: Region split during mapreduce

Posted by Ted Yu <yu...@gmail.com>.

Flavio:
Have you considered using TableSnapshotInputFormat?

See TableMapReduceUtil#initTableSnapshotMapperJob()

Cheers
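This sketch addresses Flavio's question about who manages the snapshot. As far as I know, TableSnapshotInputFormat reads a snapshot that already exists. It restores that snapshot into the temporary directory passed to TableMapReduceUtil#initTableSnapshotMapperJob and does not create or delete it, so the caller takes the snapshot before the job and removes it afterwards. The code shows only that lifecycle. SnapshotAdmin and JobRunner are hypothetical interfaces standing in for the admin client's snapshot and deleteSnapshot methods and for a job configured via initTableSnapshotMapperJob. They are not HBase APIs, and the table and snapshot names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Lifecycle sketch only. SnapshotAdmin and JobRunner are hypothetical stand-ins,
// not HBase APIs.
public class SnapshotJobLifecycle {

    interface SnapshotAdmin {
        void snapshot(String snapshotName, String tableName) throws Exception;
        void deleteSnapshot(String snapshotName) throws Exception;
    }

    interface JobRunner {
        /** Runs the MapReduce job over the named snapshot; returns true on success. */
        boolean runFromSnapshot(String snapshotName) throws Exception;
    }

    /** Takes the snapshot, runs the job against it, and always deletes it afterwards. */
    static boolean runOnSnapshot(SnapshotAdmin admin, JobRunner job,
                                 String tableName, String snapshotName) throws Exception {
        admin.snapshot(snapshotName, tableName); // assumed: the input format does not do this
        try {
            // The job reads snapshot files, so region splits on the live table do not affect it.
            return job.runFromSnapshot(snapshotName);
        } finally {
            admin.deleteSnapshot(snapshotName); // only after the job has stopped reading
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        SnapshotAdmin admin = new SnapshotAdmin() {
            public void snapshot(String snapshotName, String tableName) {
                log.add("snapshot " + tableName + " as " + snapshotName);
            }
            public void deleteSnapshot(String snapshotName) {
                log.add("delete " + snapshotName);
            }
        };
        JobRunner job = name -> {
            log.add("job reads " + name);
            return true;
        };
        boolean ok = runOnSnapshot(admin, job, "mytable", "mytable-snap");
        System.out.println(ok + " " + log);
    }
}
```

Deleting the snapshot in a finally block means a failed job does not leave a stale snapshot behind. The snapshot is still never removed while the job might be reading it.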

On Fri, Oct 31, 2014 at 10:01 AM, Flavio Pompermaier <po...@okkam.it>
wrote:

Re: Region split during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

Is there anybody here?

On Thu, Oct 30, 2014 at 2:28 PM, Flavio Pompermaier <po...@okkam.it>
wrote:

Re: Region split during mapreduce

Posted by Flavio Pompermaier <po...@okkam.it>.

Any help with this?

On Wed, Oct 29, 2014 at 9:08 AM, Flavio Pompermaier <po...@okkam.it>
wrote: