Posted to user@hbase.apache.org by Amit Sela <am...@infolinks.com> on 2013/11/19 14:40:18 UTC

Bulk load fails to identify pre-split regions

Hi all,
I'm using HBase 0.94.2 (and Hadoop 1.0.4).
I've been using bulk load on a daily basis for over a year with no problems.
I recently moved to an OSGi client, and that required some changes.
One of the changes I made is a fix to what seems like a bug that I
described in https://issues.apache.org/jira/browse/HBASE-9682.
While running some tests I executed bulk load (with pre-splitting) a few
times, and on one of those runs it seems that bulk load didn't identify the
pre-split regions and loaded the HFiles into 2 new regions (instead of the 19
pre-split ones). What's even worse is that it made a mess of the
lexicographical order of the start/end keys in those regions.

For example, if the pre-split regions' start/end keys were:
Start    End
         1
1        2
2        3
3

They turned into:
Start    End
         new1
1        2
new1
2        3
3

As a result, even scanning over those regions is impossible.
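
To make the broken layout concrete, here is a minimal sketch in plain Java (no HBase dependency) that checks whether a list of regions forms a contiguous chain, i.e. each region starts exactly where the previous one ends. The class and method names are hypothetical. Keys are plain Strings compared with String.compareTo, which agrees with HBase's byte-wise key ordering only for ASCII keys like the ones in the example. The end key of the "new1" region is not shown in the listing above, so the sketch uses an empty string as a placeholder for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HBase.
class RegionChainCheck {

    /**
     * Returns a description of every boundary problem found.
     * Each region is a {startKey, endKey} pair, given in the order listed.
     * "" stands for the open start of the first region and the open end
     * of the last region.
     */
    static List<String> findProblems(String[][] regions) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < regions.length; i++) {
            String start = regions[i][0];
            String end = regions[i][1];
            if (i == 0 && !start.isEmpty()) {
                problems.add("first region does not start at the beginning of the table");
            }
            if (i == regions.length - 1 && !end.isEmpty()) {
                problems.add("last region does not run to the end of the table");
            }
            // Within a region, the start key must sort before a non-empty end key.
            if (!end.isEmpty() && start.compareTo(end) >= 0) {
                problems.add("region " + i + ": start key is not before end key");
            }
            // Each region must begin exactly where the previous one ended.
            if (i > 0 && !regions[i - 1][1].equals(start)) {
                problems.add("region " + i + ": starts at '" + start
                        + "' but previous region ends at '" + regions[i - 1][1] + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String[][] preSplit = {{"", "1"}, {"1", "2"}, {"2", "3"}, {"3", ""}};
        // End key of the "new1" region is an assumed placeholder ("").
        String[][] afterLoad = {{"", "new1"}, {"1", "2"}, {"new1", ""}, {"2", "3"}, {"3", ""}};
        System.out.println("pre-split problems:  " + findProblems(preSplit));
        System.out.println("after-load problems: " + findProblems(afterLoad));
    }
}
```

The pre-split layout passes the check, while the layout after the failed load is flagged because neighbouring regions no longer share a boundary. A scanner that walks from one region's end key to the next region's start key has no consistent path through such a table.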

I'm having a hard time reproducing this behavior, so I'm not sure whether the
fix I made (also described in the JIRA comments) is the cause.

Any ideas?

Thanks,

Amit

Re: Bulk load fails to identify pre-split regions

Posted by Amit Sela <am...@infolinks.com>.
So far there have been no issues after running in production for over a week.

I didn't attach a patch to the JIRA I opened because I have more changes to
Algorithm in my environment (the other change is another bug fix, related to
GZ configuration, that was fixed in later versions - I've been running with
that change for over a year in production with no issues).
So if I published a patch based on "svn diff" it would include more changes
than this fix - however, I did describe my changes in the JIRA as a possible
fix.

Since I haven't encountered this issue again, my only guess is that it
might be related to the fact that I executed bulk load after bulk load (5
or 6 times in a row)...

Thanks.



On Mon, Nov 25, 2013 at 11:42 PM, Ted Yu <yu...@gmail.com> wrote:

> Amit:
> bq. One of the changes I made is a fix to what seems like a bug
>
> I don't see an attachment to HBASE-9682 so cannot tell whether the change
> was related to what you described.
>
> Have you encountered the problem since last week?
>
> Cheers
>

Re: Bulk load fails to identify pre-split regions

Posted by Ted Yu <yu...@gmail.com>.
Amit:
bq. One of the changes I made is a fix to what seems like a bug

I don't see an attachment to HBASE-9682 so cannot tell whether the change
was related to what you described.

Have you encountered the problem since last week?

Cheers

