Posted to user@hbase.apache.org by Amit Sela <am...@infolinks.com> on 2013/12/16 00:09:04 UTC

Bulk load moving HFiles to the wrong region

Hi all,
I'm using Hadoop 1.0.4 and HBase 0.94.12.
When trying to bulk load using the Java API, the HFiles are sometimes
moved to the wrong directory.
I'm pre-splitting regions, and the new regions are always the last ones
(lexicographically). When this happens, all files move to the region that
was last before the pre-split. But the split does work: I see the new
regions in the webapp before the bulk load executes. Once a table has this
problem (it does not happen every time), it persists until I restart HBase.

Has anyone seen something similar?

Thanks,
Amit.
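
The symptom above can be pictured as a lookup against region start keys: a row key is served by the region whose start key is the greatest one not exceeding it. The sketch below is a toy model in plain Java, not HBase code; the class name, region names and keys are all made up for illustration, and it assumes ASCII keys so that String ordering matches byte ordering. It shows how a view of the table that lacks the newly pre-split regions would send a new day's keys to the region that used to be last.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of row-key-to-region lookup; illustrative only, not HBase code. */
final class RegionLookupSketch {

    /**
     * Returns the name of the region whose start key is the greatest one
     * that is less than or equal to rowKey.
     */
    static String locate(TreeMap<String, String> regionsByStartKey, String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        // Hypothetical view of the table before the new day's regions were added.
        TreeMap<String, String> stale = new TreeMap<>();
        stale.put("", "first-region");
        stale.put("20131215_m", "old-last-region");

        // Hypothetical view after pre-splitting for the new day.
        TreeMap<String, String> fresh = new TreeMap<>(stale);
        fresh.put("20131216", "new-region-1");
        fresh.put("20131216_m", "new-region-2");

        String row = "20131216_example.com";
        System.out.println("stale view: " + locate(stale, row));
        System.out.println("fresh view: " + locate(fresh, row));
    }
}
```

With the stale view, every key carrying the new date prefix sorts after the old last start key and so lands in the old last region. Whether a stale view is what actually happened on this cluster is not established here.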

RE: Bulk load moving HFiles to the wrong region

Posted by Bijieshan <bi...@huawei.com>.
>Where does the RegionServer save the partitions file written to DistributedCache ?

There's no need for RegionServer to save the partitions file.

It seems you added the new regions directly into META and didn't change the end key of the previously last region?

Jieshan.
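
The end-key question can be pictured as a chain check over the table's regions: when sorted by start key, each region's end key should equal the next region's start key, and only the final region should have an empty end key. The following is a minimal sketch in plain Java under those assumptions; the class, the Region type and all keys are hypothetical and do not use the HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy consistency check for a chain of regions; illustrative only, not HBase code. */
final class RegionChainCheck {

    /** A region covers [startKey, endKey); an empty endKey means "to the end of the table". */
    static final class Region {
        final String name;
        final String startKey;
        final String endKey;

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * Expects regions sorted by start key. Returns one message for every pair of
     * neighbours where the first region's end key differs from the second's start key.
     */
    static List<String> findBrokenLinks(List<Region> sortedRegions) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i + 1 < sortedRegions.size(); i++) {
            Region current = sortedRegions.get(i);
            Region next = sortedRegions.get(i + 1);
            if (!current.endKey.equals(next.startKey)) {
                problems.add(current.name + " ends at '" + current.endKey
                        + "' but " + next.name + " starts at '" + next.startKey + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // New regions appended, but the old last region still claims "to the end of the table".
        List<Region> regions = Arrays.asList(
                new Region("old-last", "20131215_m", ""),
                new Region("new-1", "20131216", "20131216_m"),
                new Region("new-2", "20131216_m", ""));
        for (String problem : findBrokenLinks(regions)) {
            System.out.println(problem);
        }
    }
}
```

In the example data the old last region still has an empty end key, so its range overlaps both new regions and the check reports one broken link.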

Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
Indeed there are more than 2 split points: there are 4 split points for the
5 new regions added each day.
The new data bulk loaded each day belongs to the new regions.
It seems like the partitions read are from the previous insertion, and if
that is the case, the comparator will surely indicate that the data loaded
belongs in the previous (pre-split) last region. Where does the
RegionServer save the partitions file written to DistributedCache?
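
To illustrate the suspicion about stale partitions: with 4 sorted split points, a key is assigned to one of 5 partitions by finding where it would be inserted among the split points. The sketch below is plain Java and only mimics that idea; it is not the Hadoop or HBase partitioner, the class name and keys are hypothetical, and it assumes ASCII keys so that String ordering matches byte ordering.

```java
import java.util.Arrays;

/** Toy total-order partitioning over sorted split points; illustrative only. */
final class SplitPointPartitioner {

    /**
     * Returns a partition index in [0, splitPoints.length].
     * splitPoints must be sorted ascending. A key equal to split point i
     * belongs to partition i + 1, because a split point is the first key
     * of the partition that follows it.
     */
    static int partitionFor(String[] splitPoints, String rowKey) {
        int pos = Arrays.binarySearch(splitPoints, rowKey);
        // For a key that is not present, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Hypothetical split points for the new day and for the previous day.
        String[] today = {"20131217_f", "20131217_m", "20131217_s", "20131217_x"};
        String[] yesterday = {"20131216_f", "20131216_m", "20131216_s", "20131216_x"};

        String[] keys = {"20131217_a.com", "20131217_g.com", "20131217_z.com"};
        for (String key : keys) {
            System.out.println(key
                    + " today=" + partitionFor(today, key)
                    + " yesterday=" + partitionFor(yesterday, key));
        }
    }
}
```

With today's split points the three keys spread over different partitions, while with yesterday's split points every key for the new day sorts after all of them and ends up in the last partition.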



RE: Bulk load moving HFiles to the wrong region

Posted by Bijieshan <bi...@huawei.com>.
>> >>>> The previous last region is not supposed to be deleted; I'm just
>> >>>> adding new regions (always following lexicographically) so that
>> >>>> the last region before the pre-split is not the last anymore.

Do you mean you added the new regions into META? Sorry if I misunderstood you here. But can you tell me how you ran the split for each new day? It seems there may be more than 2 split points.

Thanks.
Jieshan

-----Original Message-----
From: Amit Sela [mailto:amits@infolinks.com] 
Sent: Tuesday, December 17, 2013 6:10 PM
To: user@hbase.apache.org
Subject: Re: Bulk load moving HFiles to the wrong region

The logs on the region servers that were supposed to get the loaded data show that they get a request to open the (correct) region, and they open it.
But only the region server where the data is actually loaded has the move in its log, for all files.
The log actually shows it copies to the wrong directory.
Could it have something to do with the fact that the same RegionServer that hosts the "wrong" region also hosts some of the regions being loaded?


On Tue, Dec 17, 2013 at 11:39 AM, Amit Sela <am...@infolinks.com> wrote:

> Like I mentioned before, running with all reducers works fine. Running
> with the extension of HFileOutputFormat fails, sometimes, on some tables.
> The .META. encoded qualifier points to different directories for the
> different regions the files are supposedly loaded into. The directories
> actually do exist, and they contain all relevant family directories,
> but the family directories are EMPTY.
> Instead, the files that should have been in all the different directories
> are moved to the corresponding family directories under the directory
> pointed to by the .META. encoded qualifier of the last region before the
> split (which is where they would fit if no pre-splitting had occurred).
>
>
> On Tue, Dec 17, 2013 at 4:48 AM, Bijieshan <bi...@huawei.com> wrote:
>
>> >>>> In the first step, the files are read correctly and regionGroups
>> >>>> is created as it should.
>> Did you notice the number of reducers? Was it equal to 2000 (before your
>> extended HFileOutputFormat)?
>>
>> >>> RegionServer logs in the RegionServer that the files are moved to 
>> >>> indeed shows that all files are moved to that region (when it 
>> >>> doesn't happen it shows only 1 file per family moved to a
>> >>> RegionServer)
>>
>> How about the region-split related logs?
>>
>> > Loaded regions are listed in .META. table and the ENCODED field in 
>> > the table points to an existing directory. But all family 
>> > directories in this region are empty...
>>
>> Was the previous old region still in .META.?
>>
>> > I implemented an extension of HFileOutputFormat - because each bulk
>> > load will import data to the newly created regions only, I pass the
>> > prefix (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so
>> > that getRegionStartKeys returns only the corresponding keys.
>> > I did this in order to avoid having 2000 reducers when my target is
>> > 15 regions...
>>
>> We always do it like this :). Only configure the necessary regions.
>>
>> Sorry for the late reply.
>>
>> Jieshan
>> -----Original Message-----
>> From: Amit Sela [mailto:amits@infolinks.com]
>> Sent: Tuesday, December 17, 2013 12:19 AM
>> To: user@hbase.apache.org
>> Subject: Re: Bulk load moving HFiles to the wrong region
>>
>> I've managed to isolate the problem.
>> I implemented an extension of HFileOutputFormat - because each bulk 
>> load will import data to the newly created regions only, I pass the 
>> prefix
>> (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that 
>> getRegionStartKeys returns only the corresponding keys.
>> I did this in order to avoid having 2000 reducers when my target is 
>> 15 regions...
>>
>> When I use HFileOutputFormat  it seems to work. But I don't 
>> understand why it doesn't happen in other tables (some smaller and 
>> some much much
>> bigger) or even in that table it happens every once in a while ?
>>
>> Any ideas ?
>>
>>
>>
>> On Mon, Dec 16, 2013 at 4:37 PM, Amit Sela <am...@infolinks.com> wrote:
>>
>> > Loaded regions are listed in .META. table and the ENCODED field in 
>> > the table points to an existing directory. But all family 
>> > directories in this region are empty...
>> >
>> >
>> > On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <am...@infolinks.com> wrote:
>> >
>> >> I ran the hbck tool, and while I do have some inconsistencies they 
>> >> are not in the table that has the bulk load issues.
>> >>
>> >>
>> >>
>> >> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <am...@infolinks.com>
>> wrote:
>> >>
>> >>> RegionServer logs in the RegionServer that the files are moved to 
>> >>> indeed shows that all files are moved to that region (when it 
>> >>> doesn't happen it shows only 1 file per family moved to a
>> >>> RegionServer)
>> >>>
>> >>>
>> >>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <am...@infolinks.com>
>> wrote:
>> >>>
>> >>>> In the first step, the files are read correctly and regionGroups
>> >>>> is created as it should.
>> >>>> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I
>> >>>> notice that ServerCallable's regionName returned from the server is
>> >>>> the wrong region (the pre-split last region).
>> >>>> The previous last region is not supposed to be deleted; I'm just
>> >>>> adding new regions (always following lexicographically) so that
>> >>>> the last region before the pre-split is not the last anymore.
>> >>>> It seems that wherever the ServerCallable is running, it is not
>> >>>> updated with the new regions... I tried major compacting (the
>> >>>> new regions) after the pre-split and before the bulk load, but that
>> >>>> didn't help.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijieshan@huawei.com> wrote:
>> >>>>
>> >>>>> As we know, bulk load has two steps:
>> >>>>> 1. Create HFiles by MapReduce.
>> >>>>> 2. Load HFiles into HBase.
>> >>>>>
>> >>>>> I wonder whether it read the right partitions information
>> >>>>> during the first step. Have you run the hbck tool to check the
>> >>>>> cluster's health?
>> >>>>> You mentioned you see the new regions in the webapp. The fact that
>> >>>>> the files were moved to the previous old region indicates that the
>> >>>>> old region directory was still there. So you started the bulk load
>> >>>>> just after the region split? (The old region directory will be
>> >>>>> deleted by CatalogJanitor soon after the region split, once
>> >>>>> compaction has finished.)
>> >>>>>
>> >>>>> I suggest checking the regionserver logs.
>> >>>>>
>> >>>>> Jieshan.
>> >>>>> -----Original Message-----
>> >>>>> From: Amit Sela [mailto:amits@infolinks.com]
>> >>>>> Sent: Monday, December 16, 2013 2:29 PM
>> >>>>> To: user@hbase.apache.org
>> >>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>> >>>>>
>> >>>>> Every split executed is a new day. The row key design is
>> >>>>> yyyyMMdd_URL.
>> >>>>> And the split points are yyyyMMdd_x, yyyyMMdd_y etc., chosen so
>> >>>>> that the entire load is (almost) evenly spread.
>> >>>>> The problem I described causes the bulk load to load all files
>> >>>>> to the last region of the previous day.
>> >>>>> Thanks.
>> >>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:
>> >>>>>
>> >>>>> > Hi Amit:
>> >>>>> > Can you provide the split-keys of the new regions and your
>> >>>>> > row-key design?
>> >>>>> >
>> >>>>> > Thank you.
>> >>>>> > Jieshan.
>> >>>>> > -----Original Message-----
>> >>>>> > From: Amit Sela [mailto:amits@infolinks.com]
>> >>>>> > Sent: Monday, December 16, 2013 7:09 AM
>> >>>>> > To: user@hbase.apache.org
>> >>>>> > Subject: Bulk load moving HFiles to the wrong region
>> >>>>> >
>> >>>>> > Hi all,
>> >>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>> >>>>> > When trying to bulk load using the Java API I sometimes get
>> >>>>> > the HFiles moved to the wrong directory.
>> >>>>> > I'm pre-splitting regions and the new regions are always the
>> >>>>> > last (lexicographically), so when this happens all files move
>> >>>>> > to the last region pre-split. But the split does work. I see
>> >>>>> > the new regions in the webapp before bulk load executes. Once
>> >>>>> > a table has this problem (not all the time) it keeps on until
>> >>>>> > I restart HBase.
>> >>>>> >
>> >>>>> > Anyone seen something similar ?
>> >>>>> >
>> >>>>> > Thanks,
>> >>>>> > Amit.
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
The logs on the region servers that were supposed to get the loaded data
show that they get a request to open the (correct) region, and they open
it.
But only the region server where the data is actually loaded has the move
in its log, for all files.
The log actually shows it copies to the wrong directory.
Could it have something to do with the fact that the same RegionServer that
hosts the "wrong" region also hosts some of the regions being loaded?


On Tue, Dec 17, 2013 at 11:39 AM, Amit Sela <am...@infolinks.com> wrote:

> Like I mentioned before, running with all reducers works fine. Running
> with the extension of HFileOutputFormat fails, sometimes, on some tables.
> The .META. encoded qualifier points to different directories for the
> different regions the files are supposedly loaded into. The directories
> actually do exist, and they contain all relevant family directories, but
> the family directories are EMPTY.
> Instead, the files that should have been in all the different directories
> are moved to the corresponding family directories under the directory
> pointed to by the .META. encoded qualifier of the last region before the
> split (which is where they would fit if no pre-splitting had occurred).
>
>
> On Tue, Dec 17, 2013 at 4:48 AM, Bijieshan <bi...@huawei.com> wrote:
>
>> >>>> In the first step, the files are read correctly and regionGroups is
>> >>>> created as it should.
>> Did you notice the number of reducers? Was it equal to 2000 (before your
>> extended HFileOutputFormat)?
>>
>> >>> RegionServer logs in the RegionServer that the files are moved to
>> >>> indeed shows that all files are moved to that region (when it
>> >>> doesn't happen it shows only 1 file per family moved to a
>> >>> RegionServer)
>>
>> How about the region-split related logs?
>>
>> > Loaded regions are listed in .META. table and the ENCODED field in the
>> > table points to an existing directory. But all family directories in
>> > this region are empty...
>>
>> Was the previous old region still in .META.?
>>
>> > I implemented an extension of HFileOutputFormat - because each bulk
>> > load will import data to the newly created regions only, I pass the
>> > prefix (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so
>> > that getRegionStartKeys returns only the corresponding keys.
>> > I did this in order to avoid having 2000 reducers when my target is
>> > 15 regions...
>>
>> We always do it like this :). Only configure the necessary regions.
>>
>> Sorry for the late reply.
>>
>> Jieshan
>> -----Original Message-----
>> From: Amit Sela [mailto:amits@infolinks.com]
>> Sent: Tuesday, December 17, 2013 12:19 AM
>> To: user@hbase.apache.org
>> Subject: Re: Bulk load moving HFiles to the wrong region
>>
>> I've managed to isolate the problem.
>> I implemented an extension of HFileOutputFormat - because each bulk load
>> will import data to the newly created regions only, I pass the prefix
>> (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that
>> getRegionStartKeys returns only the corresponding keys.
>> I did this in order to avoid having 2000 reducers when my target is 15
>> regions...
>>
>> When I use HFileOutputFormat  it seems to work. But I don't understand
>> why it doesn't happen in other tables (some smaller and some much much
>> bigger) or even in that table it happens every once in a while ?
>>
>> Any ideas ?
>>
>>
>>
>> On Mon, Dec 16, 2013 at 4:37 PM, Amit Sela <am...@infolinks.com> wrote:
>>
>> > Loaded regions are listed in .META. table and the ENCODED field in the
>> > table points to an existing directory. But all family directories in
>> > this region are empty...
>> >
>> >
>> > On Mon, Dec 16, 2013 at 4:29 PM, Amit Sela <am...@infolinks.com> wrote:
>> >
>> >> I ran the hbck tool, and while I do have some inconsistencies they
>> >> are not in the table that has the bulk load issues.
>> >>
>> >>
>> >>
>> >> On Mon, Dec 16, 2013 at 4:22 PM, Amit Sela <am...@infolinks.com>
>> wrote:
>> >>
>> >>> RegionServer logs in the RegionServer that the files are moved to
>> >>> indeed shows that all files are moved to that region (when it
>> >>> doesn't happen it shows only 1 file per family moved to a
>> >>> RegionServer)
>> >>>
>> >>>
>> >>> On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <am...@infolinks.com>
>> wrote:
>> >>>
>> >>>> In the first step, the files are read correctly and regionGroups is
>> >>>> creates as it should.
>> >>>> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I
>> >>>> notice that ServerCallable's regionName returned from server is the
>> >>>> wrong region (the pre-split last region).
>> >>>> The previous last region is not supposed to delete I'm just adding
>> >>>> new regions (always following lexicographically) so that the last
>> >>>> region before the pre-split is not the last anymore.
>> >>>> It seems that wherever the ServerCallable is running, it is not
>> >>>> updated with the new regions... I tried major compacting (the new
>> >>>> regions) after pre-split and before the bulkload, but that didn't
>> help.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bijieshan@huawei.com
>> >wrote:
>> >>>>
>> >>>>> As we know, bulk load has two steps:
>> >>>>> 1. Create HFiles by MapReduce.
>> >>>>> 2. Load HFiles into HBase.
>> >>>>>
>> >>>>> I wonder whether it read the right partitions information during
>> >>>>> the first step. Have you run hbck tool to check the cluster healthy?
>> >>>>> You mentioned you see the new regions in the webapp. The files
>> >>>>> were moved to the previous old region indicated the old region
>> >>>>> directory was still there. So you started bulk load just after
>> >>>>> region split? (Old region directory will be deleted soon by
>> >>>>> CatalogJanitor after region-split once compaction finished)
>> >>>>>
>> >>>>> I suggest to check the regionserver logs.
>> >>>>>
>> >>>>> Jieshan.
>> >>>>> -----Original Message-----
>> >>>>> From: Amit Sela [mailto:amits@infolinks.com]
>> >>>>> Sent: Monday, December 16, 2013 2:29 PM
>> >>>>> To: user@hbase.apache.org
>> >>>>> Subject: RE: Bulk load moving HFiles to the wrong region
>> >>>>>
>> >>>>> Every split executed is a new day. The row key design is
>> yyyyMMdd_URL.
>> >>>>> And the split points are yyyyMMdd_x, yyyyMMdd_y etc. In a way that
>> >>>>> the entire load is (almost) evenly spread.
>> >>>>> The problem I described causes the bulk load to load all files to
>> >>>>> to the last region of the previous day.
>> >>>>> Thanks.
>> >>>>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:
>> >>>>>
>> >>>>> > Hi Amit:
>> >>>>> > Can you provide the split-keys of the new regions and your
>> >>>>> > row-key
>> >>>>> design?
>> >>>>> >
>> >>>>> > Thank you.
>> >>>>> > Jieshan.
>> >>>>> > -----Original Message-----
>> >>>>> > From: Amit Sela [mailto:amits@infolinks.com]
>> >>>>> > Sent: Monday, December 16, 2013 7:09 AM
>> >>>>> > To: user@hbase.apache.org
>> >>>>> > Subject: Bulk load moving HFiles to the wrong region
>> >>>>> >
>> >>>>> > Hi all,
>> >>>>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>> >>>>> > When trying to bulk load using the Java API I sometimes get the
>> >>>>> HFiles
>> >>>>> > moved to the wrong directory.
>> >>>>> > I'm pre-splitting regions and the new regions are always the
>> >>>>> > last (lexicographically), so when this happens all files move to
>> >>>>> > the last region pre-split. But the split does work. I see the
>> >>>>> > new regions in the webapp before bulk load executes. Once a
>> >>>>> > table has this problem (not all the time) it keeps on until I
>> restart HBase.
>> >>>>> >
>> >>>>> > Anyone seen something similar ?
>> >>>>> >
>> >>>>> > Thanks,
>> >>>>> > Amit.
>> >>>>> >
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
As I mentioned before, running with all reducers works fine. Running with
the extension of HFileOutputFormat fails sometimes, on some tables.
The .META. encoded qualifier points to a different directory for each of the
regions the files are supposedly loaded into. The directories do exist, and
they contain all the relevant family directories, but those family
directories are EMPTY.
Instead, the files that should have been spread across the different
directories are moved to the corresponding family directories under the
directory pointed to by the .META. encoded qualifier of the last region
before the split (which is where they would fit if no pre-splitting had
occurred).
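
To illustrate the symptom, here is a minimal sketch in plain Java (no HBase classes; the class name, region names and keys are all hypothetical). It assumes only that a region owns the rows from its start key up to the next region's start key, so the owning region is the one with the greatest start key not larger than the row. If the lookup works from boundaries taken before the pre-split, every row of the new day resolves to the old last region, which is what I observe:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a toy region map, not the HBase client's region cache.
class RegionLookupSketch {

    // Returns the region whose start key is the greatest one <= rowKey.
    // String ordering matches byte ordering here because the keys are ASCII.
    static String regionFor(TreeMap<String, String> regionsByStartKey, String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Boundaries as they were before the daily pre-split (hypothetical keys).
        TreeMap<String, String> stale = new TreeMap<>();
        stale.put("", "region-first");
        stale.put("20131215_x", "region-20131215-last");

        // Boundaries after the new day's regions were added.
        TreeMap<String, String> fresh = new TreeMap<>(stale);
        fresh.put("20131216_", "region-20131216-a");
        fresh.put("20131216_x", "region-20131216-b");

        String row = "20131216_example.com";
        System.out.println("stale view: " + regionFor(stale, row));
        System.out.println("fresh view: " + regionFor(fresh, row));
    }
}
```

With the stale view the row lands in the previous day's last region; with the fresh view it lands in the first region of the new day.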



RE: Bulk load moving HFiles to the wrong region

Posted by Bijieshan <bi...@huawei.com>.
>>>> In the first step, the files are read correctly and regionGroups is
>>>> created as it should.
Did you notice the number of reducers? Was it 2000 (before you extended HFileOutputFormat)?

>>> The logs of the RegionServer that the files are moved to indeed
>>> show that all files are moved to that region (when it doesn't
>>> happen, they show only 1 file per family moved to a
>>> RegionServer)

What about the region-split related logs?

> Loaded regions are listed in .META. table and the ENCODED field in the 
> table points to an existing directory. But all family directories in 
> this region are empty...

Was the previous old region still in .META.?
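
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is plain Java with hypothetical names, not hbck and not HBase client code: sort the regions by start key and verify that each region's end key equals the next region's start key. The assumption is that an empty end key means "to the end of the table", so an old last region that still has an empty end key after new regions were added behind it would overlap them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: checks that region boundaries form one contiguous chain.
class RegionChainSketch {

    static final class Region {
        final String name;
        final String startKey;
        final String endKey; // empty means "to the end of the table"

        Region(String name, String startKey, String endKey) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns a description of every overlap or hole between neighbouring regions.
    static List<String> findProblems(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));
        List<String> problems = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i++) {
            Region cur = sorted.get(i);
            Region next = sorted.get(i + 1);
            if (cur.endKey.isEmpty() || cur.endKey.compareTo(next.startKey) > 0) {
                problems.add("overlap: " + cur.name + " and " + next.name);
            } else if (cur.endKey.compareTo(next.startKey) < 0) {
                problems.add("hole between " + cur.name + " and " + next.name);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("first", "", "20131215_x"));
        regions.add(new Region("old-last", "20131215_x", "")); // end key never updated
        regions.add(new Region("new-a", "20131216_", "20131216_x"));
        regions.add(new Region("new-b", "20131216_x", ""));
        System.out.println(findProblems(regions));
    }
}
```

If such an overlap showed up for your table, it might explain why new rows still resolve to the old last region.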

> I implemented an extension of HFileOutputFormat - because each bulk load will import data to the newly created regions only, I pass the prefix
> (yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so that getRegionStartKeys returns only the corresponding keys.
> I did this in order to avoid having 2000 reducers when my target is 15 regions...

We always do it like this :). Only configure the necessary regions.

Sorry for the late reply.

Jieshan

Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
I've managed to isolate the problem.
I implemented an extension of HFileOutputFormat - because each bulk load
will import data to the newly created regions only, I pass the prefix
(yyyyMMdd) to MyHFileOutputFormat.configureIncrementalLoad() so
that getRegionStartKeys returns only the corresponding keys.
I did this in order to avoid having 2000 reducers when my target is 15
regions...

When I use the stock HFileOutputFormat it seems to work. But I don't
understand why the problem doesn't occur in other tables (some smaller and
some much, much bigger), or why even in that table it happens only every
once in a while.

Any ideas?
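
For reference, the idea behind the extension is roughly the following. This is a simplified sketch in plain Java, not the actual MyHFileOutputFormat code and not the partitioner Hadoop uses: keys are Strings rather than byte arrays, and filterByPrefix and partitionFor are hypothetical helper names. It keeps only the start keys of the day being loaded, so the number of reducers equals the number of new regions, and then assigns each row to the partition whose start key is the greatest one not larger than the row:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: prefix-filtered start keys and a simple range partitioner.
class PrefixPartitionSketch {

    // Keep only the region start keys that begin with the given day prefix, sorted.
    static List<String> filterByPrefix(List<String> startKeys, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String key : startKeys) {
            if (key.startsWith(prefix)) {
                kept.add(key);
            }
        }
        Collections.sort(kept);
        return kept;
    }

    // Index of the partition (one reducer per kept region) for a row key.
    // Rows that sort below the first start key fall into partition 0.
    static int partitionFor(List<String> sortedStartKeys, String rowKey) {
        int pos = Collections.binarySearch(sortedStartKeys, rowKey);
        if (pos >= 0) {
            return pos; // the row key is exactly a region start key
        }
        int insertionPoint = -pos - 1; // index of the first start key > rowKey
        return Math.max(0, insertionPoint - 1);
    }

    public static void main(String[] args) {
        // Hypothetical start keys: two older regions plus three for the new day.
        List<String> all = Arrays.asList(
                "", "20131215_m", "20131216_", "20131216_h", "20131216_p");
        List<String> kept = filterByPrefix(all, "20131216");
        System.out.println("reducers: " + kept.size());
        System.out.println("partition: " + partitionFor(kept, "20131216_news.com"));
    }
}
```

Note that this only decides which reducer writes which HFile; it does not influence which region the loader later picks for that file.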




Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
Loaded regions are listed in .META. table and the ENCODED field in the
table points to an existing directory. But all family directories in this
region are empty...



Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
I ran the hbck tool, and while I do have some inconsistencies they are not
in the table that has the bulk load issues.




Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
The logs of the RegionServer that the files are moved to indeed show that
all files are moved to that region (when it doesn't happen, they show only
1 file per family moved to a RegionServer).


On Mon, Dec 16, 2013 at 4:21 PM, Amit Sela <am...@infolinks.com> wrote:

> In the first step, the files are read correctly and regionGroups is
> created as it should.
> When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I notice
> that ServerCallable's regionName returned from server is the wrong region
> (the pre-split last region).
> The previous last region is not supposed to be deleted; I'm just adding new
> regions (always following lexicographically) so that the last region before
> the pre-split is not the last anymore.
> It seems that wherever the ServerCallable is running, it is not updated
> with the new regions... I tried major compacting (the new regions) after
> pre-split and before the bulkload, but that didn't help.
>
>
>
> On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bi...@huawei.com> wrote:
>
>> As we know, bulk load has two steps:
>> 1. Create HFiles by MapReduce.
>> 2. Load HFiles into HBase.
>>
>> I wonder whether it read the right partition information during the
>> first step. Have you run the hbck tool to check the cluster's health?
>> You mentioned you see the new regions in the webapp. That the files were
>> moved to the previous old region indicates the old region directory was
>> still there. So you started the bulk load just after the region split? (The
>> old region directory will be deleted by CatalogJanitor soon after the
>> region split, once compaction has finished.)
>>
>> I suggest checking the RegionServer logs.
>>
>> Jieshan.
>> -----Original Message-----
>> From: Amit Sela [mailto:amits@infolinks.com]
>> Sent: Monday, December 16, 2013 2:29 PM
>> To: user@hbase.apache.org
>> Subject: RE: Bulk load moving HFiles to the wrong region
>>
>> Every split executed is a new day. The row key design is yyyyMMdd_URL.
>> The split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that the
>> entire load is (almost) evenly spread.
>> The problem I described causes the bulk load to load all files to the
>> last region of the previous day.
>> Thanks.
>> On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:
>>
>> > Hi Amit:
>> > Can you provide the split-keys of the new regions and your row-key
>> > design?
>> >
>> > Thank you.
>> > Jieshan.
>> > -----Original Message-----
>> > From: Amit Sela [mailto:amits@infolinks.com]
>> > Sent: Monday, December 16, 2013 7:09 AM
>> > To: user@hbase.apache.org
>> > Subject: Bulk load moving HFiles to the wrong region
>> >
>> > Hi all,
>> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
>> > When trying to bulk load using the Java API I sometimes get the HFiles
>> > moved to the wrong directory.
>> > I'm pre-splitting regions and the new regions are always the last
>> > (lexicographically), so when this happens all files move to the last
>> > region pre-split. But the split does work. I see the new regions in
>> > the webapp before bulk load executes. Once a table has this problem
>> > (not all the time) it keeps on until I restart HBase.
>> >
>> > Anyone seen something similar ?
>> >
>> > Thanks,
>> > Amit.
>> >
>>
>
>

Re: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
In the first step, the files are read correctly and regionGroups is created
as it should be.
When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I notice
that the ServerCallable's regionName returned from the server is the wrong
region (the pre-split last region).
The previous last region is not supposed to be deleted; I'm just adding new
regions (always following lexicographically) so that the last region before
the pre-split is not the last anymore.
It seems that wherever the ServerCallable is running, it is not updated
with the new regions... I tried major compacting (the new regions) after
pre-split and before the bulkload, but that didn't help.
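
To illustrate the symptom described above, here is a minimal sketch in plain
Java with no HBase dependency. It is not HBase's lookup code; it only shows
that if a caller resolves rows against an outdated list of region start keys,
every row of the new day resolves to the previous last region. The class
name, the method name and all start keys are hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

class StaleRegionLookup {
    // Returns the start key of the region that would hold `row`: the greatest
    // known start key that is <= row. The empty string stands for the first
    // region's empty start key. Java String order matches byte order here
    // only because the sample keys are plain ASCII.
    static String locate(TreeSet<String> startKeys, String row) {
        return startKeys.floor(row);
    }

    public static void main(String[] args) {
        // Hypothetical region start keys before the daily pre-split.
        TreeSet<String> stale = new TreeSet<>(
                Arrays.asList("", "20131215_h", "20131215_p"));

        // The same table after the new day's regions were added.
        TreeSet<String> fresh = new TreeSet<>(stale);
        fresh.addAll(Arrays.asList("20131216_", "20131216_h", "20131216_p"));

        String row = "20131216_k.example.com";
        // With the stale view, the row lands in the previous last region.
        System.out.println("stale view -> " + locate(stale, row));
        // With the fresh view, the row lands in one of the new regions.
        System.out.println("fresh view -> " + locate(fresh, row));
    }
}
```

If the view used during the load is the stale one, this matches what the
thread reports: all files go to the last region that existed before the
pre-split.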



On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <bi...@huawei.com> wrote:

> As we know, bulk load has two steps:
> 1. Create HFiles by MapReduce.
> 2. Load HFiles into HBase.
>
> I wonder whether it read the right partition information during the first
> step. Have you run the hbck tool to check the cluster's health?
> You mentioned you see the new regions in the webapp. That the files were
> moved to the previous old region indicates that the old region directory
> was still there. So did you start the bulk load just after the region
> split? (The old region directory will be deleted soon by CatalogJanitor
> after a region split, once compaction has finished.)
>
> I suggest checking the regionserver logs.
>
> Jieshan.
> -----Original Message-----
> From: Amit Sela [mailto:amits@infolinks.com]
> Sent: Monday, December 16, 2013 2:29 PM
> To: user@hbase.apache.org
> Subject: RE: Bulk load moving HFiles to the wrong region
>
> Every split executed is a new day. The row key design is yyyyMMdd_URL. The
> split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that the entire
> load is (almost) evenly spread.
> The problem I described causes the bulk load to load all files to the
> last region of the previous day.
> Thanks.
> On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:
>
> > Hi Amit:
> > Can you provide the split-keys of the new regions and your row-key
> > design?
> >
> > Thank you.
> > Jieshan.
> > -----Original Message-----
> > From: Amit Sela [mailto:amits@infolinks.com]
> > Sent: Monday, December 16, 2013 7:09 AM
> > To: user@hbase.apache.org
> > Subject: Bulk load moving HFiles to the wrong region
> >
> > Hi all,
> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
> > When trying to bulk load using the Java API I sometimes get the HFiles
> > moved to the wrong directory.
> > I'm pre-splitting regions and the new regions are always the last
> > (lexicographically), so when this happens all files move to the last
> > region pre-split. But the split does work. I see the new regions in
> > the webapp before bulk load executes. Once a table has this problem
> > (not all the time) it keeps on until I restart HBase.
> >
> > Anyone seen something similar ?
> >
> > Thanks,
> > Amit.
> >
>

RE: Bulk load moving HFiles to the wrong region

Posted by Bijieshan <bi...@huawei.com>.
As we know, bulk load has two steps:
1. Create HFiles by MapReduce.
2. Load HFiles into HBase.

I wonder whether it read the right partition information during the first step. Have you run the hbck tool to check the cluster's health?
You mentioned you see the new regions in the webapp. That the files were moved to the previous old region indicates that the old region directory was still there. So did you start the bulk load just after the region split? (The old region directory will be deleted soon by CatalogJanitor after a region split, once compaction has finished.)

I suggest checking the regionserver logs.

Jieshan.
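
As a rough illustration of why the partition information read in the first
step matters, the sketch below picks a partition for a row by binary search
over sorted split points, which is the general idea behind total-order
partitioning. It is plain Java and is not Hadoop's or HBase's partitioner
code; the class name, the method name and the split points are assumptions
made up for the example.

```java
import java.util.Arrays;

class PartitionSketch {
    // Returns the partition index for `row` given sorted split points.
    // Partition 0 holds rows below the first split point; partition i holds
    // rows in [splitPoints[i-1], splitPoints[i]).
    static int partitionFor(String[] sortedSplitPoints, String row) {
        int pos = Arrays.binarySearch(sortedSplitPoints, row);
        // Exact match: the row starts the partition right of that split point.
        // No match: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        // Hypothetical split points left over from the previous day.
        String[] stale = {"20131215_h", "20131215_p"};
        // Hypothetical split points including the new day's regions.
        String[] fresh = {"20131215_h", "20131215_p",
                          "20131216_", "20131216_h", "20131216_p"};

        String[] rows = {"20131216_a.example.com",
                         "20131216_k.example.com",
                         "20131216_x.example.com"};
        for (String row : rows) {
            System.out.println(row
                    + " stale=" + partitionFor(stale, row)
                    + " fresh=" + partitionFor(fresh, row));
        }
    }
}
```

With the stale split points every new row falls into the last partition,
while the fresh split points spread the rows over the new partitions.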
-----Original Message-----
From: Amit Sela [mailto:amits@infolinks.com] 
Sent: Monday, December 16, 2013 2:29 PM
To: user@hbase.apache.org
Subject: RE: Bulk load moving HFiles to the wrong region

Every split executed is a new day. The row key design is yyyyMMdd_URL. The split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that the entire load is (almost) evenly spread.
The problem I described causes the bulk load to load all files to the last region of the previous day.
Thanks.
On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:

> Hi Amit:
> Can you provide the split-keys of the new regions and your row-key design?
>
> Thank you.
> Jieshan.
> -----Original Message-----
> From: Amit Sela [mailto:amits@infolinks.com]
> Sent: Monday, December 16, 2013 7:09 AM
> To: user@hbase.apache.org
> Subject: Bulk load moving HFiles to the wrong region
>
> Hi all,
> I'm using Hadoop 1.0.4 and HBase 0.94.12.
> When trying to bulk load using the Java API I sometimes get the HFiles 
> moved to the wrong directory.
> I'm pre-splitting regions and the new regions are always the last 
> (lexicographically), so when this happens all files move to the last 
> region pre-split. But the split does work. I see the new regions in 
> the webapp before bulk load executes. Once a table has this problem 
> (not all the time) it keeps on until I restart HBase.
>
> Anyone seen something similar ?
>
> Thanks,
> Amit.
>

RE: Bulk load moving HFiles to the wrong region

Posted by Amit Sela <am...@infolinks.com>.
Every split executed is a new day. The row key design is yyyyMMdd_URL. The
split points are yyyyMMdd_x, yyyyMMdd_y, etc., chosen so that the entire
load is (almost) evenly spread.
The problem I described causes the bulk load to load all files to the
last region of the previous day.
Thanks.
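
To make the key layout above concrete, here is a small sketch in plain Java
(no HBase dependency). It builds split points of the form yyyyMMdd_<suffix>
and checks that every split point of a later day sorts after every split
point of an earlier day, which is why the new regions are always the last
ones. The suffixes "g", "n" and "t" are made up, since the thread does not
give the real ones, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DailySplitKeys {
    // Builds one split point per suffix, in the form yyyyMMdd_<suffix>.
    static List<String> splitPoints(String day, List<String> suffixes) {
        List<String> points = new ArrayList<>();
        for (String suffix : suffixes) {
            points.add(day + "_" + suffix);
        }
        return points;
    }

    // True if every key in `later` sorts strictly after every key in
    // `earlier`. String order equals byte order here because the keys
    // are plain ASCII.
    static boolean allAfter(List<String> later, List<String> earlier) {
        for (String l : later) {
            for (String e : earlier) {
                if (l.compareTo(e) <= 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> suffixes = Arrays.asList("g", "n", "t"); // hypothetical
        List<String> yesterday = splitPoints("20131215", suffixes);
        List<String> today = splitPoints("20131216", suffixes);
        System.out.println(today);
        System.out.println(allAfter(today, yesterday));
    }
}
```
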
On Dec 16, 2013 3:43 AM, "Bijieshan" <bi...@huawei.com> wrote:

> Hi Amit:
> Can you provide the split-keys of the new regions and your row-key design?
>
> Thank you.
> Jieshan.
> -----Original Message-----
> From: Amit Sela [mailto:amits@infolinks.com]
> Sent: Monday, December 16, 2013 7:09 AM
> To: user@hbase.apache.org
> Subject: Bulk load moving HFiles to the wrong region
>
> Hi all,
> I'm using Hadoop 1.0.4 and HBase 0.94.12.
> When trying to bulk load using the Java API I sometimes get the HFiles
> moved to the wrong directory.
> I'm pre-splitting regions and the new regions are always the last
> (lexicographically), so when this happens all files move to the last region
> pre-split. But the split does work. I see the new regions in the webapp
> before bulk load executes. Once a table has this problem (not all the time)
> it keeps on until I restart HBase.
>
> Anyone seen something similar ?
>
> Thanks,
> Amit.
>

RE: Bulk load moving HFiles to the wrong region

Posted by Bijieshan <bi...@huawei.com>.
Hi Amit:
Can you provide the split-keys of the new regions and your row-key design?

Thank you.
Jieshan.
-----Original Message-----
From: Amit Sela [mailto:amits@infolinks.com] 
Sent: Monday, December 16, 2013 7:09 AM
To: user@hbase.apache.org
Subject: Bulk load moving HFiles to the wrong region

Hi all,
I'm using Hadoop 1.0.4 and HBase 0.94.12.
When trying to bulk load using the Java API I sometimes get the HFiles moved to the wrong directory.
I'm pre-splitting regions and the new regions are always the last (lexicographically), so when this happens all files move to the last region pre-split. But the split does work. I see the new regions in the webapp before bulk load executes. Once a table has this problem (not all the time) it keeps on until I restart HBase.

Anyone seen something similar ?

Thanks,
Amit.