
Posted to user@hbase.apache.org by Shahab Yunus <sh...@gmail.com> on 2014/11/14 15:41:09 UTC

Forcibly merging regions

The documentation of the online merge tool (merge_region) states that if we
forcibly merge regions (by setting the 3rd attribute to true), it can
create overlapping regions. If this happens, will it render the region or
table unusable, or is it just a performance hit? I mean, how big a deal is
it?

Actually, we are merging regions using the programmatic API and setting
this flag ('forcible') to false. But for some tables (we haven't figured
out a pattern yet; data is still accessible), the regions do not merge at
all. Afterwards we tried with this flag set to true, and it still doesn't
merge them.

CDH 5.1.0
(HBase is 0.98.1-cdh5.1.0)

Regards,
Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

I just checked. No snapshots were taken and 'list_snapshots' also returns
nothing.

Regards,
Shahab

On Fri, Nov 14, 2014 at 12:39 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> No. Not that I can recall but I can check.
>
> From a resolution perspective, is there any way we can resolve this? More
> importantly, is there any way we can automate the resolution if we run
> into such issues in the future? 'Cleaning the qualifier', that is.
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> One possibility was that region 7373f75181c71eb5061a6673cee15931 was
>> involved in some hbase snapshot.
>>
>> Was the underlying table being snapshotted in recent past ?
>>
>> Cheers
>>
>> On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>>
>> > Thanks again.
>> >
>> > But I have been polling for a while and it still doesn't merge. I have
>> > been trying to merge the particular region from the example I sent you
>> > since yesterday. I ran the polling-based code all night and had to kill
>> > it. Then in the morning, I tried merging manually through the hbase
>> > shell and it still doesn't merge. Note that the current polling logic
>> > does not try to call merge again. It just checks the number of regions.
>> >
>> > So how do we clean it then? Or actually make it merge? Also, is this
>> > something expected (a region keeping a reference)? How can we avoid it?
>> >
>> > Note that this is not limited to this table only. We are seeing this in
>> > regions of other tables as well. Are we merging too fast?
>> >
>> >
>> >
>> > Regards,
>> > Shahab
>> >
>> > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yu...@gmail.com> wrote:
>> >
>> > > Polling as you described is fine.
>> > >
>> > > catalogJanitor.cleanMergeQualifier() is called by
>> > > DispatchMergingRegionHandler.
>> > >
>> > > If clean was successful, you would see the following:
>> > >
>> > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>> > >           + regionB.getRegionNameAsString()
>> > >           + " from fs because merged region no longer holds references");
>> > >
>> > > Assuming the following error was not in your master log:
>> > >
>> > >       LOG.error("Merged region " + region.getRegionNameAsString()
>> > >           + " has only one merge qualifier in META.");
>> > >
>> > > it would mean that 7373f75181c71eb5061a6673cee15931 still had reference
>> > > files.
>> > >
>> > > Cheers
>> > >
>> > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Ted.
>> > > >
>> > > > The log excerpt is at the end of this email. It is from the master
>> > > > log, for the merge command that I issued just now through the hbase
>> > > > shell. forcible was false, but it behaves similarly if forcible is
>> > > > true. Indeed, the region merging was skipped! What does this mean?
>> > > > Data seems to be intact for this table.
>> > > >
>> > > > Just to give you some background: this table was first merged by the
>> > > > automated Java application. We are merging the regions of tables
>> > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we
>> > > > poll until the number of regions drops after the merge call. The
>> > > > application hangs and continues polling forever because the previous
>> > > > merge didn't happen.
>> > > >
>> > > > In this poll loop, we get the number of regions by a fresh call to
>> > > > HBaseAdmin.getTableRegions(tableName).size().
>> > > >
>> > > > What are these merge qualifiers, and what are we doing wrong or what
>> > > > should we do?
>> > > >
>> > > > Can we somehow retry the merge in the polling loop? But how can we
>> > > > know that we need to call merge again, given that it works for some
>> > > > regions? Is the table meta corrupted for some reason by the above
>> > > > logic?
>> > > >
>> > > > Thanks a lot.
>> > > >
>> > > >
>> > > > ------------------------------------------------------------------------
>> > > >
>> > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
>> > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
>> > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
>> > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
>> > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
>> > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
>> > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > >
>> > > > ------------------------------------------------------------------------
>> > > >
>> > > > Regards,
>> > > > Shahab
>> > > >
>> > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yu...@gmail.com> wrote:
>> > > >
>> > > > > Looking at DispatchMergingRegionHandler, it does some checks before
>> > > > > initiating the merge, e.g.:
>> > > > >
>> > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>> > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
>> > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
>> > > > >               .getEncodedName()) + " has merge qualifier");
>> > > > >
>> > > > > Can you take a look at the master log around the time the merge
>> > > > > request was issued to see if you can get some clue?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > The documentation of the online merge tool (merge_region) states
>> > > > > > that if we forcibly merge regions (by setting the 3rd attribute to
>> > > > > > true), it can create overlapping regions. If this happens, will it
>> > > > > > render the region or table unusable, or is it just a performance
>> > > > > > hit? I mean, how big a deal is it?
>> > > > > >
>> > > > > > Actually, we are merging regions using the programmatic API and
>> > > > > > setting this flag ('forcible') to false. But for some tables (we
>> > > > > > haven't figured out a pattern yet; data is still accessible), the
>> > > > > > regions do not merge at all. Afterwards we tried with this flag
>> > > > > > set to true, and it still doesn't merge them.
>> > > > > >
>> > > > > > CDH 5.1.0
>> > > > > > (HBase is 0.98.1-cdh5.1.0)
>> > > > > >
>> > > > > > Regards,
>> > > > > > Shahab
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
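
The "Skip merging regions ... has merge qualifier" line in the master log
quoted above can also be detected mechanically, for example by a tool that
tails the master log. The following is a minimal sketch and not HBase code.
It assumes the message keeps the format of the LOG.info call quoted from
DispatchMergingRegionHandler and that encoded region names are 32 lowercase
hex characters, as in this thread. The class and method names are
hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the encoded name of the region that blocked
// a merge from a master log line. The message format is assumed from the
// log excerpt in this thread.
final class MergeSkipLogParser {
    private static final Pattern SKIP = Pattern.compile(
        "Skip merging regions .*because region ([0-9a-f]{32}) has merge qualifier");

    private MergeSkipLogParser() {}

    // Returns the encoded region name if the line is a skip message,
    // or an empty Optional for any other log line.
    static Optional<String> blockedRegion(String logLine) {
        Matcher m = SKIP.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A caller could feed each master log line to blockedRegion and treat a
non-empty result as the signal that the named region still carries a merge
qualifier.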

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

The other way around: thanks for pacing through this session which other
hbase users would find helpful.
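
The resolution this thread arrived at (poll with a bound, major-compact when
the merge does not land, then retry the merge) can be sketched as follows.
This is a minimal sketch under assumptions, not a tested tool. RegionAdmin is
a hypothetical interface standing in for the HBaseAdmin calls named in the
thread (mergeRegions, getTableRegions, majorCompactRegion,
getCompactionState), so nothing here depends on a particular HBase version.
The poll limits are illustrative.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "merge, and fall back to major compaction once".
final class MergeWithCompactionRetry {

    // Hypothetical seam over the admin calls discussed in this thread.
    interface RegionAdmin {
        void requestMerge();            // asynchronous merge request
        int regionCount();              // current number of regions in the table
        void requestMajorCompaction();  // asynchronous major compaction request
        boolean compactionRunning();    // true while a compaction is reported
    }

    private MergeWithCompactionRetry() {}

    // Returns true if the region count dropped, either directly or after one
    // major compaction followed by a second merge request.
    static boolean mergeOnce(RegionAdmin admin, int maxPolls, long sleepMillis) {
        int before = admin.regionCount();
        admin.requestMerge();
        if (pollUntil(() -> admin.regionCount() < before, maxPolls, sleepMillis)) {
            return true;
        }
        // Merge did not land within the bound. Assume leftover reference
        // files and compact. Caveat: a compaction may not be reported as
        // running immediately after the request, so a real tool may need to
        // wait for it to start before waiting for it to finish.
        admin.requestMajorCompaction();
        if (!pollUntil(() -> !admin.compactionRunning(), maxPolls, sleepMillis)) {
            return false;
        }
        admin.requestMerge();
        return pollUntil(() -> admin.regionCount() < before, maxPolls, sleepMillis);
    }

    // Checks the condition up to maxPolls times, sleeping between checks,
    // instead of polling forever.
    private static boolean pollUntil(BooleanSupplier done, int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (done.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The bounded loop is the main design choice: the original polling code ran
all night because it had no exit condition other than success.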

On Fri, Nov 14, 2014 at 2:39 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Thanks a lot Ted!
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 4:11 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > w.r.t. querying compaction status, please take a look at the following
> > method in HBaseAdmin:
> >
> >   public CompactionState getCompactionState(final TableName tableName)
> >
> > For triggering major compaction on selected region, see:
> >
> >   public void majorCompactRegion(final byte[] regionName)
> >
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 11:49 AM, Shahab Yunus <sh...@gmail.com>
> > wrote:
> >
> > > I see. Thanks.
> > >
> > > So we can, in a way, automate this resolution by invoking major
> > > compaction programmatically for the 2 regions being processed (or do we
> > > need to do the whole table?). The point being that the merge tool, once
> > > it identifies that it is stuck in a polling loop, can invoke major
> > > compaction on the 2 regions or the table and then try again. Does that
> > > make sense? Is it a plausible solution? We do know that this merging,
> > > although automated, will still be run in a controlled manner, so
> > > overstepping or synchronization issues on the current table should not
> > > occur.
> > >
> > > But now the question is that majorCompact is also an async operation.
> > > So how and when do we know it has finished? :)
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > This means that yesterday's compaction was not major compaction.
> > > >
> > > > When references get in the way of merging regions, you know that it is
> > > > time for major compaction.
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > After major compaction, the references were freed for the
> > > > > above-mentioned regions, and then the merge_region command succeeded
> > > > > and they got merged. Hmmm.
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > > On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Digging deeper into the code, I came across this (this is from
> > > > > > CatalogJanitor#cleanMergeRegion):
> > > > > >
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
> > > > > > >
> > > > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
> > > > > > >
> > > > > > > MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
> > > > > >
> > > > > > return true;
> > > > > >
> > > > > >
> > > > > > > Do you think it is OK, if we face this issue, to forcibly archive
> > > > > > > and clean the regions?
> > > > > >
> > > > > > Regards,
> > > > > > Shahab
> > > > > >
> > > > > > > On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > >> Yesterday, I believe.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Shahab
> > > > > >>
> > > > > > >> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com> wrote:
> > > > > >>
> > > > > >>> Shahab:
> > > > > >>> When was the last time compaction was run on this table ?
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>>
> > > > > > >>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> > I see. Thanks.
> > > > > > >>> >
> > > > > > >>> > And if the region indeed has references, can we somehow forcibly
> > > > > > >>> > remove them? Is this even possible (even if not advisable)?
> > > > > > >>> > Basically, what I am trying to ask is: say we hit this scenario
> > > > > > >>> > and we know it is OK to go ahead and merge. What steps can we
> > > > > > >>> > follow after detecting such unwanted references?
> > > > > >>> >
> > > > > >>> > Regards,
> > > > > >>> > Shahab
> > > > > >>> >
> > > > > > >>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yuzhihong@gmail.com>
> > > > > > >>> > wrote:
> > > > > > >>> >
> > > > > > >>> > > For automated detection of such a scenario, you can refer to the
> > > > > > >>> > > code in CatalogJanitor#cleanMergeRegion():
> > > > > > >>> > >
> > > > > > >>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> > > > > > >>> > >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
> > > > > > >>> > >
> > > > > > >>> > > ...
> > > > > > >>> > >
> > > > > > >>> > > Then regionFs.hasReferences(htd) would tell you whether the
> > > > > > >>> > > underlying region has reference files.
> > > > > > >>> > >
> > > > > > >>> > > Cheers
> > > > > >>> > >
> > > > > >>> > > > > > > > Regards,
> > > > > >>> > > > > > > > Shahab
> > > > > >>> > > > > > > >
> > > > > >>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >>> > > > > > > >
> > > > > >>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some check before
> > > > > >>> > > > > > > > > initiating the merge.
> > > > > >>> > > > > > > > > e.g.:
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > > >>> > > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > > >>> > > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > > > >>> > > > > > > > >               .getEncodedName()) + " has merge qualifier");
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > > > Can you take a look at master log around the time merge request was issued
> > > > > >>> > > > > > > > > to see if you can get some clue ?
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > > > Cheers
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > >>> > > > > > > > > wrote:
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > > > > The documentation of online merge tool (merge_region) states that if we
> > > > > >>> > > > > > > > > > forcibly merge regions (by setting the 3rd attribute as true) then it can
> > > > > >>> > > > > > > > > > create overlapping regions. If this happens, will this render the
> > > > > >>> > > > > > > > > > region or table unusable, or is it just a performance hit? I mean, how big
> > > > > >>> > > > > > > > > > of a deal is it?
> > > > > >>> > > > > > > > > >
> > > > > >>> > > > > > > > > > Actually, we are merging regions using the programmatic API for this and
> > > > > >>> > > > > > > > > > setting this flag ('forcible') as false. But for some tables (we haven't
> > > > > >>> > > > > > > > > > figured out a pattern yet, data is still accessible), merge of regions does
> > > > > >>> > > > > > > > > > not happen at all. Afterwards we tried with this flag = true, and it still
> > > > > >>> > > > > > > > > > doesn't merge them.
> > > > > >>> > > > > > > > > >
> > > > > >>> > > > > > > > > > CDH 5.1.0
> > > > > >>> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > > >>> > > > > > > > > >
> > > > > >>> > > > > > > > > > Regards,
> > > > > >>> > > > > > > > > > Shahab
> > > > > >>> > > > > > > > > >
> > > > > >>> > > > > > > > >
> > > > > >>> > > > > > > >
> > > > > >>> > > > > > >
> > > > > >>> > > > > >
> > > > > >>> > > > >
> > > > > >>> > > >
> > > > > >>> > >
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Thanks a lot Ted!

Regards,
Shahab

On Fri, Nov 14, 2014 at 4:11 PM, Ted Yu <yu...@gmail.com> wrote:

> w.r.t. querying compaction status, please take a look at the following
> method in HBaseAdmin:
>
>   public CompactionState getCompactionState(final TableName tableName)
>
> For triggering major compaction on selected region, see:
>
>   public void majorCompactRegion(final byte[] regionName)
>
> Cheers
>
> On Fri, Nov 14, 2014 at 11:49 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
> > I see. Thanks.
> >
> > So we can in a way automate this resolution by invoking major compaction
> > programmatically for the 2 regions under process (or do we need to do the
> > whole table?). The point being that the merge tool, once it identifies that
> > it is stuck in a polling loop, can invoke major compaction on the 2 regions
> > or table and then try again. Does it make sense? Plausible solution? We do
> > know that this merging, although automated, will still be run in a
> > controlled manner, so overstepping or synchronization issues on the current
> > table should not occur.
> >
> > But now the question is that majorCompact is also an async operation. So
> > how and when do we know it has finished? :)
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > This means that yesterday's compaction was not major compaction.
> > >
> > > When references get in the way of merging regions, you know that it is
> > > time for major compaction.
> > >
> > > Cheers
> > >
> > > On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > wrote:
> > >
> > > > After major compacting the references were freed for the above mentioned
> > > > regions and then the merge_region command succeeded and they got merged.
> > > > Hmmm.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > Digging deeper into the code, I came across this (this is from
> > > > > CatalogJanitor#cleanMergeRegion):
> > > > >
> > > > > ...
> > > > >
> > > > > ...
> > > > >
> > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
> > > > >
> > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
> > > > >
> > > > > MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
> > > > >
> > > > > return true;
> > > > >
> > > > > Do you think it is OK, if we face this issue, to forcibly archive and
> > > > > clean the regions?
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > > > On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Yesterday, I believe.
> > > > >>
> > > > >> Regards,
> > > > >> Shahab
> > > > >>
> > > > >> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com> wrote:
> > > > >>
> > > > >>> Shahab:
> > > > >>> When was the last time compaction was run on this table ?
> > > > >>>
> > > > >>> Cheers
> > > > >>>
> > > > >>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > I see. Thanks.
> > > > >>> >
> > > > >>> > And if the region indeed has references, then can we somehow forcibly
> > > > >>> > remove them? Is this even possible (if not advisable)? Basically, what
> > > > >>> > I am trying to ask is: let us say we do hit this scenario and we know
> > > > >>> > it is OK to go ahead and merge. What steps can we follow after
> > > > >>> > detecting such unwanted references?
> > > > >>> >
> > > > >>> > Regards,
> > > > >>> > Shahab
> > > > >>> >
> > > > >>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:
> > > > >>> >
> > > > >>> > > For automated detection of such scenario, you can reference the code in
> > > > >>> > > CatalogJanitor#cleanMergeRegion():
> > > > >>> > >
> > > > >>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> > > > >>> > >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
> > > > >>> > >
> > > > >>> > > ...
> > > > >>> > >
> > > > >>> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
> > > > >>> > > region has reference files.
> > > > >>> > >
> > > > >>> > > Cheers
> > > > >>> > >
> > > > >>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > >>> > > wrote:
> > > > >>> > >
> > > > >>> > > > No. Not that I can recall but I can check.
> > > > >>> > > >
> > > > >>> > > > From a resolution perspective, is there any way we can resolve this? More
> > > > >>> > > > importantly, is there any way we can automate the resolution if we run
> > > > >>> > > > into such issues in future? 'Cleaning the qualifier', that is.
> > > > >>> > > >
> > > > >>> > > > Regards,
> > > > >>> > > > Shahab
> > > > >>> > > >
> > > > >>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>> > > >
> > > > >>> > > > > One possibility was that region 7373f75181c71eb5061a6673cee15931 was
> > > > >>> > > > > involved in some hbase snapshot.
> > > > >>> > > > >
> > > > >>> > > > > Was the underlying table being snapshotted in recent past ?
> > > > >>> > > > >
> > > > >>> > > > > Cheers
> > > > >>> > > > >
> > > > >>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > >>> > > > > wrote:
> > > > >>> > > > >
> > > > >>> > > > > > Thanks again.
> > > > >>> > > > > >
> > > > >>> > > > > > But I have been polling for a while and it still doesn't merge. I mean this
> > > > >>> > > > > > particular region example that I sent you, I have been trying to merge it
> > > > >>> > > > > > since yesterday. I ran the polling-based code all night and I had to kill it.
> > > > >>> > > > > > Then in the morning, I tried manual merging through hbase shell and it
> > > > >>> > > > > > still doesn't merge. Note that the current polling logic does not try to
> > > > >>> > > > > > call merge again. It just checks the region size.
> > > > >>> > > > > >
> > > > >>> > > > > > So how to clean it then? Or actually make it merge? Plus, is this something
> > > > >>> > > > > > expected (a region keeping a reference)? How can we avoid it?
> > > > >>> > > > > >
> > > > >>> > > > > > Note that this is not limited to this table only. We are seeing this in
> > > > >>> > > > > > other regions of other tables as well. Are we merging too fast?
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > >
> > > > >>> > > > > > Regards,
> > > > >>> > > > > > Shahab
> > > > >>> > > > > >
> > > > >>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>> > > > > >
> > > > >>> > > > > > > Polling as you described is fine.
> > > > >>> > > > > > >
> > > > >>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > > >>> > > > > > > DispatchMergingRegionHandler.
> > > > >>> > > > > > >
> > > > >>> > > > > > > If clean was successful, you would see the following:
> > > > >>> > > > > > >
> > > > >>> > > > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> > > > >>> > > > > > >           + regionB.getRegionNameAsString()
> > > > >>> > > > > > >           + " from fs because merged region no longer holds references");
> > > > >>> > > > > > >
> > > > >>> > > > > > > Assuming there was no log below in your master log:
> > > > >>> > > > > > >
> > > > >>> > > > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
> > > > >>> > > > > > >           + " has only one merge qualifier in META.");
> > > > >>> > > > > > >
> > > > >>> > > > > > > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
> > > > >>> > > > > > > reference file.
> > > > >>> > > > > > >
> > > > >>> > > > > > > Cheers
> > > > >>> > > > > > >
> > > > >>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > >>> > > > > > > wrote:
> > > > >>> > > > > > >
> > > > >>> > > > > > > > Hi Ted.
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > The log bit is below at the end of the email. This is the command to merge
> > > > >>> > > > > > > > that I gave just now through hbase shell. forcible was false but it behaves
> > > > >>> > > > > > > > similarly if forcible is true too. This is from master log. Indeed the
> > > > >>> > > > > > > > region merging was skipped! What does this mean? Data seems to be intact
> > > > >>> > > > > > > > for this table.
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > Just to give you a background. This table was first merged by the automated
> > > > >>> > > > > > > > java application. What we are doing is that we are merging tables
> > > > >>> > > > > > > > programmatically. As the HBaseAdmin.mergeRegions call is async, we poll for
> > > > >>> > > > > > > > the number of regions getting lowered after this merge call. The
> > > > >>> > > > > > > > application hangs and continues polling forever as the previous merge
> > > > >>> > > > > > > > didn't happen.
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > In this poll loop, we do get the number of regions by a fresh call to
> > > > >>> > > > > > > > HBaseAdmin.getTableRegions(tableName).size().
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > What are these merge qualifiers and what are we doing wrong or should do?
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > In the polling loop, can we somehow retry the merge again? But how can we
> > > > >>> > > > > > > > know that we need to call merge again, as it works for some regions? Is the
> > > > >>> > > > > > > > table meta corrupted for some reason by the above logic?
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > Thanks a lot.
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > ------------------------------------------------------------------------
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
> > > > >>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > >>> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > >>> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > > >>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
> > > > >>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > >>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions
> > > > >>> > > > > > > > TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
> > > > >>> > > > > > > > TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> > > > >>> > > > > > > > because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > > >>> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
> > > > >>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > >>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
> > > > >>> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
> > > > >>> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > > >>> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
> > > > >>> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > Regards,
> > > > >>> > > > > > > > Shahab
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >>> > > > > > > >
> > > > >>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some check before
> > > > >>> > > > > > > > > initiating the merge.
> > > > >>> > > > > > > > > e.g.:
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > >>> > > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > >>> > > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > > >>> > > > > > > > >               .getEncodedName()) + " has merge qualifier");
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > > > Can you take a look at master log around the time merge request was issued
> > > > >>> > > > > > > > > to see if you can get some clue ?
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > > > Cheers
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
> > > > >>> > > > > > > > > wrote:
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > > > > The documentation of online merge tool (merge_region) states that if we
> > > > >>> > > > > > > > > > forcibly merge regions (by setting the 3rd attribute as true) then it can
> > > > >>> > > > > > > > > > create overlapping regions. If this happens, will this render the
> > > > >>> > > > > > > > > > region or table unusable, or is it just a performance hit? I mean, how big
> > > > >>> > > > > > > > > > of a deal is it?
> > > > >>> > > > > > > > > >
> > > > >>> > > > > > > > > > Actually, we are merging regions using the programmatic API for this and
> > > > >>> > > > > > > > > > setting this flag ('forcible') as false. But for some tables (we haven't
> > > > >>> > > > > > > > > > figured out a pattern yet, data is still accessible), merge of regions does
> > > > >>> > > > > > > > > > not happen at all. Afterwards we tried with this flag = true, and it still
> > > > >>> > > > > > > > > > doesn't merge them.
> > > > >>> > > > > > > > > >
> > > > >>> > > > > > > > > > CDH 5.1.0
> > > > >>> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > >>> > > > > > > > > >
> > > > >>> > > > > > > > > > Regards,
> > > > >>> > > > > > > > > > Shahab
> > > > >>> > > > > > > > > >
> > > > >>> > > > > > > > >
> > > > >>> > > > > > > >
> > > > >>> > > > > > >
> > > > >>> > > > > >
> > > > >>> > > > >
> > > > >>> > > >
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

w.r.t. querying compaction status, please take a look at the following
method in HBaseAdmin:

  public CompactionState getCompactionState(final TableName tableName)

For triggering major compaction on selected region, see:

  public void majorCompactRegion(final byte[] regionName)

Cheers
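
The retry idea discussed in this thread (request the merge, and if the region count does not drop, major-compact both regions, wait, then retry) could be sketched roughly as below. This is only an illustration under stated assumptions: `MergeRetrySketch`, `RegionOps`, `waitUntil` and `mergeWithRetry` are hypothetical names, and the interface methods stand in for the HBaseAdmin calls named above (mergeRegions, majorCompactRegion, getCompactionState, getTableRegions), whose exact signatures depend on the HBase version. A compaction request is asynchronous, so the reported state may still read as idle right after the request; the sketch does not handle that case.

```java
import java.util.function.BooleanSupplier;

/** Sketch: retry a skipped region merge after a major compaction. */
public final class MergeRetrySketch {

    /** Hypothetical facade over the admin calls named in this thread. */
    public interface RegionOps {
        void requestMerge(String regionA, String regionB); // stands in for mergeRegions (async)
        void requestMajorCompact(String region);           // stands in for majorCompactRegion (async)
        boolean isCompacting();                            // stands in for a getCompactionState check
        int regionCount();                                 // stands in for getTableRegions(...).size()
    }

    /** Polls until the condition holds or attempts run out; returns whether it held. */
    public static boolean waitUntil(BooleanSupplier done, int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            if (done.getAsBoolean()) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    /** Requests a merge; if the region count does not drop, compacts both regions and retries once. */
    public static boolean mergeWithRetry(RegionOps ops, String a, String b,
                                         int maxAttempts, long sleepMillis)
            throws InterruptedException {
        int before = ops.regionCount();
        ops.requestMerge(a, b);
        if (waitUntil(() -> ops.regionCount() < before, maxAttempts, sleepMillis)) {
            return true;
        }
        // The merge appears to have been skipped (for example, leftover reference files).
        ops.requestMajorCompact(a);
        ops.requestMajorCompact(b);
        if (!waitUntil(() -> !ops.isCompacting(), maxAttempts, sleepMillis)) {
            return false; // compaction did not finish within the polling budget
        }
        ops.requestMerge(a, b);
        return waitUntil(() -> ops.regionCount() < before, maxAttempts, sleepMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory fake: the merge only takes effect once a major compaction was requested.
        final int[] regions = {4};
        final boolean[] compacted = {false};
        RegionOps fake = new RegionOps() {
            public void requestMerge(String x, String y) { if (compacted[0]) regions[0]--; }
            public void requestMajorCompact(String r) { compacted[0] = true; }
            public boolean isCompacting() { return false; }
            public int regionCount() { return regions[0]; }
        };
        System.out.println("merged=" + mergeWithRetry(fake, "regionA", "regionB", 3, 10L));
    }
}
```

Bounding the number of polls avoids the endless polling loop described earlier in the thread; a real tool would wire `RegionOps` to an actual admin client and handle its checked exceptions.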

On Fri, Nov 14, 2014 at 11:49 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> I see. Thanks.
>
> So we can in a way automate this resolution by invoking major compaction
> programmatically for the 2 regions under process (or do we need to do the
> whole table?). The point being that the merge tool, once it identifies that
> it is stuck in a polling loop, can invoke major compaction on the 2 regions
> or table and then try again. Does it make sense? Plausible solution? We do
> know that this merging, although automated, will still be run in a
> controlled manner, so overstepping or synchronization issues on the current
> table should not occur.
>
> But now the question is that majorCompact is also an async operation. So
> how and when do we know it has finished? :)
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > This means that yesterday's compaction was not major compaction.
> >
> > When references get in the way of merging regions, you know that it is
> > time for major compaction.
> >
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <sh...@gmail.com>
> > wrote:
> >
> > > After major compacting the references were freed for the above
> > > mentioned regions and then the merge_region command succeeded and they
> > > got merged.
> > > Hmmm.
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <sh...@gmail.com>
> > > wrote:
> > >
> > > > Digging deeper into the code, I came across this (this is from
> > > > CatalogJanitor#cleanMergeRegion):
> > > >
> > > >
> > > > ...
> > > >
> > > > ...
> > > >
> > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
> > > > >
> > > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
> > > > >
> > > > > MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
> > > >
> > > > return true;
> > > >
> > > >
> > > > > Do you think it is ok if we face this issue then we forcibly archive
> > > > > and clean the regions?
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <
> shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > >> Yesterday, I believe.
> > > >>
> > > >> Regards,
> > > >> Shahab
> > > >>
> > > >> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com>
> wrote:
> > > >>
> > > >>> Shahab:
> > > >>> When was the last time compaction was run on this table ?
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <
> > shahab.yunus@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > I see. Thanks.
> > > >>> >
> > > >>> > And if the region indeed has references, then can we somehow
> > forcibly
> > > >>> > remove them? Is this even possible (if not advisable)? Basically
> > what
> > > >>> I am
> > > >>> > trying to ask is that let us say we do hit this scenario and we
> > know
> > > >>> it is
> > > >>> > OK to go ahead and merge. What steps can we follow after
> detection
> > of
> > > >>> such
> > > >>> > unwanted references.
> > > >>> >
> > > >>> > Regards,
> > > >>> > Shahab
> > > >>> >
> > > >>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > >>> >
> > > >>> > > For automated detection of such scenario, you can reference the
> > > code
> > > >>> in
> > > >>> > > CatalogJanitor#cleanMergeRegion():
> > > >>> > >
> > > >>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> > > >>> > >
> > > >>> > >           this.services.getConfiguration(), fs, tabledir,
> > > >>> mergedRegion,
> > > >>> > > true
> > > >>> > > );
> > > >>> > >
> > > >>> > > ...
> > > >>> > >
> > > >>> > > Then regionFs.hasReferences(htd) would tell you whether the
> > > >>> underlying
> > > >>> > > region has reference files.
> > > >>> > > Cheers
> > > >>> > >
> > > >>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <
> > > >>> shahab.yunus@gmail.com>
> > > >>> > > wrote:
> > > >>> > >
> > > >>> > > > No. Not that I can recall but I can check.
> > > >>> > > >
> > > >>> > > > From resolution perspective, is there any way we can resolve
> > > this.
> > > >>> More
> > > >>> > > > importantly, anyway we can automate the resolution, if we run
> > > into
> > > >>> such
> > > >>> > > > issues in future? 'Cleaning the qualifier', that is.
> > > >>> > > >
> > > >>> > > > Regards,
> > > >>> > > > Shahab
> > > >>> > > >
> > > >>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <
> yuzhihong@gmail.com>
> > > >>> wrote:
> > > >>> > > >
> > > >>> > > > > One possibility was that region
> > > 7373f75181c71eb5061a6673cee15931
> > > >>> was
> > > >>> > > > > involved in some hbase snapshot.
> > > >>> > > > >
> > > >>> > > > > Was the underlying table being snapshotted in recent past ?
> > > >>> > > > >
> > > >>> > > > > Cheers
> > > >>> > > > >
> > > >>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <
> > > >>> > shahab.yunus@gmail.com>
> > > >>> > > > > wrote:
> > > >>> > > > >
> > > >>> > > > > > Thanks again.
> > > >>> > > > > >
> > > >>> > > > > > But I have been polling for a while and it still doesn't
> > > >>> merge. I
> > > >>> > > mean
> > > >>> > > > > this
> > > >>> > > > > > particular region example that I sent you, I am trying to
> > > >>> merge it
> > > >>> > > > since
> > > >>> > > > > > yesterday. I ran the polling-base code all night and I
> have
> > > to
> > > >>> kill
> > > >>> > > it.
> > > >>> > > > > > Then in the morning, I tried manual merging through hbase
> > > >>> shell and
> > > >>> > > it
> > > >>> > > > > > still doesn't merge. Note that the current polling logic
> > > >>> doesnot
> > > >>> > try
> > > >>> > > to
> > > >>> > > > > > call merge again. It just checks the region size.
> > > >>> > > > > >
> > > >>> > > > > > So how to clean it then? Or actually make it merge? Plus
> is
> > > >>> this
> > > >>> > > > > something
> > > >>> > > > > > expected (a region keeping a reference)? How can we avoid
> > it?
> > > >>> > > > > >
> > > >>> > > > > > Note that this is not limited to this table only. We are
> > > seeing
> > > >>> > this
> > > >>> > > in
> > > >>> > > > > > other regions of other tables as well. Are we merging too
> > > fast?
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > Regards,
> > > >>> > > > > > Shahab
> > > >>> > > > > >
> > > >>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <
> > > yuzhihong@gmail.com>
> > > >>> > > wrote:
> > > >>> > > > > >
> > > >>> > > > > > > Polling as you described is fine.
> > > >>> > > > > > >
> > > >>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > >>> > > > > > > DispatchMergingRegionHandler.
> > > >>> > > > > > >
> > > >>> > > > > > > If clean was successful, you would see the following:
> > > >>> > > > > > >
> > > >>> > > > > > >       LOG.debug("Deleting region " +
> > > >>> > > regionA.getRegionNameAsString()
> > > >>> > > > +
> > > >>> > > > > "
> > > >>> > > > > > > and "
> > > >>> > > > > > >
> > > >>> > > > > > >           + regionB.getRegionNameAsString()
> > > >>> > > > > > >
> > > >>> > > > > > >           + " from fs because merged region no longer
> > holds
> > > >>> > > > > references");
> > > >>> > > > > > >
> > > >>> > > > > > > Assuming there was no log below in your master log:
> > > >>> > > > > > >
> > > >>> > > > > > >       LOG.error("Merged region " +
> > > >>> region.getRegionNameAsString()
> > > >>> > > > > > >
> > > >>> > > > > > >           + " has only one merge qualifier in META.");
> > > >>> > > > > > >
> > > >>> > > > > > > It would be the case that
> > 7373f75181c71eb5061a6673cee15931
> > > >>> still
> > > >>> > > had
> > > >>> > > > > > > reference file.
> > > >>> > > > > > >
> > > >>> > > > > > > Cheers
> > > >>> > > > > > >
> > > >>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <
> > > >>> > > > shahab.yunus@gmail.com>
> > > >>> > > > > > > wrote:
> > > >>> > > > > > >
> > > >>> > > > > > > > Hi Ted.
> > > >>> > > > > > > >
> > > >>> > > > > > > > The log bit is below at the end of the email. This is
> > the
> > > >>> > command
> > > >>> > > > to
> > > >>> > > > > > > merge
> > > >>> > > > > > > > that I gave just now through hbase shell. forcible
> was
> > > >>> false
> > > >>> > but
> > > >>> > > it
> > > >>> > > > > > > behaves
> > > >>> > > > > > > > similarly if forcible is true too. This is from
> master
> > > log.
> > > >>> > > Indeed
> > > >>> > > > > the
> > > >>> > > > > > > > region merging was skipped! What does this mean? Data
> > > >>> seems to
> > > >>> > be
> > > >>> > > > > > intact
> > > >>> > > > > > > > for this table.
> > > >>> > > > > > > >
> > > >>> > > > > > > > Just to give you a background. This table was first
> > merge
> > > >>> by
> > > >>> > the
> > > >>> > > > auto
> > > >>> > > > > > > mated
> > > >>> > > > > > > > java application. What we are doing is that we are
> > > merging
> > > >>> > tables
> > > >>> > > > > > > > programmatically. As the HBaseAdmin.mergeRegions
> calls
> > i
> > > >>> async,
> > > >>> > > we
> > > >>> > > > > poll
> > > >>> > > > > > > for
> > > >>> > > > > > > > the number of regions getting lowered after this
> merge
> > > >>> call.
> > > >>> > The
> > > >>> > > > > > > > application hangs and continues polling for ever as
> the
> > > >>> > previous
> > > >>> > > > > merge
> > > >>> > > > > > > > didn't happen.
> > > >>> > > > > > > >
> > > >>> > > > > > > > In this poll loop, we do get the number of regions
> by a
> > > >>> fresh
> > > >>> > > call
> > > >>> > > > to
> > > >>> > > > > > > > HBaseAdmin.getTableRegions(tableName).getSize().
> > > >>> > > > > > > >
> > > >>> > > > > > > > What are these merge qualifiers and what are we doing
> > > >>> wrong or
> > > >>> > > > should
> > > >>> > > > > > do?
> > > >>> > > > > > > >
> > > >>> > > > > > > > In the polling loop we can somehow retry merge again?
> > But
> > > >>> how
> > > >>> > can
> > > >>> > > > we
> > > >>> > > > > > > know,
> > > >>> > > > > > > > that we need to call merge again as it works for some
> > > >>> regions.
> > > >>> > Is
> > > >>> > > > the
> > > >>> > > > > > > table
> > > >>> > > > > > > > meta corrupted for some reason by the above logic?
> > > >>> > > > > > > >
> > > >>> > > > > > > > Thanks a lot.
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > >
> > > >>> > > >
> > > >>> >
> > > >>>
> > >
> ------------------------------------------------------------------------
> > > >>> > > > > > > >
> > > >>> > > > > > > > 2014-11-14 11:25:02,643 INFO
> > > >>> org.apache.zookeeper.ZooKeeper:
> > > >>> > > > Session:
> > > >>> > > > > > > > 0x348c7017707236b closed
> > > >>> > > > > > > > 2014-11-14 11:25:02,643 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > > > EventThread
> > > >>> > > > > > > > shut down
> > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO
> > > >>> org.apache.zookeeper.ZooKeeper:
> > > >>> > > > > Initiating
> > > >>> > > > > > > > client connection,
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > >>> > > > > > > > sessionTimeout=60000
> > > >>> > > > > watcher=catalogtracker-on-hconnection-0x47d865f2,
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > >>> > > > > > > > baseZNode=/hbase
> > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO
> > > >>> > > > > > > >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper:
> > > >>> Process
> > > >>> > > > > > > > identifier=catalogtracker-on-hconnection-0x47d865f2
> > > >>> connecting
> > > >>> > to
> > > >>> > > > > > > ZooKeeper
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > >>> > > > > > > > 2014-11-14 11:25:02,645 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Opening
> > > >>> > > > > > > > socket connection to server
> > > >>> > ip-1010018.ec2.internal/1010019:2181.
> > > >>> > > > > Will
> > > >>> > > > > > > not
> > > >>> > > > > > > > attempt to authenticate using SASL (unknown error)
> > > >>> > > > > > > > 2014-11-14 11:25:02,646 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Socket
> > > >>> > > > > > > > connection established to
> > > >>> ip-1010018.ec2.internal/1010019:2181,
> > > >>> > > > > > > initiating
> > > >>> > > > > > > > session
> > > >>> > > > > > > > 2014-11-14 11:25:02,648 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Session
> > > >>> > > > > > > > establishment complete on server
> > > >>> > > > > ip-1010018.ec2.internal/1010019:2181,
> > > >>> > > > > > > > sessionid = 0x348c7017707236c, negotiated timeout =
> > 60000
> > > >>> > > > > > > > 2014-11-14 11:25:02,703 INFO
> > > >>> org.apache.zookeeper.ZooKeeper:
> > > >>> > > > Session:
> > > >>> > > > > > > > 0x348c7017707236c closed
> > > >>> > > > > > > > 2014-11-14 11:25:02,703 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > > > EventThread
> > > >>> > > > > > > > shut down
> > > > >>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions
> > > > >>> > > > > > > > TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
> > > > >>> > > > > > > > TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> > > > >>> > > > > > > > because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > >>> > > > > > > > 2014-11-14 11:25:41,383 INFO
> > > >>> org.apache.zookeeper.ZooKeeper:
> > > >>> > > > > Initiating
> > > >>> > > > > > > > client connection,
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > >>> > > > > > > > sessionTimeout=60000
> > > >>> > > > > watcher=catalogtracker-on-hconnection-0x47d865f2,
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > >>> > > > > > > > baseZNode=/hbase
> > > >>> > > > > > > > 2014-11-14 11:25:41,384 INFO
> > > >>> > > > > > > >
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper:
> > > >>> Process
> > > >>> > > > > > > > identifier=catalogtracker-on-hconnection-0x47d865f2
> > > >>> connecting
> > > >>> > to
> > > >>> > > > > > > ZooKeeper
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > >>> > > > > > > > 2014-11-14 11:25:41,384 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Opening
> > > >>> > > > > > > > socket connection to server
> > > >>> > ip-1010018.ec2.internal/1010019:2181.
> > > >>> > > > > Will
> > > >>> > > > > > > not
> > > >>> > > > > > > > attempt to authenticate using SASL (unknown error)
> > > >>> > > > > > > > 2014-11-14 11:25:41,386 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Socket
> > > >>> > > > > > > > connection established to
> > > >>> ip-1010018.ec2.internal/1010019:2181,
> > > >>> > > > > > > initiating
> > > >>> > > > > > > > session
> > > >>> > > > > > > > 2014-11-14 11:25:41,389 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > Session
> > > >>> > > > > > > > establishment complete on server
> > > >>> > > > > ip-1010018.ec2.internal/1010019:2181,
> > > >>> > > > > > > > sessionid = 0x348c7017707236e, negotiated timeout =
> > 60000
> > > >>> > > > > > > > 2014-11-14 11:25:41,397 INFO
> > > >>> org.apache.zookeeper.ZooKeeper:
> > > >>> > > > Session:
> > > >>> > > > > > > > 0x348c7017707236e closed
> > > >>> > > > > > > > 2014-11-14 11:25:41,398 INFO
> > > >>> org.apache.zookeeper.ClientCnxn:
> > > >>> > > > > > EventThread
> > > >>> > > > > > > > shut down
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > >
> >
> ------------------------------------------------------------------------------------------------------------------------------------
> > > >>> > > > > > > >
> > > >>> > > > > > > > Regards,
> > > >>> > > > > > > > Shahab
> > > >>> > > > > > > >
> > > >>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <
> > > >>> yuzhihong@gmail.com>
> > > >>> > > > > wrote:
> > > >>> > > > > > > >
> > > >>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does
> some
> > > >>> check
> > > >>> > > > before
> > > >>> > > > > > > > > initiating the merge.
> > > >>> > > > > > > > > e.g.:
> > > >>> > > > > > > > >
> > > >>> > > > > > > > >       LOG.info("Skip merging regions " +
> > > >>> > > > > > > region_a.getRegionNameAsString()
> > > >>> > > > > > > > >
> > > >>> > > > > > > > >           + ", " +
> region_b.getRegionNameAsString() +
> > > ",
> > > >>> > > because
> > > >>> > > > > > > region "
> > > >>> > > > > > > > >
> > > >>> > > > > > > > >           + (regionAHasMergeQualifier ?
> > > >>> > > > region_a.getEncodedName() :
> > > >>> > > > > > > > > region_b
> > > >>> > > > > > > > >
> > > >>> > > > > > > > >               .getEncodedName()) + " has merge
> > > >>> qualifier");
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > Can you take a look at master log around the time
> > merge
> > > >>> > request
> > > >>> > > > was
> > > >>> > > > > > > > issued
> > > >>> > > > > > > > > to see if you can get some clue ?
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > Cheers
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <
> > > >>> > > > > > shahab.yunus@gmail.com>
> > > >>> > > > > > > > > wrote:
> > > >>> > > > > > > > >
> > > >>> > > > > > > > > > The documentation of online merge tool
> > (merge_region)
> > > >>> > states
> > > >>> > > > that
> > > >>> > > > > > if
> > > >>> > > > > > > we
> > > >>> > > > > > > > > > forcibly merge regions (by setting the 3rd
> > attribute
> > > as
> > > >>> > true)
> > > >>> > > > > then
> > > >>> > > > > > it
> > > >>> > > > > > > > can
> > > >>> > > > > > > > > > create overlapping regions. if this happens then
> > will
> > > >>> this
> > > >>> > > > render
> > > >>> > > > > > the
> > > >>> > > > > > > > > > region or table unusable or it is just a
> > performance
> > > >>> hit? I
> > > >>> > > > mean
> > > >>> > > > > > how
> > > >>> > > > > > > > > bigger
> > > >>> > > > > > > > > > of a deal it is?
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > Actually, we are merging regions using the
> > > >>> programmatic API
> > > >>> > > for
> > > >>> > > > > > this
> > > >>> > > > > > > > and
> > > >>> > > > > > > > > > setting this flag ('forcible') as false. But for
> > some
> > > >>> > tables
> > > >>> > > > (we
> > > >>> > > > > > > > haven't
> > > >>> > > > > > > > > > figured out a pattern yet, data is still
> > accessible),
> > > >>> merge
> > > >>> > > of
> > > >>> > > > > > > regions
> > > >>> > > > > > > > do
> > > >>> > > > > > > > > > not happen at all. Afterwards we tried with this
> > > flag =
> > > >>> > true,
> > > >>> > > > and
> > > >>> > > > > > it
> > > >>> > > > > > > > > still
> > > >>> > > > > > > > > > doesn't merge them.
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > CDH 5.1.0
> > > >>> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > > > Regards,
> > > >>> > > > > > > > > > Shahab
> > > >>> > > > > > > > > >
> > > >>> > > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

I see. Thanks.

So we can in a way automate this resolution by invoking major compaction
programmatically for the 2 regions being processed (or do we need to do the
whole table?). The point being that the merge tool, once it identifies that
it is stuck in a polling loop, can invoke major compaction on the 2 regions
or the table and then try again. Does that make sense? Is it a plausible
solution? We do know that this merging, although automated, will still be
run in a controlled manner, so overstepping or synchronization issues on
the current table should not occur.

But now the question is that majorCompact is also an async operation. So how
and when do we know it has finished? :)
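
To make the proposed flow concrete, here is a minimal sketch of "merge, and
if the region count does not drop, major compact once and retry". It is
plain JDK code and does not call HBase: the MergeOps interface, the
FakeCluster stand-in and all method names are hypothetical, and the comments
only suggest which HBaseAdmin calls from this thread each method might wrap.

```java
import java.util.function.BooleanSupplier;

public class MergeRetry {

    /** Hypothetical wrapper around the admin calls discussed in this thread. */
    public interface MergeOps {
        void requestMerge();            // e.g. HBaseAdmin.mergeRegions(a, b, false)
        int regionCount();              // e.g. number of regions returned for the table
        void requestMajorCompaction();  // e.g. major compaction of the two regions
        boolean compactionIdle();       // e.g. compaction state reported as NONE
    }

    /** Polls until the condition holds or maxPolls attempts were made. */
    static boolean poll(BooleanSupplier cond, int maxPolls, long sleepMs) {
        for (int i = 0; i < maxPolls; i++) {
            if (cond.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /**
     * Tries the merge; if the region count does not drop, major compacts
     * once and retries. Returns true if the region count eventually dropped.
     */
    public static boolean mergeWithCompactionFallback(MergeOps ops, int maxPolls, long sleepMs) {
        int before = ops.regionCount();
        for (int attempt = 0; attempt < 2; attempt++) {
            ops.requestMerge();
            if (poll(() -> ops.regionCount() < before, maxPolls, sleepMs)) {
                return true;
            }
            if (attempt == 0) {
                ops.requestMajorCompaction();
                if (!poll(ops::compactionIdle, maxPolls, sleepMs)) {
                    return false; // compaction did not finish in time
                }
            }
        }
        return false;
    }

    /** In-memory stand-in: merges are skipped until a major compaction ran. */
    static class FakeCluster implements MergeOps {
        int regions = 4;
        boolean hasReferences = true;
        int mergeRequests = 0;

        public void requestMerge() {
            mergeRequests++;
            if (!hasReferences) {
                regions--;
            }
        }
        public int regionCount() { return regions; }
        public void requestMajorCompaction() { hasReferences = false; }
        public boolean compactionIdle() { return !hasReferences; }
    }

    public static void main(String[] args) {
        FakeCluster fake = new FakeCluster();
        boolean merged = mergeWithCompactionFallback(fake, 3, 1);
        System.out.println("merged=" + merged + ", regions=" + fake.regions
                + ", mergeRequests=" + fake.mergeRequests);
    }
}
```

Every wait in the sketch is bounded, so the tool gives up and reports
failure instead of polling all night. Compacting only the two regions rather
than the whole table is a design choice assumed here, not something
confirmed in this thread.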

Regards,
Shahab

On Fri, Nov 14, 2014 at 2:34 PM, Ted Yu <yu...@gmail.com> wrote:

> This means that yesterday's compaction was not major compaction.
>
> When references get in the way of merging regions, you know that it is time
> for major compaction.
>
> Cheers
>
> On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
> > After major compacting the references were freed for the above mentioned
> > regions and then the merge_region command succeeded and they got merged.
> > Hmmm.
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <sh...@gmail.com>
> > wrote:
> >
> > > Digging deeper into the code, I came across this (this is from
> > > CatalogJanitor#cleanMergeRegion):
> > >
> > >
> > > ...
> > >
> > > ...
> > >
> > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
> > > >
> > > > HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
> > > >
> > > > MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
> > >
> > > return true;
> > >
> > >
> > > > Do you think it is ok if we face this issue then we forcibly archive
> > > > and clean the regions?
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <sh...@gmail.com>
> > > wrote:
> > >
> > >> Yesterday, I believe.
> > >>
> > >> Regards,
> > >> Shahab
> > >>
> > >> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com> wrote:
> > >>
> > >>> Shahab:
> > >>> When was the last time compaction was run on this table ?
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <
> shahab.yunus@gmail.com>
> > >>> wrote:
> > >>>
> > >>> > I see. Thanks.
> > >>> >
> > >>> > And if the region indeed has references, then can we somehow
> forcibly
> > >>> > remove them? Is this even possible (if not advisable)? Basically
> what
> > >>> I am
> > >>> > trying to ask is that let us say we do hit this scenario and we
> know
> > >>> it is
> > >>> > OK to go ahead and merge. What steps can we follow after detection
> of
> > >>> such
> > >>> > unwanted references.
> > >>> >
> > >>> > Regards,
> > >>> > Shahab
> > >>> >
> > >>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yu...@gmail.com>
> > wrote:
> > >>> >
> > >>> > > For automated detection of such scenario, you can reference the
> > code
> > >>> in
> > >>> > > CatalogJanitor#cleanMergeRegion():
> > >>> > >
> > >>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
> > >>> > >
> > >>> > >           this.services.getConfiguration(), fs, tabledir,
> > >>> mergedRegion,
> > >>> > > true
> > >>> > > );
> > >>> > >
> > >>> > > ...
> > >>> > >
> > >>> > > Then regionFs.hasReferences(htd) would tell you whether the
> > >>> underlying
> > >>> > > region has reference files.
> > >>> > > Cheers
> > >>> > >
> > >>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <
> > >>> shahab.yunus@gmail.com>
> > >>> > > wrote:
> > >>> > >
> > >>> > > > No. Not that I can recall but I can check.
> > >>> > > >
> > >>> > > > From resolution perspective, is there any way we can resolve
> > this.
> > >>> More
> > >>> > > > importantly, anyway we can automate the resolution, if we run
> > into
> > >>> such
> > >>> > > > issues in future? 'Cleaning the qualifier', that is.
> > >>> > > >
> > >>> > > > Regards,
> > >>> > > > Shahab
> > >>> > > >
> > >>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yu...@gmail.com>
> > >>> wrote:
> > >>> > > >
> > >>> > > > > One possibility was that region
> > 7373f75181c71eb5061a6673cee15931
> > >>> was
> > >>> > > > > involved in some hbase snapshot.
> > >>> > > > >
> > >>> > > > > Was the underlying table being snapshotted in recent past ?
> > >>> > > > >
> > >>> > > > > Cheers
> > >>> > > > >
> > >>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <
> > >>> > shahab.yunus@gmail.com>
> > >>> > > > > wrote:
> > >>> > > > >
> > >>> > > > > > Thanks again.
> > >>> > > > > >
> > >>> > > > > > But I have been polling for a while and it still doesn't
> > >>> merge. I
> > >>> > > mean
> > >>> > > > > this
> > >>> > > > > > particular region example that I sent you, I am trying to
> > >>> merge it
> > >>> > > > since
> > >>> > > > > > yesterday. I ran the polling-base code all night and I have
> > to
> > >>> kill
> > >>> > > it.
> > >>> > > > > > Then in the morning, I tried manual merging through hbase
> > >>> shell and
> > >>> > > it
> > >>> > > > > > still doesn't merge. Note that the current polling logic
> > >>> doesnot
> > >>> > try
> > >>> > > to
> > >>> > > > > > call merge again. It just checks the region size.
> > >>> > > > > >
> > >>> > > > > > So how to clean it then? Or actually make it merge? Plus is
> > >>> this
> > >>> > > > > something
> > >>> > > > > > expected (a region keeping a reference)? How can we avoid
> it?
> > >>> > > > > >
> > >>> > > > > > Note that this is not limited to this table only. We are
> > seeing
> > >>> > this
> > >>> > > in
> > >>> > > > > > other regions of other tables as well. Are we merging too
> > fast?
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > >
> > >>> > > > > > Regards,
> > >>> > > > > > Shahab
> > >>> > > > > >
> > >>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <
> > yuzhihong@gmail.com>
> > >>> > > wrote:
> > >>> > > > > >
> > >>> > > > > > > Polling as you described is fine.
> > >>> > > > > > >

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

This means that yesterday's compaction was not a major compaction.

When references get in the way of merging regions, you know that it is time
for a major compaction.
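
A rough sketch of that sequence, in case it helps with automation. The class
and parameter names below are made up for illustration and are not HBase API.
In a real client the two Runnables would presumably wrap
HBaseAdmin.mergeRegions(encodedNameA, encodedNameB, false) and
HBaseAdmin.majorCompact(tableName), both of which only submit a request and
return, and the check would be a bounded wait on the table's region count.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; none of these names come from HBase. */
final class CompactThenMerge {

    /**
     * Requests a merge. If it has not taken effect, requests a major compaction
     * and asks for the merge once more.
     *
     * @return true if the merge took effect on the first or second attempt
     */
    static boolean mergeWithCompactionFallback(Runnable requestMerge,
                                               Runnable requestMajorCompaction,
                                               BooleanSupplier mergeTookEffect) {
        requestMerge.run();
        if (mergeTookEffect.getAsBoolean()) {
            return true;
        }
        // Assumption: the merge was skipped because a region still carries a
        // merge qualifier, i.e. reference files from an earlier merge remain.
        requestMajorCompaction.run();
        requestMerge.run();
        return mergeTookEffect.getAsBoolean();
    }
}
```

Since the compaction request is asynchronous, mergeTookEffect would have to
wait long enough for the compaction to finish before giving up.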

Cheers

On Fri, Nov 14, 2014 at 11:31 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> After major compacting, the references were freed for the above-mentioned
> regions, and then the merge_region command succeeded and they got merged.
> Hmmm.
>
> Regards,
> Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

After major compacting, the references were freed for the above-mentioned
regions, and then the merge_region command succeeded and they got merged.
Hmmm.
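
For the polling side, a bounded wait would avoid the all-night hang described
earlier in the thread. A minimal sketch, assuming the region count is supplied
by the caller: MergePoller, awaitFewerRegions and Outcome are hypothetical
names, and in our application the supplier would wrap a fresh call to
HBaseAdmin.getTableRegions(tableName).size().

```java
import java.util.function.IntSupplier;

/** Hypothetical helper for illustration; not part of HBase. */
final class MergePoller {

    enum Outcome { MERGED, TIMED_OUT, INTERRUPTED }

    /**
     * Polls the region count until it drops below the count seen before the
     * merge request, giving up after maxPolls checks instead of looping forever.
     */
    static Outcome awaitFewerRegions(IntSupplier regionCount, int countBeforeMerge,
                                     int maxPolls, long sleepMillis) {
        for (int attempt = 0; attempt < maxPolls; attempt++) {
            if (regionCount.getAsInt() < countBeforeMerge) {
                return Outcome.MERGED;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag so the caller can shut down cleanly.
                Thread.currentThread().interrupt();
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.TIMED_OUT;
    }
}
```

On TIMED_OUT the caller can look for the "Skip merging regions ... has merge
qualifier" line in the master log, run a major compaction, and issue the merge
again rather than keep polling.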

Regards,
Shahab

On Fri, Nov 14, 2014 at 2:08 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Digging deeper into the code, I came across this (this is from
> CatalogJanitor#cleanMergeRegion):
>
>
> ...
>
> ...
>
> HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
>
> HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
>
> MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
>
> return true;
>
>
> Do you think it is ok if we face this issue then we forcibly archive and
> clean the regions ?
>
> Regards,
> Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Digging deeper into the code, I came across this (this is from
CatalogJanitor#cleanMergeRegion):


...

...

      HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionA);
      HFileArchiver.archiveRegion(this.services.getConfiguration(), fs, regionB);
      MetaEditor.deleteMergeQualifiers(server.getCatalogTracker(), mergedRegion);
      return true;


If we face this issue, do you think it is OK for us to forcibly archive and
clean the regions?

Regards,
Shahab

On Fri, Nov 14, 2014 at 1:10 PM, Shahab Yunus <sh...@gmail.com>
wrote:

> Yesterday, I believe.
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Shahab:
>> When was the last time compaction was run on this table ?
>>
>> Cheers
>>
>> On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <sh...@gmail.com>
>> wrote:
>>
>> > I see. Thanks.
>> >
>> > And if the region indeed has references, then can we somehow forcibly
>> > remove them? Is this even possible (if not advisable)? Basically what I am
>> > trying to ask is that let us say we do hit this scenario and we know it is
>> > OK to go ahead and merge. What steps can we follow after detection of such
>> > unwanted references.
>> >
>> > Regards,
>> > Shahab
>> >
>> > On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:
>> >
>> > > For automated detection of such scenario, you can reference the code in
>> > > CatalogJanitor#cleanMergeRegion():
>> > >
>> > >       regionFs = HRegionFileSystem.openRegionFromFileSystem(
>> > >           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>> > >
>> > > ...
>> > >
>> > > Then regionFs.hasReferences(htd) would tell you whether the underlying
>> > > region has reference files.
>> > > Cheers
>> > >
>> > > On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > wrote:
>> > >
>> > > > No. Not that I can recall but I can check.
>> > > >
>> > > > From resolution perspective, is there any way we can resolve this. More
>> > > > importantly, anyway we can automate the resolution, if we run into such
>> > > > issues in future? 'Cleaning the qualifier', that is.
>> > > >
>> > > > Regards,
>> > > > Shahab
>> > > >
>> > > > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yu...@gmail.com> wrote:
>> > > >
>> > > > > One possibility was that region 7373f75181c71eb5061a6673cee15931 was
>> > > > > involved in some hbase snapshot.
>> > > > >
>> > > > > Was the underlying table being snapshotted in recent past ?
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks again.
>> > > > > >
>> > > > > > But I have been polling for a while and it still doesn't merge. I
>> > > > > > mean this particular region example that I sent you, I am trying to
>> > > > > > merge it since yesterday. I ran the polling-based code all night and
>> > > > > > I had to kill it. Then in the morning, I tried manual merging through
>> > > > > > hbase shell and it still doesn't merge. Note that the current polling
>> > > > > > logic does not try to call merge again. It just checks the region
>> > > > > > size.
>> > > > > >
>> > > > > > So how to clean it then? Or actually make it merge? Plus is this
>> > > > > > something expected (a region keeping a reference)? How can we avoid it?
>> > > > > >
>> > > > > > Note that this is not limited to this table only. We are seeing this
>> > > > > > in other regions of other tables as well. Are we merging too fast?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Regards,
>> > > > > > Shahab
>> > > > > >
>> > > > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yu...@gmail.com> wrote:
>> > > > > >
>> > > > > > > Polling as you described is fine.
>> > > > > > >
>> > > > > > > catalogJanitor.cleanMergeQualifier() is called by
>> > > > > > > DispatchMergingRegionHandler.
>> > > > > > >
>> > > > > > > If clean was successful, you would see the following:
>> > > > > > >
>> > > > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>> > > > > > >           + regionB.getRegionNameAsString()
>> > > > > > >           + " from fs because merged region no longer holds references");
>> > > > > > >
>> > > > > > > Assuming there was no log below in your master log:
>> > > > > > >
>> > > > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
>> > > > > > >           + " has only one merge qualifier in META.");
>> > > > > > >
>> > > > > > > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
>> > > > > > > reference file.
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Ted.
>> > > > > > > >
>> > > > > > > > The log bit is below at the end of the email. This is the command to
>> > > > > > > > merge that I gave just now through hbase shell. forcible was false
>> > > > > > > > but it behaves similarly if forcible is true too. This is from master
>> > > > > > > > log. Indeed the region merging was skipped! What does this mean? Data
>> > > > > > > > seems to be intact for this table.
>> > > > > > > >
>> > > > > > > > Just to give you a background. This table was first merged by the
>> > > > > > > > automated java application. What we are doing is that we are merging
>> > > > > > > > tables programmatically. As the HBaseAdmin.mergeRegions call is async,
>> > > > > > > > we poll for the number of regions getting lowered after this merge
>> > > > > > > > call. The application hangs and continues polling for ever as the
>> > > > > > > > previous merge didn't happen.
>> > > > > > > >
>> > > > > > > > In this poll loop, we do get the number of regions by a fresh call to
>> > > > > > > > HBaseAdmin.getTableRegions(tableName).getSize().
>> > > > > > > >
>> > > > > > > > What are these merge qualifiers and what are we doing wrong or should
>> > > > > > > > do?
>> > > > > > > >
>> > > > > > > > In the polling loop we can somehow retry merge again? But how can we
>> > > > > > > > know, that we need to call merge again as it works for some regions.
>> > > > > > > > Is the table meta corrupted for some reason by the above logic?
>> > > > > > > >
>> > > > > > > > Thanks a lot.
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236b closed
>> > > > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236c, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236c closed
>> > > > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > > 2014-11-14 11:25:30,713 INFO org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip merging regions TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096., because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
>> > > > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181 sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2, quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181, baseZNode=/hbase
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
>> > > > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not attempt to authenticate using SASL (unknown error)
>> > > > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to ip-1010018.ec2.internal/1010019:2181, initiating session
>> > > > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server ip-1010018.ec2.internal/1010019:2181, sessionid = 0x348c7017707236e, negotiated timeout = 60000
>> > > > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session: 0x348c7017707236e closed
>> > > > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > > > > > >
>> > > > > > > > ------------------------------------------------------------------------------------------------------------------------------------
>> > > > > > > >
>> > > > > > > > Regards,
>> > > > > > > > Shahab
>> > > > > > > >
>> > > > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > > > > > > >
>> > > > > > > > > Looking at DispatchMergingRegionHandler, it does some check before
>> > > > > > > > > initiating the merge.
>> > > > > > > > > e.g.:
>> > > > > > > > >
>> > > > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>> > > > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
>> > > > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
>> > > > > > > > >               .getEncodedName()) + " has merge qualifier");
>> > > > > > > > >
>> > > > > > > > > Can you take a look at master log around the time merge request was
>> > > > > > > > > issued to see if you can get some clue ?
>> > > > > > > > >
>> > > > > > > > > Cheers
>> > > > > > > > >
>> > > > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <shahab.yunus@gmail.com>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > The documentation of online merge tool (merge_region) states that if we
>> > > > > > > > > > forcibly merge regions (by setting the 3rd attribute as true) then it can
>> > > > > > > > > > create overlapping regions. if this happens then will this render the
>> > > > > > > > > > region or table unusable or it is just a performance hit? I mean how bigger
>> > > > > > > > > > of a deal it is?
>> > > > > > > > > >
>> > > > > > > > > > Actually, we are merging regions using the programmatic API for this and
>> > > > > > > > > > setting this flag ('forcible') as false. But for some tables (we haven't
>> > > > > > > > > > figured out a pattern yet, data is still accessible), merge of regions do
>> > > > > > > > > > not happen at all. Afterwards we tried with this flag = true, and it still
>> > > > > > > > > > doesn't merge them.
>> > > > > > > > > >
>> > > > > > > > > > CDH 5.1.0
>> > > > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
>> > > > > > > > > >
>> > > > > > > > > > Regards,
>> > > > > > > > > > Shahab
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Yesterday, I believe.

Regards,
Shahab

On Fri, Nov 14, 2014 at 1:07 PM, Ted Yu <yu...@gmail.com> wrote:

> Shahab:
> When was the last time compaction was run on this table?
>
> Cheers
>

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

Shahab:
When was the last time compaction was run on this table?

Cheers
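
For context on why compaction matters here: a merged region keeps reference
files pointing at its parents' store files until a compaction rewrites them,
and the merge qualifier can only be cleaned once those references are gone.
The following is a minimal sketch of that decision, not HBase code. The class,
enum, and method names are hypothetical, and the two booleans stand in for
whatever checks are actually used (for example a look at hbase:meta and a
reference-file check).

```java
/** Hypothetical model of the states discussed in this thread; not HBase API. */
final class MergeTriage {

    /** Possible next actions for a region we want to merge. */
    enum NextStep { REQUEST_MERGE, COMPACT_THEN_RETRY, RETRY_MERGE }

    private MergeTriage() {}

    /**
     * @param hasMergeQualifier  the region still carries a merge qualifier in META
     * @param hasReferenceFiles  the region still holds reference files
     */
    static NextStep nextStep(boolean hasMergeQualifier, boolean hasReferenceFiles) {
        if (!hasMergeQualifier) {
            // Nothing left over from an earlier merge, so this check does not block.
            return NextStep.REQUEST_MERGE;
        }
        if (hasReferenceFiles) {
            // The qualifier cannot be cleaned while references remain;
            // a compaction has to rewrite them first.
            return NextStep.COMPACT_THEN_RETRY;
        }
        // Qualifier present but no references: cleanup is expected to succeed
        // the next time a merge is requested.
        return NextStep.RETRY_MERGE;
    }
}
```

The sketch only models the merge-qualifier check. Other preconditions, such as
the regions being adjacent when forcible is false, are out of scope.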

On Fri, Nov 14, 2014 at 9:58 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> I see. Thanks.
>
> And if the region indeed has references, can we somehow forcibly remove
> them? Is this even possible (even if not advisable)? Basically, what I am
> asking is: suppose we do hit this scenario and we know it is OK to go ahead
> and merge. What steps can we follow once we detect such unwanted references?
>
> Regards,
> Shahab
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

I see. Thanks.

And if the region indeed has references, can we somehow forcibly remove
them? Is this even possible (even if not advisable)? Basically, what I am
asking is: suppose we do hit this scenario and we know it is OK to go ahead
and merge. What steps can we follow once we detect such unwanted references?

Regards,
Shahab
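
A minimal sketch of the kind of automation being asked about, assuming the
merge request is asynchronous and completion can be checked by polling. It
re-issues the request a bounded number of times and gives up instead of
polling forever. BoundedMerge and its parameters are hypothetical names, not
HBase API; the two callbacks stand in for the real merge call and the real
completion check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: bounded request-and-poll loop. */
final class BoundedMerge {

    private BoundedMerge() {}

    /**
     * Issues the merge request, polls for completion, and re-issues the request
     * if it has not completed after pollsPerAttempt checks.
     *
     * @return true if mergeCompleted reported success, false if all attempts
     *         were used up or the thread was interrupted
     */
    static boolean mergeWithRetries(Runnable requestMerge,
                                    BooleanSupplier mergeCompleted,
                                    int maxAttempts,
                                    int pollsPerAttempt,
                                    long pollIntervalMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            requestMerge.run(); // asynchronous in the real system
            for (int poll = 0; poll < pollsPerAttempt; poll++) {
                if (mergeCompleted.getAsBoolean()) {
                    return true;
                }
                try {
                    Thread.sleep(pollIntervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false; // caller can log, alert, or trigger a compaction
    }
}
```

In the setting described in this thread, requestMerge would wrap the
HBaseAdmin.mergeRegions call and mergeCompleted would wrap the region-count
check. Both are passed in so the sketch does not depend on a particular HBase
version.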

On Fri, Nov 14, 2014 at 12:50 PM, Ted Yu <yu...@gmail.com> wrote:

> For automated detection of such a scenario, you can refer to the code in
> CatalogJanitor#cleanMergeRegion():
>
>       regionFs = HRegionFileSystem.openRegionFromFileSystem(
>           this.services.getConfiguration(), fs, tabledir, mergedRegion, true);
>
> ...
>
> Then regionFs.hasReferences(htd) would tell you whether the underlying
> region has reference files.
>
> Cheers
>
> On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
> > No. Not that I can recall but I can check.
> >
> > From resolution perspective, is there any way we can resolve this. More
> > importantly, anyway we can automate the resolution, if we run into such
> > issues in future? 'Cleaning the qualifier', that is.
> >
> > Regards,
> > Shahab
> >
> > On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > One possibility was that region 7373f75181c71eb5061a6673cee15931 was
> > > involved in some hbase snapshot.
> > >
> > > Was the underlying table being snapshotted in recent past ?
> > >
> > > Cheers
> > >
> > > On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <sh...@gmail.com>
> > > wrote:
> > >
> > > > Thanks again.
> > > >
> > > > But I have been polling for a while and it still doesn't merge. I
> mean
> > > this
> > > > particular region example that I sent you, I am trying to merge it
> > since
> > > > yesterday. I ran the polling-base code all night and I have to kill
> it.
> > > > Then in the morning, I tried manual merging through hbase shell and
> it
> > > > still doesn't merge. Note that the current polling logic doesnot try
> to
> > > > call merge again. It just checks the region size.
> > > >
> > > > So how to clean it then? Or actually make it merge? Plus is this
> > > something
> > > > expected (a region keeping a reference)? How can we avoid it?
> > > >
> > > > Note that this is not limited to this table only. We are seeing this
> in
> > > > other regions of other tables as well. Are we merging too fast?
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > Polling as you described is fine.
> > > > >
> > > > > catalogJanitor.cleanMergeQualifier() is called by
> > > > > DispatchMergingRegionHandler.
> > > > >
> > > > > If clean was successful, you would see the following:
> > > > >
> > > > >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> > > > >           + regionB.getRegionNameAsString()
> > > > >           + " from fs because merged region no longer holds references");
> > > > >
> > > > > Assuming there was no log below in your master log:
> > > > >
> > > > >       LOG.error("Merged region " + region.getRegionNameAsString()
> > > > >           + " has only one merge qualifier in META.");
> > > > >
> > > > > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
> > > > > reference file.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <sh...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ted.
> > > > > >
> > > > > > The log bit is below at the end of the email. This is the command to
> > > > > > merge that I gave just now through hbase shell. forcible was false
> > > > > > but it behaves similarly if forcible is true too. This is from master
> > > > > > log. Indeed the region merging was skipped! What does this mean? Data
> > > > > > seems to be intact for this table.
> > > > > >
> > > > > > Just to give you a background. This table was first merged by the
> > > > > > automated java application. What we are doing is that we are merging
> > > > > > tables programmatically. As the HBaseAdmin.mergeRegions call is
> > > > > > async, we poll for the number of regions getting lowered after this
> > > > > > merge call. The application hangs and continues polling forever as
> > > > > > the previous merge didn't happen.
> > > > > >
> > > > > > In this poll loop, we do get the number of regions by a fresh call to
> > > > > > HBaseAdmin.getTableRegions(tableName).getSize().
> > > > > >
> > > > > > What are these merge qualifiers and what are we doing wrong or should
> > > > > > do?
> > > > > >
> > > > > > In the polling loop we can somehow retry merge again? But how can we
> > > > > > know, that we need to call merge again as it works for some regions.
> > > > > > Is the table meta corrupted for some reason by the above logic?
> > > > > >
> > > > > > Thanks a lot.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > ------------------------------------------------------------------------
> > > > > >
> > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > > > > 0x348c7017707236b closed
> > > > > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn:
> > > > > > EventThread shut down
> > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper:
> > > > > > Initiating client connection,
> > > > > > connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> > > > > > quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > > > > baseZNode=/hbase
> > > > > > 2014-11-14 11:25:02,645 INFO
> > > > > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > > > > > identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to
> > > > > > ZooKeeper
> > > > > > ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> > > > > > socket connection to server ip-1010018.ec2.internal/1010019:2181.
> > > > > > Will not attempt to authenticate using SASL (unknown error)
> > > > > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket
> > > > > > connection established to ip-1010018.ec2.internal/1010019:2181,
> > > > > > initiating session
> > > > > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session
> > > > > > establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> > > > > > sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > > > > 0x348c7017707236c closed
> > > > > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn:
> > > > > > EventThread shut down
> > > > > > 2014-11-14 11:25:30,713 INFO
> > > > > > org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler:
> > > > > > Skip merging regions
> > > > > > TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
> > > > > > TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> > > > > > because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > > > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper:
> > > > > > Initiating client connection,
> > > > > > connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> > > > > > quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > > > > baseZNode=/hbase
> > > > > > 2014-11-14 11:25:41,384 INFO
> > > > > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > > > > > identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to
> > > > > > ZooKeeper
> > > > > > ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > > > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening
> > > > > > socket connection to server ip-1010018.ec2.internal/1010019:2181.
> > > > > > Will not attempt to authenticate using SASL (unknown error)
> > > > > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket
> > > > > > connection established to ip-1010018.ec2.internal/1010019:2181,
> > > > > > initiating session
> > > > > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session
> > > > > > establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> > > > > > sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > > > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > > > > 0x348c7017707236e closed
> > > > > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn:
> > > > > > EventThread shut down
> > > > > >
> > > > > > ------------------------------------------------------------------------------------------------------------------------------------
> > > > > >
> > > > > > Regards,
> > > > > > Shahab
> > > > > >
> > > > > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yu...@gmail.com> wrote:
> > > > > >
> > > > > > > Looking at DispatchMergingRegionHandler, it does some check before
> > > > > > > initiating the merge.
> > > > > > > e.g.:
> > > > > > >
> > > > > > >       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
> > > > > > >           + ", " + region_b.getRegionNameAsString() + ", because region "
> > > > > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() : region_b
> > > > > > >               .getEncodedName()) + " has merge qualifier");
> > > > > > >
> > > > > > > Can you take a look at master log around the time merge request was
> > > > > > > issued to see if you can get some clue ?
> > > > > > >
> > > > > > > Cheers
> > > > > > >
> > > > > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <sh...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > The documentation of online merge tool (merge_region) states
> > > > > > > > that if we forcibly merge regions (by setting the 3rd attribute
> > > > > > > > as true) then it can create overlapping regions. if this happens
> > > > > > > > then will this render the region or table unusable or it is just
> > > > > > > > a performance hit? I mean how bigger of a deal it is?
> > > > > > > >
> > > > > > > > Actually, we are merging regions using the programmatic API for
> > > > > > > > this and setting this flag ('forcible') as false. But for some
> > > > > > > > tables (we haven't figured out a pattern yet, data is still
> > > > > > > > accessible), merge of regions do not happen at all. Afterwards we
> > > > > > > > tried with this flag = true, and it still doesn't merge them.
> > > > > > > >
> > > > > > > > CDH 5.1.0
> > > > > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Shahab
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

For automated detection of such a scenario, you can refer to the code in
CatalogJanitor#cleanMergeRegion():

      regionFs = HRegionFileSystem.openRegionFromFileSystem(
          this.services.getConfiguration(), fs, tabledir, mergedRegion, true);

...

Then regionFs.hasReferences(htd) would tell you whether the underlying
region has reference files.
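
As a rough illustration of how a client could act on such a check without
polling forever, the sketch below waits on a reference check with a deadline.
It is only a sketch under stated assumptions: MergeReferenceWait and
waitUntilNoReferences are hypothetical names, and the BooleanSupplier stands
in for a call such as regionFs.hasReferences(htd), which in a real program
would need HBase on the classpath and its IOException wrapped by the caller.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HBase: bounded wait for reference files to clear.
class MergeReferenceWait {

    /**
     * Polls hasReferences until it reports false or the timeout elapses.
     * Returns true if the references cleared in time, false on timeout
     * or if the waiting thread was interrupted.
     */
    static boolean waitUntilNoReferences(BooleanSupplier hasReferences,
                                         Duration timeout,
                                         Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (!hasReferences.getAsBoolean()) {
                return true;               // no reference files left
            }
            if (System.nanoTime() >= deadline) {
                return false;              // give up instead of hanging
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake check (assumption for the demo): references exist for the first 3 calls.
        int[] calls = {0};
        boolean cleared = waitUntilNoReferences(
                () -> ++calls[0] <= 3,
                Duration.ofSeconds(2),
                Duration.ofMillis(10));
        // Prints: cleared=true after 4 checks
        System.out.println("cleared=" + cleared + " after " + calls[0] + " checks");
    }
}
```

When the method returns false, the caller can log the region name and move on
to the next pair rather than blocking on a merge that was skipped.
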
Cheers

On Fri, Nov 14, 2014 at 9:39 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> No. Not that I can recall but I can check.
>
> From resolution perspective, is there any way we can resolve this. More
> importantly, anyway we can automate the resolution, if we run into such
> issues in future? 'Cleaning the qualifier', that is.
>
> Regards,
> Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

No. Not that I can recall, but I can check.

From a resolution perspective, is there any way we can resolve this? More
importantly, is there any way we can automate the resolution if we run into
such issues in the future? 'Cleaning the qualifier', that is.

Regards,
Shahab

On Fri, Nov 14, 2014 at 12:12 PM, Ted Yu <yu...@gmail.com> wrote:

> One possibility was that region 7373f75181c71eb5061a6673cee15931 was
> involved in some hbase snapshot.
>
> Was the underlying table being snapshotted in recent past ?
>
> Cheers

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

One possibility is that region 7373f75181c71eb5061a6673cee15931 was
involved in some HBase snapshot.

Was the underlying table snapshotted in the recent past?

Cheers

On Fri, Nov 14, 2014 at 9:05 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> Thanks again.
>
> But I have been polling for a while and it still doesn't merge. I mean this
> particular region example that I sent you, I am trying to merge it since
> yesterday. I ran the polling-based code all night and I have to kill it.
> Then in the morning, I tried manual merging through hbase shell and it
> still doesn't merge. Note that the current polling logic does not try to
> call merge again. It just checks the region size.
>
> So how to clean it then? Or actually make it merge? Plus is this something
> expected (a region keeping a reference)? How can we avoid it?
>
> Note that this is not limited to this table only. We are seeing this in
> other regions of other tables as well. Are we merging too fast?
>
>
>
> Regards,
> Shahab
>
> On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Polling as you described is fine.
> >
> > catalogJanitor.cleanMergeQualifier() is called by
> > DispatchMergingRegionHandler.
> >
> > If clean was successful, you would see the following:
> >
> >       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
> >           + regionB.getRegionNameAsString()
> >           + " from fs because merged region no longer holds references");
> >
> > Assuming there was no log below in your master log:
> >
> >       LOG.error("Merged region " + region.getRegionNameAsString()
> >           + " has only one merge qualifier in META.");
> >
> > It would be the case that 7373f75181c71eb5061a6673cee15931 still had
> > reference file.
> >
> > Cheers
> >
> > On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <sh...@gmail.com>
> > wrote:
> >
> > > Hi Ted.
> > >
> > > The log bit is below at the end of the email. This is the command to
> > > merge that I gave just now through hbase shell. forcible was false but it
> > > behaves similarly if forcible is true too. This is from master log.
> > > Indeed the region merging was skipped! What does this mean? Data seems to
> > > be intact for this table.
> > >
> > > Just to give you a background. This table was first merged by the
> > > automated java application. What we are doing is that we are merging
> > > tables programmatically. As the HBaseAdmin.mergeRegions call is async, we
> > > poll for the number of regions getting lowered after this merge call. The
> > > application hangs and continues polling forever as the previous merge
> > > didn't happen.
> > >
> > > In this poll loop, we do get the number of regions by a fresh call to
> > > HBaseAdmin.getTableRegions(tableName).getSize().
> > >
> > > What are these merge qualifiers and what are we doing wrong or should do?
> > >
> > > In the polling loop we can somehow retry merge again? But how can we
> > > know, that we need to call merge again as it works for some regions. Is
> > > the table meta corrupted for some reason by the above logic?
> > >
> > > Thanks a lot.
> > >
> > >
> > >
> > >
> ------------------------------------------------------------------------
> > >
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > 0x348c7017707236b closed
> > > 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> > > shut down
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > > client connection,
> > >
> > >
> >
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> > >
> > >
> >
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > baseZNode=/hbase
> > > 2014-11-14 11:25:02,645 INFO
> > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > > identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to
> > ZooKeeper
> > >
> > >
> >
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> > > socket connection to server ip-1010018.ec2.internal/1010019:2181. Will
> > not
> > > attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket
> > > connection established to ip-1010018.ec2.internal/1010019:2181,
> > initiating
> > > session
> > > 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session
> > > establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> > > sessionid = 0x348c7017707236c, negotiated timeout = 60000
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > 0x348c7017707236c closed
> > > 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> > > shut down
> > > 2014-11-14 11:25:30,713 INFO
> > > org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler:
> Skip
> > > merging regions
> > > TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
> > >
> > >
> >
> TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> > > because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> > > 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating
> > > client connection,
> > >
> > >
> >
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> > >
> > >
> >
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> > > baseZNode=/hbase
> > > 2014-11-14 11:25:41,384 INFO
> > > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> > > identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to
> > ZooKeeper
> > >
> > >
> >
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> > > 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening
> > > socket connection to server ip-1010018.ec2.internal/1010019:2181. Will
> > not
> > > attempt to authenticate using SASL (unknown error)
> > > 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket
> > > connection established to ip-1010018.ec2.internal/1010019:2181,
> > initiating
> > > session
> > > 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session
> > > establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> > > sessionid = 0x348c7017707236e, negotiated timeout = 60000
> > > 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session:
> > > 0x348c7017707236e closed
> > > 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn:
> EventThread
> > > shut down
> > >
> > >
> >
> ------------------------------------------------------------------------------------------------------------------------------------
> > >
> > > Regards,
> > > Shahab
> > >
> > > On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > Looking at DispatchMergingRegionHandler, it does some check before
> > > > initiating the merge.
> > > > e.g.:
> > > >
> > > >       LOG.info("Skip merging regions " +
> > region_a.getRegionNameAsString()
> > > >
> > > >           + ", " + region_b.getRegionNameAsString() + ", because
> > region "
> > > >
> > > >           + (regionAHasMergeQualifier ? region_a.getEncodedName() :
> > > > region_b
> > > >
> > > >               .getEncodedName()) + " has merge qualifier");
> > > >
> > > > Can you take a look at master log around the time merge request was
> > > issued
> > > > to see if you can get some clue ?
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <
> shahab.yunus@gmail.com>
> > > > wrote:
> > > >
> > > > > The documentation of online merge tool (merge_region) states that
> if
> > we
> > > > > forcibly merge regions (by setting the 3rd attribute as true) then
> it
> > > can
> > > > > create overlapping regions. if this happens then will this render
> the
> > > > > region or table unusable or it is just a performance hit? I mean
> how
> > > > bigger
> > > > > of a deal it is?
> > > > >
> > > > > Actually, we are merging regions using the programmatic API for
> this
> > > and
> > > > > setting this flag ('forcible') as false. But for some tables (we
> > > haven't
> > > > > figured out a pattern yet, data is still accessible), merge of
> > regions
> > > do
> > > > > not happen at all. Afterwards we tried with this flag = true, and
> it
> > > > still
> > > > > doesn't merge them.
> > > > >
> > > > > CDH 5.1.0
> > > > > (Hbase is 0.98.1-cdh5.1.0)
> > > > >
> > > > > Regards,
> > > > > Shahab
> > > > >
> > > >
> > >
> >
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Thanks again.

But I have been polling for a while and it still doesn't merge. I have been
trying to merge the particular region from the example I sent you since
yesterday. I ran the polling-based code all night and had to kill it.
Then in the morning I tried merging manually through the hbase shell, and it
still didn't merge. Note that the current polling logic does not try to
call merge again. It just checks the number of regions.

So how do we clean it, or actually make it merge? Also, is this
expected (a region keeping a reference)? How can we avoid it?

Note that this is not limited to this table. We are seeing this in
other regions of other tables as well. Are we merging too fast?



Regards,
Shahab

On Fri, Nov 14, 2014 at 11:58 AM, Ted Yu <yu...@gmail.com> wrote:

> Polling as you described is fine.
>
> catalogJanitor.cleanMergeQualifier() is called by
> DispatchMergingRegionHandler.
>
> If the clean was successful, you would see the following:
>
>       LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
>           + regionB.getRegionNameAsString()
>           + " from fs because merged region no longer holds references");
>
> Assuming the following error was not in your master log:
>
>       LOG.error("Merged region " + region.getRegionNameAsString()
>           + " has only one merge qualifier in META.");
>
> it would mean that region 7373f75181c71eb5061a6673cee15931 still had a
> reference file.
>
> Cheers

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

Polling as you described is fine.

catalogJanitor.cleanMergeQualifier() is called by
DispatchMergingRegionHandler.

If the clean was successful, you would see the following:

      LOG.debug("Deleting region " + regionA.getRegionNameAsString() + " and "
          + regionB.getRegionNameAsString()
          + " from fs because merged region no longer holds references");

Assuming the following error was not in your master log:

      LOG.error("Merged region " + region.getRegionNameAsString()
          + " has only one merge qualifier in META.");

it would mean that region 7373f75181c71eb5061a6673cee15931 still had a
reference file.
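
To make the three cases concrete, here is a minimal sketch that sorts a
master log line into one of the outcomes discussed in this thread. The class
name MergeLogClassifier and the Outcome enum are hypothetical helpers for
illustration, not part of HBase; only the matched message fragments come from
the log statements quoted above. Note that the "Deleting region" message is
logged at DEBUG level, so it would only show up if debug logging is enabled.

```java
public class MergeLogClassifier {

    /** Hypothetical labels for the log messages quoted in this thread. */
    public enum Outcome { CLEANED, SINGLE_QUALIFIER, SKIPPED, UNRELATED }

    /** Classifies one master log line by the message fragment it contains. */
    public static Outcome classify(String line) {
        if (line.contains("from fs because merged region no longer holds references")) {
            return Outcome.CLEANED;          // merge qualifier was cleaned
        }
        if (line.contains("has only one merge qualifier in META")) {
            return Outcome.SINGLE_QUALIFIER; // the error case
        }
        if (line.contains("Skip merging regions")) {
            return Outcome.SKIPPED;          // merge request was rejected
        }
        return Outcome.UNRELATED;
    }

    public static void main(String[] args) {
        String line = "Skip merging regions A, B, because region "
            + "7373f75181c71eb5061a6673cee15931 has merge qualifier";
        System.out.println(classify(line));
    }
}
```

If a region keeps producing SKIPPED lines and neither of the other two
messages ever appears, that would be consistent with the reference-file
explanation.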

Cheers

On Fri, Nov 14, 2014 at 8:35 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> Hi Ted.
>
> The relevant log excerpt is at the end of this email. It covers the merge
> command that I issued just now through the hbase shell. forcible was false,
> but it behaves the same when forcible is true. This is from the master log.
> Indeed, the region merge was skipped! What does this mean? The data seems
> to be intact for this table.
>
> Just to give you some background: this table was first merged by the
> automated Java application. We are merging regions programmatically. As
> the HBaseAdmin.mergeRegions call is async, we poll until the number of
> regions drops after the merge call. The application hangs and keeps
> polling forever because the previous merge didn't happen.
>
> In this poll loop, we get the number of regions with a fresh call to
> HBaseAdmin.getTableRegions(tableName).size().
>
> What are these merge qualifiers, and what are we doing wrong or what
> should we do?
>
> Can we somehow retry the merge in the polling loop? But how would we know
> that we need to call merge again, given that it works for some regions? Is
> the table's meta corrupted for some reason by the above logic?
>
> Thanks a lot.
>
>
>
> ------------------------------------------------------------------------
>
> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236b closed
> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
>
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
>
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> baseZNode=/hbase
> 2014-11-14 11:25:02,645 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
>
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to ip-1010018.ec2.internal/1010019:2181, initiating
> session
> 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> sessionid = 0x348c7017707236c, negotiated timeout = 60000
> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236c closed
> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-11-14 11:25:30,713 INFO
> org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip
> merging regions
> TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
>
> TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
>
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
>
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> baseZNode=/hbase
> 2014-11-14 11:25:41,384 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
>
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to ip-1010018.ec2.internal/1010019:2181, initiating
> session
> 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> sessionid = 0x348c7017707236e, negotiated timeout = 60000
> 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236e closed
> 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
>
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Regards,
> Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

FYI, Ted, I see that this same issue was discussed here in the past as
well:

http://mail-archives.apache.org/mod_mbox/hbase-user/201406.mbox/%3CCAKrkF=thi8g4Ks=viqgC+Y=iVUQysOGOQ41RmKUTFRiUnaL1mQ@mail.gmail.com%3E

Regards,
Shahab

On Fri, Nov 14, 2014 at 11:35 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> Hi Ted.
>
> The relevant log excerpt is at the end of this email. It covers the merge
> command that I issued just now through the hbase shell. forcible was false,
> but it behaves the same when forcible is true. This is from the master log.
> Indeed, the region merge was skipped! What does this mean? The data seems
> to be intact for this table.
>
> Just to give you some background: this table was first merged by the
> automated Java application. We are merging regions programmatically. As
> the HBaseAdmin.mergeRegions call is async, we poll until the number of
> regions drops after the merge call. The application hangs and keeps
> polling forever because the previous merge didn't happen.
>
> In this poll loop, we get the number of regions with a fresh call to
> HBaseAdmin.getTableRegions(tableName).size().
>
> What are these merge qualifiers, and what are we doing wrong or what
> should we do?
>
> Can we somehow retry the merge in the polling loop? But how would we know
> that we need to call merge again, given that it works for some regions? Is
> the table's meta corrupted for some reason by the above logic?
>
> Thanks a lot.
>
>
>
> ------------------------------------------------------------------------
>
> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236b closed
> 2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> baseZNode=/hbase
> 2014-11-14 11:25:02,645 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> 2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to ip-1010018.ec2.internal/1010019:2181, initiating
> session
> 2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> sessionid = 0x348c7017707236c, negotiated timeout = 60000
> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236c closed
> 2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2014-11-14 11:25:30,713 INFO
> org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip
> merging regions
> TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
> TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
> because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
> 2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection,
> connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
> quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
> baseZNode=/hbase
> 2014-11-14 11:25:41,384 INFO
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
> identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
> ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
> 2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
> attempt to authenticate using SASL (unknown error)
> 2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to ip-1010018.ec2.internal/1010019:2181, initiating
> session
> 2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server ip-1010018.ec2.internal/1010019:2181,
> sessionid = 0x348c7017707236e, negotiated timeout = 60000
> 2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x348c7017707236e closed
> 2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
>
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Regards,
> Shahab

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Hi Ted.

The relevant log excerpt is at the end of this email. It covers the merge
command that I issued just now through the hbase shell. forcible was false,
but it behaves the same when forcible is true. This is from the master log.
Indeed, the region merge was skipped! What does this mean? The data seems
to be intact for this table.

Just to give you some background: this table was first merged by the
automated Java application. We are merging regions programmatically. As
the HBaseAdmin.mergeRegions call is async, we poll until the number of
regions drops after the merge call. The application hangs and keeps
polling forever because the previous merge didn't happen.

In this poll loop, we get the number of regions with a fresh call to
HBaseAdmin.getTableRegions(tableName).size().
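
For reference, the polling described above can be bounded so that it gives up
instead of hanging when a merge is skipped. This is only a minimal sketch:
the class MergePoller and its parameters are hypothetical, and the region
count is passed in as an IntSupplier so the example runs without a cluster.
In the real application the supplier would presumably wrap the fresh
getTableRegions call mentioned above.

```java
import java.util.function.IntSupplier;

public class MergePoller {

    /**
     * Polls regionCount until it drops below initialCount or maxAttempts
     * polls have been made. Returns true if the count dropped, false if
     * the poller gave up or was interrupted.
     */
    public static boolean waitForFewerRegions(IntSupplier regionCount, int initialCount,
                                              int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (regionCount.getAsInt() < initialCount) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated table whose region count never changes, as with a skipped merge.
        boolean merged = waitForFewerRegions(() -> 10, 10, 3, 10L);
        System.out.println(merged ? "merge completed" : "gave up: check the master log");
    }
}
```

With a bound like this, a false return is the signal to inspect the master
log rather than keep waiting.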

What are these merge qualifiers, and what are we doing wrong or what
should we do?

Can we somehow retry the merge in the polling loop? But how would we know
that we need to call merge again, given that it works for some regions? Is
the table's meta corrupted for some reason by the above logic?

Thanks a lot.



------------------------------------------------------------------------

2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ZooKeeper: Session:
0x348c7017707236b closed
2014-11-14 11:25:02,643 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
baseZNode=/hbase
2014-11-14 11:25:02,645 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
2014-11-14 11:25:02,645 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
attempt to authenticate using SASL (unknown error)
2014-11-14 11:25:02,646 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ip-1010018.ec2.internal/1010019:2181, initiating
session
2014-11-14 11:25:02,648 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ip-1010018.ec2.internal/1010019:2181,
sessionid = 0x348c7017707236c, negotiated timeout = 60000
2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ZooKeeper: Session:
0x348c7017707236c closed
2014-11-14 11:25:02,703 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2014-11-14 11:25:30,713 INFO
org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: Skip
merging regions
TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931.,
TABLE_NAME,\x02\xFA\xF0\x80\x00\x00\x01I\xAA\xD5\x87\xA8\x19\x99\x99\x99\x99\x99\x99\x90,1415910559217.43f4d3685d113d3ae18eea9f189de096.,
because region 7373f75181c71eb5061a6673cee15931 has merge qualifier
2014-11-14 11:25:41,383 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
sessionTimeout=60000 watcher=catalogtracker-on-hconnection-0x47d865f2,
quorum=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181,
baseZNode=/hbase
2014-11-14 11:25:41,384 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process
identifier=catalogtracker-on-hconnection-0x47d865f2 connecting to ZooKeeper
ensemble=ip-1010019.ec2.internal:2181,ip-1010017.ec2.internal:2181,ip-1010018.ec2.internal:2181
2014-11-14 11:25:41,384 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ip-1010018.ec2.internal/1010019:2181. Will not
attempt to authenticate using SASL (unknown error)
2014-11-14 11:25:41,386 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ip-1010018.ec2.internal/1010019:2181, initiating
session
2014-11-14 11:25:41,389 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ip-1010018.ec2.internal/1010019:2181,
sessionid = 0x348c7017707236e, negotiated timeout = 60000
2014-11-14 11:25:41,397 INFO org.apache.zookeeper.ZooKeeper: Session:
0x348c7017707236e closed
2014-11-14 11:25:41,398 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
------------------------------------------------------------------------

Regards,
Shahab

On Fri, Nov 14, 2014 at 10:56 AM, Ted Yu <yu...@gmail.com> wrote:

> Looking at DispatchMergingRegionHandler, it does some check before
> initiating the merge.
> e.g.:
>
>       LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
>           + ", " + region_b.getRegionNameAsString() + ", because region "
>           + (regionAHasMergeQualifier ? region_a.getEncodedName()
>               : region_b.getEncodedName()) + " has merge qualifier");
>
> Can you take a look at master log around the time merge request was issued
> to see if you can get some clue ?
>
> Cheers
>
> On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <sh...@gmail.com>
> wrote:
>
> > The documentation of online merge tool (merge_region) states that if we
> > forcibly merge regions (by setting the 3rd attribute as true) then it can
> > create overlapping regions. if this happens then will this render the
> > region or table unusable or it is just a performance hit? I mean how bigger
> > of a deal it is?
> >
> > Actually, we are merging regions using the programmatic API for this and
> > setting this flag ('forcible') as false. But for some tables (we haven't
> > figured out a pattern yet, data is still accessible), merge of regions do
> > not happen at all. Afterwards we tried with this flag = true, and it still
> > doesn't merge them.
> >
> > CDH 5.1.0
> > (Hbase is 0.98.1-cdh5.1.0)
> >
> > Regards,
> > Shahab
> >
>

Re: Forcibly merging regions

Posted by Ted Yu <yu...@gmail.com>.

Looking at DispatchMergingRegionHandler, it performs some checks before
initiating the merge, e.g.:

      LOG.info("Skip merging regions " + region_a.getRegionNameAsString()
          + ", " + region_b.getRegionNameAsString() + ", because region "
          + (regionAHasMergeQualifier ? region_a.getEncodedName()
              : region_b.getEncodedName()) + " has merge qualifier");

Can you take a look at the master log around the time the merge request was
issued to see if you can get some clue?
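
For scanning a large master log, here is a minimal sketch that pulls the
encoded region name out of that "Skip merging regions" message. It uses only
the JDK, not any HBase API. The class and method names are hypothetical, and
it assumes the encoded region name is a 32-character lowercase hex string, as
in the log posted in this thread. The start key in the sample line is
shortened for readability.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: finds the region that the master
// reported as still carrying a merge qualifier.
final class MergeSkipLogParser {
    // Matches the message built by the LOG.info(...) call quoted above.
    // Assumption: encoded region names are 32 lowercase hex characters.
    private static final Pattern SKIP = Pattern.compile(
        "Skip merging regions .* because region ([0-9a-f]{32}) has merge qualifier");

    // Returns the encoded region name if the line is a skip message,
    // otherwise an empty Optional.
    static Optional<String> regionWithMergeQualifier(String logLine) {
        Matcher m = SKIP.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample line modelled on the master log in this thread, joined onto
        // one line; the second region's start key is shortened.
        String line = "2014-11-14 11:25:30,713 INFO "
            + "org.apache.hadoop.hbase.master.handler.DispatchMergingRegionHandler: "
            + "Skip merging regions "
            + "TABLE_NAME,,1415915112497.7373f75181c71eb5061a6673cee15931., "
            + "TABLE_NAME,\\x02\\xFA,1415910559217.43f4d3685d113d3ae18eea9f189de096., "
            + "because region 7373f75181c71eb5061a6673cee15931 has merge qualifier";
        System.out.println(regionWithMergeQualifier(line).orElse("no skip message"));
    }
}
```

Note that the message spans several lines in a mail-wrapped log excerpt, so
the wrapped lines have to be joined back into one before matching.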

Cheers

On Fri, Nov 14, 2014 at 6:41 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> The documentation of online merge tool (merge_region) states that if we
> forcibly merge regions (by setting the 3rd attribute as true) then it can
> create overlapping regions. if this happens then will this render the
> region or table unusable or it is just a performance hit? I mean how bigger
> of a deal it is?
>
> Actually, we are merging regions using the programmatic API for this and
> setting this flag ('forcible') as false. But for some tables (we haven't
> figured out a pattern yet, data is still accessible), merge of regions do
> not happen at all. Afterwards we tried with this flag = true, and it still
> doesn't merge them.
>
> CDH 5.1.0
> (Hbase is 0.98.1-cdh5.1.0)
>
> Regards,
> Shahab
>

Re: Forcibly merging regions

Posted by Shahab Yunus <sh...@gmail.com>.

Related to this: can the hbase.hregion.max.filesize setting prevent table
regions from being merged? Or are the regions merged anyway and then split
again during compaction?

Regards,
Shahab

On Fri, Nov 14, 2014 at 9:41 AM, Shahab Yunus <sh...@gmail.com>
wrote:

> The documentation of online merge tool (merge_region) states that if we
> forcibly merge regions (by setting the 3rd attribute as true) then it can
> create overlapping regions. if this happens then will this render the
> region or table unusable or it is just a performance hit? I mean how bigger
> of a deal it is?
>
> Actually, we are merging regions using the programmatic API for this and
> setting this flag ('forcible') as false. But for some tables (we haven't
> figured out a pattern yet, data is still accessible), merge of regions do
> not happen at all. Afterwards we tried with this flag = true, and it still
> doesn't merge them.
>
> CDH 5.1.0
> (Hbase is 0.98.1-cdh5.1.0)
>
> Regards,
> Shahab
>