Posted to dev@hbase.apache.org by Lars George <la...@gmail.com> on 2016/09/28 14:32:37 UTC

Re: Merge and HMerge

Hey,

Sorry to resurrect this old thread, but while working on the book update I
came across the same issue today, i.e. we have both Merge and HMerge. I tried
it, and Merge works fine now. It is also the only one of the two flagged as
being a tool. Should HMerge be removed? Or at least deprecated?

Cheers,
Lars


On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
>>> there is already an issue to do this but not revamp of these Merge
> classes
> I guess the issue is HBASE-1621
>
> On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
>
>> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
>> be redone AND redone so we can do merging while the table is online (there
>> is already an issue to do this but not revamp of these Merge classes).
>> The unit tests for Merge are also all junit3 and do whacky stuff to
>> put up multiple regions.  This should be redone too (they are often the
>> first thing to break on a major change, and putting them back together is
>> a headache since they do not follow the usual pattern).
>>
>> St.Ack
>>
>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <la...@gmail.com>
>> wrote:
>> > Hi Ted,
>> >
>> > The log is from an earlier attempt; I tried this a few times. This is all
>> > local, after rm'ing the /hbase. So the files are all pretty empty, but since
>> > I put data in I was assuming it should work. Once you have gotten into this
>> > state, you also get funny error messages in the shell:
>> >
>> > hbase(main):001:0> list
>> > TABLE
>> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HMasterInterface
>> >
>> > ERROR: undefined method `map' for nil:NilClass
>> >
>> > Here is some help for this command:
>> > List all tables in hbase. Optional regular expression parameter could
>> > be used to filter the output. Examples:
>> >
>> >  hbase> list
>> >  hbase> list 'abc.*'
>> >
>> >
>> > hbase(main):002:0>
>> >
>> > I am assuming this is collateral, but why? The UI works but the table is
>> > gone too.
>> >
>> > Lars
>> >
>> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
>> >
>> >> There is TestMergeTool which tests Merge.
>> >>
>> >> From the log you provided, I got a little confused as to why
>> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
>> >> didn't appear in your command line or the output from .META. scanning.
>> >>
>> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <la...@gmail.com> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> These two both seem to be in a bit of a weird state: HMerge is scoped package
>> >>> local, therefore no one but the package can call the merge() functions...
>> >>> and no one does that but the unit test. It would be good to have this on
>> >>> the CLI and shell as a command (and in the shell maybe with a confirmation
>> >>> message?), but it is not available AFAIK.
>> >>>
>> >>> HMerge can merge regions of tables that are disabled. It also merges all
>> >>> that qualify, i.e. where the merged region is less than or equal to half the
>> >>> configured max file size.
>> >>>
>> >>> Merge on the other hand does have a main(), so can be invoked:
>> >>>
>> >>> $ hbase org.apache.hadoop.hbase.util.Merge
>> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
>> >>>
>> >>> Note how the help insinuates that you can use it as a tool, but that is not
>> >>> correct. Also, it only merges two given regions, and the cluster must be
>> >>> shut down (only the HBase daemons). So that is a step back.
>> >>>
>> >>> What is worse is that I cannot get it to work. I tried in the shell:
>> >>>
>> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
>> >>> 0 row(s) in 0.2640 seconds
>> >>>
>> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
>> >>> 0 row(s) in 1.0450 seconds
>> >>>
>> >>> hbase(main):003:0> flush 'testtable'
>> >>> 0 row(s) in 0.2000 seconds
>> >>>
>> >>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
>> >>> ROW                                  COLUMN+CELL
>> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
>> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
>> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
>> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
>> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
>> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
>> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
>> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
>> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
>> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
>> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
>> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
>> >>> 6 row(s) in 0.0440 seconds
>> >>>
>> >>> hbase(main):005:0> exit
>> >>>
>> >>> $ ./bin/stop-hbase.sh
>> >>>
>> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
>> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
>> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
>> >>>
>> >>> But I consistently get errors:
>> >>>
>> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table testtable
>> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
>> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
>> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581, exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=1
>> >>> info: null
>> >>> region1: [B@48fd918a
>> >>> region2: [B@7f5e2075
>> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
>> >>> java.io.IOException: Could not find meta region for testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
>> >>>       at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
>> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
>> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=1
>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
>> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
>> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
>> >>> java.lang.NullPointerException
>> >>>       at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
>> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
>> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
>> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
>> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>> >>>
>> >>> After which, most of the time, I have shot .META. with an error:
>> >>>
>> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors
>> >>> java.io.FileNotFoundException: No status for hdfs://localhost:8020/hbase/.corrupt
>> >>>       at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
>> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
>> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
>> >>>       at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
>> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
>> >>>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
>> >>>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
>> >>>
>> >>> Lars
>> >
>> >
>>
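
For illustration, the HMerge rule described in the quoted 2011 message (merge adjacent regions while the merged region stays at or below half the configured max file size) can be sketched in Java as follows. This is a minimal sketch and not the actual HMerge code: the class MergeQualifier, its methods, and the assumption that a merged region's size is simply the sum of its parts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: it illustrates the size rule only.
final class MergeQualifier {

    // Two adjacent regions qualify when their combined size is at most
    // half of the configured max file size.
    static boolean qualifies(long sizeA, long sizeB, long maxFileSize) {
        return sizeA + sizeB <= maxFileSize / 2;
    }

    // Walks the regions in key order and greedily folds each region into
    // its left neighbour while the rule holds. Returns the resulting sizes.
    // Assumption: the size of a merged region is the sum of its parts.
    static List<Long> plan(long[] sizes, long maxFileSize) {
        List<Long> merged = new ArrayList<>();
        if (sizes.length == 0) {
            return merged;
        }
        long current = sizes[0];
        for (int i = 1; i < sizes.length; i++) {
            if (qualifies(current, sizes[i], maxFileSize)) {
                current += sizes[i];   // merge into the running region
            } else {
                merged.add(current);   // close the running region
                current = sizes[i];
            }
        }
        merged.add(current);
        return merged;
    }
}
```

With region sizes 10, 20, 100, 30, 40 and a max file size of 200, plan() yields 30, 100, 70: the first two regions and the last two regions are merged, while the region of size 100 is left alone because any merge would exceed the limit of 100.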

Re: Merge and HMerge

Posted by Ted Yu <yu...@gmail.com>.
Deprecating HMerge is fine.


Re: Merge and HMerge

Posted by Apekshit Sharma <ap...@cloudera.com>.
+1, although kind of late since it's already done.
But great to see this 5+ year old issue finally resolved.
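
For anyone moving from util.Merge to the online merge discussed in the quoted messages: util.Merge was given full region names, whereas the shell's merge_region command is assumed here to take encoded region names (worth checking against the shell help for your version). The encoded name is the hash between the last two dots of a full region name. A minimal Java sketch of extracting it follows; RegionNames and encodedName are hypothetical names, not HBase API, and the sketch assumes the <table>,<startkey>,<timestamp>.<encoded>. layout seen in the .META. scan from this thread.

```java
// Hypothetical helper, not part of HBase.
final class RegionNames {

    // Extracts the encoded name from a full region name of the form
    // <table>,<startkey>,<timestamp>.<encoded>.
    static String encodedName(String regionName) {
        if (regionName == null || !regionName.endsWith(".")) {
            throw new IllegalArgumentException("no trailing '.': " + regionName);
        }
        int end = regionName.length() - 1;                 // index of the trailing dot
        int start = regionName.lastIndexOf('.', end - 1);  // dot before the hash
        if (start < 0 || start + 1 == end) {
            throw new IllegalArgumentException("no encoded name: " + regionName);
        }
        return regionName.substring(start + 1, end);
    }
}
```

For the region testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. this returns e7c16267eb30e147e5d988c63d40f982. Names without the trailing hash, such as -ROOT-,,0, are rejected with an IllegalArgumentException.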

On Mon, Jan 16, 2017 at 9:24 PM, Stack <st...@duboce.net> wrote:

> On Sat, Jan 14, 2017 at 9:50 PM, Lars George <la...@gmail.com>
> wrote:
>
> > I think that makes sense. The tool with its custom code dates back to
> > when we had no built-in version. I am all for removing all of the tools
> > and leaving the API call only. For an admin that is then the same as
> > calling flush or split.
> >
> > No?
> >
> >
> Sounds good to me.
> St.Ack
>
>
>
> > Lars
> >
> > Sent from my iPhone
> >
> > On 15 Jan 2017, at 04:25, Stephen Jiang <sy...@gmail.com> wrote:
> >
> > >> If you remove the util.Merge tool, how then does an operator ask for a
> > >> merge in its absence?
> > >
> > > We have a shell command to merge regions.  In the past, it called the same
> > > RS-side code.  I don't think there is a need to have util.Merge (even if we
> > > really want it, we can have this utility call HBaseAdmin.mergeRegions, which
> > > is the same path as the merge command through 'hbase shell').
> > >
> > > Thanks
> > > Stephen
> > >
> > >> On Fri, Jan 13, 2017 at 11:29 PM, Stack <st...@duboce.net> wrote:
> > >>
> > >> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <syuanjiangdev@gmail.com>
> > >> wrote:
> > >>
> > >>> Reviving this thread.
> > >>>
> > >>> I am in the process of removing the Region Server side merge (and split)
> > >>> transaction code in the master branch, as we now have merge (and split)
> > >>> procedure(s) in the Master doing the same thing.
> > >>>
> > >>>
> > >> Good (Issue?)
> > >>
> > >>
> > >>> The Merge tool depends on the RS-side merge code.  I'd like to use this
> > >>> chance to remove the util.Merge tool.  This is for 2.0 and up releases only.
> > >>> Deprecation does not work here, as keeping the RS-side merge code would
> > >>> leave duplicate logic in the source code and make the new Assignment Manager
> > >>> code more complicated.
> > >>>
> > >>>
> > >> Could util.Merge be changed to ask the Master to run the merge (via AMv2)?
> > >>
> > >> If you remove the util.Merge tool, how then does an operator ask for a
> > >> merge in its absence?
> > >>
> > >> Thanks Stephen
> > >>
> > >> S
> > >>
> > >>
> > >>> Please let me know whether you have any objections.
> > >>>
> > >>> Thanks
> > >>> Stephen
> > >>>
> > >>> PS.  I could deprecate the HMerge code if anyone is really using it.  It
> > >>> has its own logic and is standalone (it is supposed to, dangerously, work
> > >>> offline and merge more than 2 regions; util.Merge and the shell do not
> > >>> support this functionality for now).
> > >>>
> > >>> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <en...@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> @Appy what is not clear from above?
> > >>>>
> > >>>> I think we should get rid of both Merge and HMerge.
> > >>>>
> > >>>> We should not have any tool which works in offline mode by going over
> > >>>> the HDFS data. That seems very brittle and likely to break when things
> > >>>> change. The only use case I can think of is that you somehow end up with a
> > >>>> lot of regions and cannot bring the cluster back up because of OOMs, etc.,
> > >>>> and you have to reduce the number of regions in offline mode. However, we
> > >>>> have not seen this kind of thing at any of our customers in the last couple
> > >>>> of years.
> > >>>>
> > >>>> I think we should seriously look into improving the normalizer and enabling
> > >>>> it by default for all tables. Ideally, the normalizer should run much more
> > >>>> frequently, and should be configured with higher-level goals and
> > >>>> heuristics, like how many regions per node on average, and should look at
> > >>>> the global state (like the balancer does) to decide on split / merge
> > >>>> points.
> > >>>>
> > >>>> Enis
> > >>>>
> > >>>> On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <appy@cloudera.com>
> > >>>> wrote:
> > >>>>
> > >>>>> bq. HMerge can merge multiple regions by going over the list of
> > >>>>> regions and checking their sizes.
> > >>>>> bq. But both of these tools (Merge and HMerge) are very dangerous
> > >>>>>
> > >>>>> I came across HMerge and it looks like dead code. It isn't referenced
> > >>>>> from anywhere except one test. (This is what Lars also pointed out in the
> > >>>>> first email.)
> > >>>>> It would make perfect sense if it was a tool or was being referenced
> > >>>>> from somewhere, but lacking either, I am a bit confused here.
> > >>>>> @Enis, you seem to know everything about them, please educate me.
> > >>>>> Thanks
> > >>>>> - Appy
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <enis.soz@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Merge has very limited usability since it can only do a single merge
> > >>>>>> and can only run when HBase is offline.
> > >>>>>> HMerge can merge multiple regions by going over the list of regions and
> > >>>>>> checking their sizes.
> > >>>>>> And of course we have the "supported" online merge, which is the shell
> > >>>>>> command.
> > >>>>>>
> > >>>>>> But both of these tools (Merge and HMerge) are very dangerous, I think. I
> > >>>>>> would say we should deprecate both, to be replaced by the online merge
> > >>>>>> tool. We should not allow offline merge at all. I fail to see a use case
> > >>>>>> where you have to use an offline merge.
> > >>>>>>
> > >>>>>> Enis
> > >>>>>>
> > >>>>>> On Wed, Sep 28, 2016 at 7:32 AM, Lars George <lars.george@gmail.com>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Hey,
> > >>>>>>>
> > >>>>>>> Sorry to resurrect this old thread, but working on the book
> > >>> update, I
> > >>>>>>> came across the same today, i.e. we have Merge and HMerge. I
> > >> tried
> > >>>> and
> > >>>>>>> Merge works fine now. It is also the only one of the two flagged
> > >> as
> > >>>>>>> being a tool. Should HMerge be removed? At least deprecated?
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Lars
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com>
> > >>> wrote:
> > >>>>>>>>>> there is already an issue to do this but not revamp of these
> > >>>> Merge
> > >>>>>>>> classes
> > >>>>>>>> I guess the issue is HBASE-1621
> > >>>>>>>>
> > >>>>>>>> On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net>
> > >> wrote:
> > >>>>>>>>
> > >>>>>>>>> Yeah, can you file an issue Lars.  This stuff is ancient and
> > >>> needs
> > >>>>> to
> > >>>>>>>>> be redone AND redone so we can do merging while table is
> > >> online
> > >>>>> (there
> > >>>>>>>>> is already an issue to do this but not revamp of these Merge
> > >>>>> classes).
> > >>>>>>>>> The unit tests for Merge are also all junit3 and do whacky
> > >>> stuff
> > >>>> to
> > >>>>>>>>> put up multiple regions.  This should be redone too (they are
> > >>>> often
> > >>>>>>>>> first thing broke when major change and putting them back
> > >>> together
> > >>>>> is
> > >>>>>>>>> a headache since they do not follow the usual pattern).
> > >>>>>>>>>
> > >>>>>>>>> St.Ack
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <
> > >>>> lars.george@gmail.com
> > >>>>>>
> > >>>>>>>>> wrote:
> > >>>>>>>>>> Hi Ted,
> > >>>>>>>>>>
> > >>>>>>>>>> The log is from an earlier attempt, I tried this a few
> > >> times.
> > >>>> This
> > >>>>>> is
> > >>>>>>> all
> > >>>>>>>>> local, after rm'ing the /hbase. So the files are all pretty
> > >>> empty,
> > >>>>> but
> > >>>>>>> since
> > >>>>>>>>> I put data in I was assuming it should work. Once you gotten
> > >>> into
> > >>>>> this
> > >>>>>>>>> state, you also get funny error messages in the shell:
> > >>>>>>>>>>
> > >>>>>>>>>> hbase(main):001:0> list
> > >>>>>>>>>> TABLE
> > >>>>>>>>>> 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > >>>>>>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > >>>>>>>>> org.apache.hadoop.hbase.ipc.HMasterInterface
> > >>>>>>>>>>
> > >>>>>>>>>> ERROR: undefined method `map' for nil:NilClass
> > >>>>>>>>>>
> > >>>>>>>>>> Here is some help for this command:
> > >>>>>>>>>> List all tables in hbase. Optional regular expression
> > >>> parameter
> > >>>>>> could
> > >>>>>>>>>> be used to filter the output. Examples:
> > >>>>>>>>>>
> > >>>>>>>>>> hbase> list
> > >>>>>>>>>> hbase> list 'abc.*'
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> hbase(main):002:0>
> > >>>>>>>>>>
> > >>>>>>>>>> I am assuming this is collateral, but why? The UI works but
> > >>> the
> > >>>>>> table
> > >>>>>>> is
> > >>>>>>>>> gone too.
> > >>>>>>>>>>
> > >>>>>>>>>> Lars
> > >>>>>>>>>>
> > >>>>>>>>>>> On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> There is TestMergeTool which tests Merge.
> > >>>>>>>>>>>
> > >>>>>>>>>>> From the log you provided, I got a little confused as why
> > >>>>>>>>>>> 'testtable,row-20,1309613053987.
> > >>> 23a35ac696bdf4a8023dcc4c5b8419
> > >>>>> e0.'
> > >>>>>>>>> didn't
> > >>>>>>>>>>> appear in your command line or the output from .META.
> > >>> scanning.
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
> > >>>>>> lars.george@gmail.com>
> > >>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> These two seem both in a bit of a weird state: HMerge is
> > >>>> scoped
> > >>>>>>> package
> > >>>>>>>>>>>> local, therefore no one but the package can call the
> > >> merge()
> > >>>>>>>>> functions...
> > >>>>>>>>>>>> and no one does that but the unit test. But it would be
> > >> good
> > >>>> to
> > >>>>>> have
> > >>>>>>>>> this on
> > >>>>>>>>>>>> the CLI and shell as a command (and in the shell maybe
> > >> with
> > >>> a
> > >>>>>>>>> confirmation
> > >>>>>>>>>>>> message?), but it is not available AFAIK.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> HMerge can merge regions of tables that are disabled. It
> > >>> also
> > >>>>>> merges
> > >>>>>>>>> all
> > >>>>>>>>>>>> that qualify, i.e. where the merged region is less than or
> > >>>> equal
> > >>>>>> of
> > >>>>>>>>> half the
> > >>>>>>>>>>>> configured max file size.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Merge on the other hand does have a main(), so can be
> > >>> invoked:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge
> > >>>>>>>>>>>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Note how the help insinuates that you can use it as a
> > >> tool,
> > >>>> but
> > >>>>>>> that is
> > >>>>>>>>> not
> > >>>>>>>>>>>> correct. Also, it only merges two given regions, and the
> > >>>> cluster
> > >>>>>>> must
> > >>>>>>>>> be
> > >>>>>>>>>>>> shut down (only the HBase daemons). So that is a step
> > >> back.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> What is worse is that I cannot get it to work. I tried in
> > >>> the
> > >>>>>> shell:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS
> > >>> =>
> > >>>>>>>>>>>> ['row-10','row-20','row-30','row-40','row-50']}
> > >>>>>>>>>>>> 0 row(s) in 0.2640 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9'
> > >> do
> > >>>> put
> > >>>>>>>>>>>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end
> > >> end
> > >>>>>>>>>>>> 0 row(s) in 1.0450 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):003:0> flush 'testtable'
> > >>>>>>>>>>>> 0 row(s) in 0.2000 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
> > >>>>>>>>>>>> ROW                                  COLUMN+CELL
> > >>>>>>>>>>>> testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
> > >>>>>>>>>>>> testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
> > >>>>>>>>>>>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
> > >>>>>>>>>>>> testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
> > >>>>>>>>>>>> testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
> > >>>>>>>>>>>> testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
> > >>>>>>>>>>>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
> > >>>>>>>>>>>> 6 row(s) in 0.0440 seconds
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> hbase(main):005:0> exit
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ ./bin/stop-hbase.sh
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > >>>>>>>>>>>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
> > >>>>>>>>>>>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> But I consistently get errors:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > >>>>>>>>>>>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and
> > >>>>>>>>>>>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table
> > >>>>>>>>>>>> testtable
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: New hlog /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581, exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=1
> > >>>>>>>>>>>> info: null
> > >>>>>>>>>>>> region1: [B@48fd918a
> > >>>>>>>>>>>> region2: [B@7f5e2075
> > >>>>>>>>>>>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > >>>>>>>>>>>> java.io.IOException: Could not find meta region for testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> > >>>>>>>>>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=1
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
> > >>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > >>>>>>>>>>>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > >>>>>>>>>>>> java.lang.NullPointerException
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> > >>>>>>>>>>>>      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> After which, most of the time, I have shot .META., with an error:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors
> > >>>>>>>>>>>> java.io.FileNotFoundException: No status for hdfs://localhost:8020/hbase/.corrupt
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
> > >>>>>>>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>>>>>>>>>>      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >>>>>>>>>>>>      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >>>>>>>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
> > >>>>>>>>>>>>      at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Lars
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>>
> > >>>>> -- Appy
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
>



-- 

-- Appy
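
For readers following the thread: the HMerge rule quoted above (merge adjacent regions whose combined size is less than or equal to half the configured max file size) can be sketched roughly as below. This is only an illustration in Python, not HMerge's actual code; the function names and the plain list of region sizes are assumptions made up for the example.

```python
from typing import List, Optional


def first_mergeable_index(sizes: List[int], max_file_size: int) -> Optional[int]:
    """Return i such that regions i and i+1 qualify for a merge, else None.

    A pair qualifies when its combined size is at most half of max_file_size
    (the rule as described in the thread; all names here are hypothetical).
    """
    for i in range(len(sizes) - 1):
        # Compare 2 * combined against the limit to avoid fractional halves.
        if 2 * (sizes[i] + sizes[i + 1]) <= max_file_size:
            return i
    return None


def merge_qualifying(sizes: List[int], max_file_size: int) -> List[int]:
    """Repeatedly merge the first qualifying adjacent pair until none is left."""
    merged = list(sizes)
    while True:
        i = first_mergeable_index(merged, max_file_size)
        if i is None:
            return merged
        # Replace the two adjacent regions with one region of the combined size.
        merged[i:i + 2] = [merged[i] + merged[i + 1]]
```

With illustrative sizes in MB and a 256 MB limit, merge_qualifying([10, 20, 300, 40, 50], 256) collapses the two small pairs and leaves the 300 MB region alone.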

Re: Merge and HMerge

Posted by Stack <st...@duboce.net>.
On Sat, Jan 14, 2017 at 9:50 PM, Lars George <la...@gmail.com> wrote:

> I think that makes sense. The tool with its custom code dates back to
> where we had no built in version. I am all for removing all of the tools
> and leave the API call only. That is the same for an admin then compared to
> calling flush or split.
>
> No?
>
>
Sounds good to me.
St.Ack



> Lars
>
> Sent from my iPhone
>
> On 15 Jan 2017, at 04:25, Stephen Jiang <sy...@gmail.com> wrote:
>
> > >> If you remove the util.Merge tool, how then does an operator ask for a merge
> > > in its absence?
> > >
> > > We have a shell command to merge region.  In the past, it calls the same RS
> > > side code.  I don't think there is a need to have util.Merge (even if we
> > > really want, we can ask this utility to call HBaseAdmin.mergeRegions, which
> > > is the same path from the merge command through 'hbase shell').
> >
> > Thanks
> > Stephen
> >
> >> On Fri, Jan 13, 2017 at 11:29 PM, Stack <st...@duboce.net> wrote:
> >>
> > >> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <syuanjiangdev@gmail.com>
> > >> wrote:
> >>
> >>> Revive this thread
> >>>
> >>> I am in the process of removing Region Server side merge (and split)
> >>> transaction code in master branch; as now we have merge (and split)
> >>> procedure(s) from master doing the same thing.
> >>>
> >>>
> >> Good (Issue?)
> >>
> >>
> > >>> The Merge tool depends on RS-side merge code.  I'd like to use this chance
> > >>> to remove the util.Merge tool.  This is for 2.0 and up releases only.
> >>> Deprecation does not work here; as keeping the RS-side merge code would
> >>> have duplicate logic in source code and make the new Assignment manager
> >>> code more complicated.
> >>>
> >>>
> >> Could util.Merge be changed to ask the Master run the merge (via AMv2)?
> >>
> >> If you remove the util.Merge tool, how then does an operator ask for a
> >> merge in its absence?
> >>
> >> Thanks Stephen
> >>
> >> S
> >>
> >>
> >>> Please let me know whether you have objection.
> >>>
> >>> Thanks
> >>> Stephen
> >>>
> > >>> PS.  I could deprecate the HMerge code if anyone is really using it.  It has
> > >>> its own logic and is standalone (supposed to dangerously work offline and
> > >>> merge more than 2 regions - util.Merge and the shell do not support this
> > >>> functionality for now).
> >>>

Re: Merge and HMerge

Posted by Lars George <la...@gmail.com>.
I think that makes sense. The tool with its custom code dates back to when we had no built-in version. I am all for removing all of the tools and leaving only the API call. For an admin, that is then the same as calling flush or split.

No?

Lars

Sent from my iPhone
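
As a side note for operators moving from the tool to the shell or API: the region names used in this thread follow the pattern "<table>,<start key>,<region id>.<encoded name>." (for example testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.). As far as I recall, the shell's merge_region command takes the encoded name, so check the shell help for your version. The Python sketch below only splits such a name into its parts; the RegionName type and parse_region_name function are hypothetical helpers, not part of HBase.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int
    encoded_name: str


def parse_region_name(name: str) -> RegionName:
    """Split '<table>,<start key>,<region id>.<encoded name>.' into its parts."""
    if not name.endswith("."):
        raise ValueError("expected a trailing '.' after the encoded name")
    body, sep, encoded = name[:-1].rpartition(".")
    if not sep or not encoded:
        raise ValueError("no encoded name found")
    # The table name is everything before the first comma, the region id is
    # everything after the last comma; the start key (possibly empty, and
    # possibly containing commas) sits in between.
    table, sep1, rest = body.partition(",")
    start_key, sep2, region_id = rest.rpartition(",")
    if not sep1 or not sep2:
        raise ValueError("expected '<table>,<start key>,<region id>'")
    return RegionName(table, start_key, int(region_id), encoded)
```

Calling parse_region_name on the row-20 region name from the thread yields the encoded name e7c16267eb30e147e5d988c63d40f982; the first region of a table simply has an empty start key.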

On 15 Jan 2017, at 04:25, Stephen Jiang <sy...@gmail.com> wrote:

>> If you remove the util.Merge tool, how then does an operator ask for a merge
> in its absence?
> 
> We have a shell command to merge region.  In the past, it calls the same RS
> side code.  I don't think there is a need to have util.Merge (even if we
> really want, we can ask this utility to call HBaseAdmin.mergeRegions, which
> is the same path from the merge command through 'hbase shell').
> 
> Thanks
> Stephen
> 
>> On Fri, Jan 13, 2017 at 11:29 PM, Stack <st...@duboce.net> wrote:
>> 
>> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <sy...@gmail.com>
>> wrote:
>> 
>>> Revive this thread
>>> 
>>> I am in the process of removing Region Server side merge (and split)
>>> transaction code in master branch; as now we have merge (and split)
>>> procedure(s) from master doing the same thing.
>>> 
>>> 
>> Good (Issue?)
>> 
>> 
>>> The Merge tool depends on RS-side merge code.  I'd like to use this chance
>>> to remove the util.Merge tool.  This is for 2.0 and up releases only.
>>> Deprecation does not work here; as keeping the RS-side merge code would
>>> have duplicate logic in source code and make the new Assignment manager
>>> code more complicated.
>>> 
>>> 
>> Could util.Merge be changed to ask the Master run the merge (via AMv2)?
>> 
>> If you remove the util.Merge tool, how then does an operator ask for a
>> merge in its absence?
>> 
>> Thanks Stephen
>> 
>> S
>> 
>> 
>>> Please let me know whether you have objection.
>>> 
>>> Thanks
>>> Stephen
>>> 
>>> PS.  I could deprecate the HMerge code if anyone is really using it.  It has
>>> its own logic and is standalone (supposed to dangerously work offline and
>>> merge more than 2 regions - util.Merge and the shell do not support this
>>> functionality for now).
>>> 
>>> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <en...@gmail.com>
>>> wrote:
>>> 
>>>> @Appy what is not clear from above?
>>>> 
>>>> I think we should get rid of both Merge and HMerge.
>>>> 
>>>> We should not have any tool which will work in offline mode by going over
>>>> the HDFS data. Seems very brittle to be broken when things get changed.
>>>> Only use case I can think of is that somehow you end up with a lot of
>>>> regions and you cannot bring the cluster back up because of OOMs, etc and
>>>> you have to reduce the number of regions in offline mode. However, we did
>>>> not see this kind of thing in any of our customers for the last couple of
>>>> years so far.
>>>>
>>>> I think we should seriously look into improving normalizer and enabling
>>>> that by default for all the tables. Ideally, normalizer should be running
>>>> much more frequently, and should be configured with higher-level goals and
>>>> heuristics. Like on average how many regions per node, etc and should be
>>>> looking at the global state (like the balancer) to decide on split / merge
>>>> points.
>>>> 
>>>> Enis
>>>> 
>>>> On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <ap...@cloudera.com>
>>>> wrote:
>>>> 
>>>>> bq. HMerge can merge multiple regions by going over the list of
>>>>> regions and checking
>>>>> their sizes.
>>>>> bq. But both of these tools (Merge and HMerge) are very dangerous
>>>>> 
>>>>> I came across HMerge and it looks like dead code. Isn't referenced from
>>>>> anywhere except one test. (This is what lars also pointed out in the first
>>>>> email too).
>>>>> It would make perfect sense if it was a tool or was being referenced from
>>>>> somewhere, but with lack of either of that, am a bit confused here.
>>>>> @Enis, you seem to know everything about them, please educate me.
>>>>> Thanks
>>>>> - Appy
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <en...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Merge has very limited usability since it can do a single merge and can
>>>>>> only run when HBase is offline.
>>>>>> HMerge can merge multiple regions by going over the list of regions and
>>>>>> checking their sizes.
>>>>>> And of course we have the "supported" online merge which is the shell
>>>>>> command.
>>>>>>
>>>>>> But both of these tools (Merge and HMerge) are very dangerous I think. I
>>>>>> would say we should deprecate both to be replaced by the online merger
>>>>>> tool. We should not allow offline merge at all. I fail to see the usecase
>>>>>> that you have to use an offline merge.
>>>>>> 
>>>>>> Enis
>>>>>> 
>>>>>> On Wed, Sep 28, 2016 at 7:32 AM, Lars George <lars.george@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> Sorry to resurrect this old thread, but working on the book update, I
>>>>>>> came across the same today, i.e. we have Merge and HMerge. I tried and
>>>>>>> Merge works fine now. It is also the only one of the two flagged as
>>>>>>> being a tool. Should HMerge be removed? At least deprecated?
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Lars
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
>>>>>>>>>> there is already an issue to do this but not revamp of these Merge
>>>>>>>> classes
>>>>>>>> I guess the issue is HBASE-1621
>>>>>>>> 
>>>>>>>> On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>
>>>>>>>>> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
>>>>>>>>> be redone AND redone so we can do merging while table is online (there
>>>>>>>>> is already an issue to do this but not revamp of these Merge classes).
>>>>>>>>> The unit tests for Merge are also all junit3 and do whacky stuff to
>>>>>>>>> put up multiple regions.  This should be redone too (they are often
>>>>>>>>> first thing broke when major change and putting them back together is
>>>>>>>>> a headache since they do not follow the usual pattern).
>>>>>>>>> 
>>>>>>>>> St.Ack
>>>>>>>>> 
>>>>>>>>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <lars.george@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Hi Ted,
>>>>>>>>>>
>>>>>>>>>> The log is from an earlier attempt, I tried this a few times. This is all
>>>>>>>>>> local, after rm'ing the /hbase. So the files are all pretty empty, but since
>>>>>>>>>> I put data in I was assuming it should work. Once you have gotten into this
>>>>>>>>>> state, you also get funny error messages in the shell:
>>>>>>>>>>
>>>>>>>>>> hbase(main):001:0> list
>>>>>>>>>> TABLE
>>>>>>>>>> 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HMasterInterface
>>>>>>>>>>
>>>>>>>>>> ERROR: undefined method `map' for nil:NilClass
>>>>>>>>>>
>>>>>>>>>> Here is some help for this command:
>>>>>>>>>> List all tables in hbase. Optional regular expression parameter could
>>>>>>>>>> be used to filter the output. Examples:
>>>>>>>>>>
>>>>>>>>>> hbase> list
>>>>>>>>>> hbase> list 'abc.*'
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> hbase(main):002:0>
>>>>>>>>>>
>>>>>>>>>> I am assuming this is collateral, but why? The UI works but the table is
>>>>>>>>>> gone too.
>>>>>>>>>> 
>>>>>>>>>> Lars
>>>>>>>>>> 
>>>>>>>>>>> On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
>>>>>>>>>>> 
>>>>>>>>>>> There is TestMergeTool which tests Merge.
>>>>>>>>>>> 
>>>>>>>>>>> From the log you provided, I got a little confused as why
>>>>>>>>>>> 'testtable,row-20,1309613053987.
>>> 23a35ac696bdf4a8023dcc4c5b8419
>>>>> e0.'
>>>>>>>>> didn't
>>>>>>>>>>> appear in your command line or the output from .META.
>>> scanning.
>>>>>>>>>>> 
>>>>>>>>>>> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
>>>>>> lars.george@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> These two seem both in a bit of a weird state: HMerge is
>>>> scoped
>>>>>>> package
>>>>>>>>>>>> local, therefore no one but the package can call the
>> merge()
>>>>>>>>> functions...
>>>>>>>>>>>> and no one does that but the unit test. But it would be
>> good
>>>> to
>>>>>> have
>>>>>>>>> this on
>>>>>>>>>>>> the CLI and shell as a command (and in the shell maybe
>> with
>>> a
>>>>>>>>> confirmation
>>>>>>>>>>>> message?), but it is not available AFAIK.
>>>>>>>>>>>> 
>>>>>>>>>>>> HMerge can merge regions of tables that are disabled. It
>>> also
>>>>>> merges
>>>>>>>>> all
>>>>>>>>>>>> that qualify, i.e. where the merged region is less than or
>>>> equal
>>>>>> of
>>>>>>>>> half the
>>>>>>>>>>>> configured max file size.
>>>>>>>>>>>> 
>>>>>>>>>>>> Merge on the other hand does have a main(), so can be
>>> invoked:
>>>>>>>>>>>> 
>>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge
>>>>>>>>>>>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
>>>>>>>>>>>> 
>>>>>>>>>>>> Note how the help insinuates that you can use it as a
>> tool,
>>>> but
>>>>>>> that is
>>>>>>>>> not
>>>>>>>>>>>> correct. Also, it only merges two given regions, and the
>>>> cluster
>>>>>>> must
>>>>>>>>> be
>>>>>>>>>>>> shut down (only the HBase daemons). So that is a step
>> back.
>>>>>>>>>>>> 
>>>>>>>>>>>> What is worse is that I cannot get it to work. I tried in
>>> the
>>>>>> shell:
>>>>>>>>>>>> 
>>>>>>>>>>>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS
>>> =>
>>>>>>>>>>>> ['row-10','row-20','row-30','row-40','row-50']}
>>>>>>>>>>>> 0 row(s) in 0.2640 seconds
>>>>>>>>>>>> 
>>>>>>>>>>>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9'
>> do
>>>> put
>>>>>>>>>>>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end
>> end
>>>>>>>>>>>> 0 row(s) in 1.0450 seconds
>>>>>>>>>>>> 
>>>>>>>>>>>> hbase(main):003:0> flush 'testtable'
>>>>>>>>>>>> 0 row(s) in 0.2000 seconds
>>>>>>>>>>>> 
>>>>>>>>>>>> hbase(main):004:0> scan '.META.', { COLUMNS =>
>>>>>> ['info:regioninfo']}
>>>>>>>>>>>> ROW                                  COLUMN+CELL
>>>>>>>>>>>> testtable,,1309614509037.612d1e0112
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY
>>> =>
>>>>>>> 'row-10'
>>>>>>>>>>>> testtable,row-10,1309614509040.2fba
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10',
>>>> ENDKEY
>>>>>> =>
>>>>>>>>>>>> 'row-20'
>>>>>>>>>>>> testtable,row-20,1309614509041.e7c1
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20',
>>>> ENDKEY
>>>>>> =>
>>>>>>>>>>>> 'row-30'
>>>>>>>>>>>> testtable,row-30,1309614509041.a9cd
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30',
>>>> ENDKEY
>>>>>> =>
>>>>>>>>>>>> 'row-40'
>>>>>>>>>>>> testtable,row-40,1309614509041.d458
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40',
>>>> ENDKEY
>>>>>> =>
>>>>>>>>>>>> 'row-50'
>>>>>>>>>>>> testtable,row-50,1309614509041.74a5
>> column=info:regioninfo,
>>>>>>>>>>>> timestamp=130...
>>>>>>>>>>>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50',
>>>> ENDKEY
>>>>>> =>
>>>>>>> ''
>>>>>>>>>>>> 6 row(s) in 0.0440 seconds
>>>>>>>>>>>> 
>>>>>>>>>>>> hbase(main):005:0> exit
>>>>>>>>>>>> 
>>>>>>>>>>>> $ ./bin/stop-hbase.sh
>>>>>>>>>>>> 
>>>>>>>>>>>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
>>>>>>>>>>>> testtable,row-20,1309614509041.
>>> e7c16267eb30e147e5d988c63d40f9
>>>>> 82.
>>>>>> \
>>>>>>>>>>>> testtable,row-30,1309614509041.
>>> a9cde1cbc7d1a21b1aca2ac7fda30a
>>>>> d8.
>>>>>>>>>>>> 
>>>>>>>>>>>> But I get consistently errors:
>>>>>>>>>>>> 
>>>>>>>>>>>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
>>>>>>>>>>>> testtable,row-20,1309613053987.
>>> 23a35ac696bdf4a8023dcc4c5b8419
>>>>> e0.
>>>>>>> and
>>>>>>>>>>>> testtable,row-30,1309613053987.
>> 3664920956c30ac5ff2a7726e4e6
>>>> in
>>>>>>> table
>>>>>>>>>>>> testtable
>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration:
>>>>> blocksize=32
>>>>>>> MB,
>>>>>>>>>>>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=
>>>> 1000ms
>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
>>>>>>>>>>>> 
>>>>>>>>> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_
>>> 1309616449171/hlog.
>>>>>>> 1309616449181
>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog:
>>> getNumCurrentReplicas--HDFS-
>>>>> 826
>>>>>>> not
>>>>>>>>>>>> available; hdfs_out=org.apache.hadoop.fs.
>>>>>>> FSDataOutputStream@25961581,
>>>>>>>>>>>> 
>>>>>>>>> exception=org.apache.hadoop.fs.ChecksumFileSystem$
>>>>>>> ChecksumFSOutputSummer.getNumCurrentReplicas()
>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
>>>>>>> tabledescriptor
>>>>>>>>>>>> config now ...
>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
>>>>>>>>> -ROOT-,,0.70236052;
>>>>>>>>>>>> next sequenceid=1
>>>>>>>>>>>> info: null
>>>>>>>>>>>> region1: [B@48fd918a
>>>>>>>>>>>> region2: [B@7f5e2075
>>>>>>>>>>>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
>>>>>>>>>>>> java.io.IOException: Could not find meta region for
>>>>>>>>>>>> testtable,row-20,1309613053987.
>>> 23a35ac696bdf4a8023dcc4c5b8419
>>>>> e0.
>>>>>>>>>>>>      at
>>>>>>>>>>>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.
>>>>>> java:211)
>>>>>>>>>>>>      at org.apache.hadoop.hbase.util.
>>>> Merge.run(Merge.java:111)
>>>>>>>>>>>>      at org.apache.hadoop.util.
>> ToolRunner.run(ToolRunner.
>>>>>> java:65)
>>>>>>>>>>>>      at org.apache.hadoop.hbase.util.
>>>>> Merge.main(Merge.java:386)
>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
>>>>>>> tabledescriptor
>>>>>>>>>>>> config now ...
>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
>>>>>>>>> .META.,,1.1028785192;
>>>>>>>>>>>> next sequenceid=1
>>>>>>>>>>>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
>>>>>>> -ROOT-,,0.70236052
>>>>>>>>>>>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
>>>>>>>>>>>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
>>>>>>>>>>>> java.lang.NullPointerException
>>>>>>>>>>>>      at
>>>>>>>>> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:
>> 119)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
>>>>>>> MetaUtils.java:229)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
>>>>>>> MetaUtils.java:258)
>>>>>>>>>>>>      at org.apache.hadoop.hbase.util.
>>>> Merge.run(Merge.java:116)
>>>>>>>>>>>>      at org.apache.hadoop.util.
>> ToolRunner.run(ToolRunner.
>>>>>> java:65)
>>>>>>>>>>>>      at org.apache.hadoop.hbase.util.
>>>>> Merge.main(Merge.java:386)
>>>>>>>>>>>> 
>>>>>>>>>>>> After which I most of the times have shot .META. with an
>>> error
>>>>>>>>>>>> 
>>>>>>>>>>>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
>>>>>>> master.HMaster:
>>>>>>>>> Failed
>>>>>>>>>>>> getting all descriptors
>>>>>>>>>>>> java.io.FileNotFoundException: No status for
>>>>>>>>>>>> hdfs://localhost:8020/hbase/.corrupt
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
>>>>>>> FSUtils.java:888)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
>>>>>>> FSTableDescriptors.java:122)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
>>>>>>> FSTableDescriptors.java:149)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.master.HMaster.
>>>>> getHTableDescriptors(HMaster.
>>>>>>> java:1429)
>>>>>>>>>>>>      at sun.reflect.NativeMethodAccessorImpl.
>>> invoke0(Native
>>>>>>> Method)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(
>>>>>>> NativeMethodAccessorImpl.java:39)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>>>>>> DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>>      at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
>>>>>>> WritableRpcEngine.java:312)
>>>>>>>>>>>>      at
>>>>>>>>>>>> 
>>>>>>>>> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
>>>>>>> HBaseServer.java:1065)
>>>>>>>>>>>> 
>>>>>>>>>>>> Lars
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> -- Appy
>>>>> 
>>>> 
>>> 
>> 

Re: Merge and HMerge

Posted by Stephen Jiang <sy...@gmail.com>.
> If you remove the util.Merge tool, how then does an operator ask for a
> merge in its absence?

We have a shell command to merge regions.  In the past, it called the same
RS-side code.  I don't think there is a need to keep util.Merge (and if we
really want it, we can have the utility call HBaseAdmin.mergeRegions, which
is the same path the merge command takes through 'hbase shell').
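
For readers following along: the shell's merge_region command is normally
given the encoded region names (the hash suffix), not the full names that a
scan of the meta table prints. The sketch below is only an illustration of
that step, not HBase code. The class RegionNames and both of its methods are
hypothetical helpers; it assumes the "<table>,<start-key>,<id>.<hash>."
layout seen in this thread, and the sample names are the two regions from
Lars's 2011 transcript.

```java
// Hypothetical helper, not part of HBase: derives the encoded region name
// (the hash suffix) from a full region name and formats a shell command.
final class RegionNames {
    private RegionNames() {}

    /**
     * Returns the text between the last two dots of a full region name,
     * e.g. "t,row-20,1309614509041.abc123." yields "abc123".
     */
    static String encodedName(String fullRegionName) {
        String name = fullRegionName.trim();
        if (!name.endsWith(".")) {
            throw new IllegalArgumentException("expected a trailing '.': " + name);
        }
        int end = name.length() - 1;                 // index of the trailing dot
        int start = name.lastIndexOf('.', end - 1);  // dot just before the hash
        if (start < 0 || start + 1 == end) {
            throw new IllegalArgumentException("no encoded name in: " + name);
        }
        return name.substring(start + 1, end);
    }

    /** Formats the shell command line for merging the two given regions. */
    static String mergeCommand(String regionA, String regionB) {
        return "merge_region '" + encodedName(regionA) + "', '"
                + encodedName(regionB) + "'";
    }

    public static void main(String[] args) {
        System.out.println(mergeCommand(
                "testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.",
                "testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8."));
    }
}
```

The printed line can be pasted into 'hbase shell' on a running cluster. The
same two encoded names, converted to bytes, are what a client would hand to
the admin merge call mentioned above; check the Javadoc of the release in
use for the exact signature.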

Thanks
Stephen

On Fri, Jan 13, 2017 at 11:29 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <sy...@gmail.com>
> wrote:
>
> > Revive this thread
> >
> > I am in the process of removing Region Server side merge (and split)
> > transaction code in master branch; as now we have merge (and split)
> > procedure(s) from master doing the same thing.
> >
> >
> Good (Issue?)
>
>
> > The Merge tool depends on RS-side merge code.  I'd like to use this chance
> > to remove the util.Merge tool.  This is for 2.0 and up releases only.
> > Deprecation does not work here; as keeping the RS-side merge code would
> > have duplicate logic in source code and make the new Assignment manager
> > code more complicated.
> >
> >
> Could util.Merge be changed to ask the Master to run the merge (via AMv2)?
>
> If you remove the util.Merge tool, how then does an operator ask for a
> merge in its absence?
>
> Thanks Stephen
>
> S
>
>
> > Please let me know whether you have objection.
> >
> > Thanks
> > Stephen
> >
> > PS.  I could deprecate the HMerge code if anyone is really using it.  It
> > has its own standalone logic (it is supposed to work offline, dangerously,
> > and merge more than 2 regions; util.Merge and the shell do not support
> > this functionality for now).

Re: Merge and HMerge

Posted by Stack <st...@duboce.net>.
On Fri, Jan 13, 2017 at 7:16 PM, Stephen Jiang <sy...@gmail.com>
wrote:

> Revive this thread
>
> I am in the process of removing Region Server side merge (and split)
> transaction code in master branch; as now we have merge (and split)
> procedure(s) from master doing the same thing.
>
>
Good (Issue?)


> The Merge tool depends on RS-side merge code.  I'd like to use this chance
> to remove the util.Merge tool.  This is for 2.0 and up releases only.
> Deprecation does not work here; as keeping the RS-side merge code would
> have duplicate logic in source code and make the new Assignment manager
> code more complicated.
>
>
Could util.Merge be changed to ask the Master to run the merge (via AMv2)?

If you remove the util.Merge tool, how then does an operator ask for a
merge in its absence?

Thanks Stephen

S


> Please let me know whether you have objection.
>
> Thanks
> Stephen
>
> PS.  I could deprecate the HMerge code if anyone is really using it.  It
> has its own standalone logic (it is supposed to work offline, dangerously,
> and merge more than 2 regions; util.Merge and the shell do not support
> this functionality for now).
>
> On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <en...@gmail.com>
> wrote:
>
> > @Appy what is not clear from above?
> >
> > I think we should get rid of both Merge and HMerge.
> >
> > We should not have any tool which will work in offline mode by going over
> > the HDFS data. Seems very brittle to be broken when things get changed.
> > Only use case I can think of is that somehow you end up with a lot of
> > regions and you cannot bring the cluster back up because of OOMs, etc and
> > you have to reduce the number of regions in offline mode. However, we did
> > not see this kind of thing in any of our customers for the last couple of
> > years so far.
> >
> > I think we should seriously look into improving normalizer and enabling
> > that by default for all the tables. Ideally, normalizer should be running
> > much more frequently, and should be configured with higher-level goals and
> > heuristics. Like on average how many regions per node, etc and should be
> > looking at the global state (like the balancer) to decide on split / merge
> > points.
> >
> > Enis
> >
> > On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <ap...@cloudera.com>
> > wrote:
> >
> > > bq. HMerge can merge multiple regions by going over the list of
> > > regions and checking
> > > their sizes.
> > > bq. But both of these tools (Merge and HMerge) are very dangerous
> > >
> > > I came across HMerge and it looks like dead code. Isn't referenced from
> > > anywhere except one test. (This is what lars also pointed out in the first
> > > email too).
> > > It would make perfect sense if it was a tool or was being referenced from
> > > somewhere, but with lack of either of that, am a bit confused here.
> > > @Enis, you seem to know everything about them, please educate me.
> > > Thanks
> > > - Appy
> > >
> > >
> > >
> > > On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <en...@gmail.com>
> > > wrote:
> > >
> > > > > Merge has very limited usability since it can do a single merge and can
> > > > > only run when HBase is offline.
> > > > > HMerge can merge multiple regions by going over the list of regions and
> > > > > checking their sizes.
> > > > > And of course we have the "supported" online merge which is the shell
> > > > > command.
> > > > >
> > > > > But both of these tools (Merge and HMerge) are very dangerous I think. I
> > > > > would say we should deprecate both to be replaced by the online merger
> > > > > tool. We should not allow offline merge at all. I fail to see the usecase
> > > > > that you have to use an offline merge.
> > > >
> > > > Enis
> > > >
> > > > On Wed, Sep 28, 2016 at 7:32 AM, Lars George <la...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hey,
> > > > >
> > > > > > Sorry to resurrect this old thread, but working on the book update, I
> > > > > > came across the same today, i.e. we have Merge and HMerge. I tried and
> > > > > > Merge works fine now. It is also the only one of the two flagged as
> > > > > > being a tool. Should HMerge be removed? At least deprecated?
> > > > >
> > > > > Cheers,
> > > > > Lars
> > > > >
> > > > >
> > > > > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
> > > > > > >>> there is already an issue to do this but not revamp of these Merge
> > > > > > > classes
> > > > > > > I guess the issue is HBASE-1621
> > > > > > >
> > > > > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
> > > > > > >
> > > > > > >> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
> > > > > > >> be redone AND redone so we can do merging while table is online (there
> > > > > > >> is already an issue to do this but not revamp of these Merge classes).
> > > > > > >>  The unit tests for Merge are also all junit3 and do whacky stuff to
> > > > > > >> put up multiple regions.  This should be redone too (they are often
> > > > > > >> first thing broke when major change and putting them back together is
> > > > > > >> a headache since they do not follow the usual pattern).
> > > > > > >>
> > > > > > >> St.Ack
> > > > > > >>
> > > > > > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <lars.george@gmail.com>
> > > > > > >> wrote:
> > > > > > >> > Hi Ted,
> > > > > > >> >
> > > > > > >> > The log is from an earlier attempt, I tried this a few times. This is all
> > > > > > >> local, after rm'ing the /hbase. So the files are all pretty empty, but since
> > > > > > >> I put data in I was assuming it should work. Once you gotten into this
> > > > > > >> state, you also get funny error messages in the shell:
> > > > > > >> >
> > > > > > >> > hbase(main):001:0> list
> > > > > > >> > TABLE
> > > > > > >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > > > > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > > > > > >> org.apache.hadoop.hbase.ipc.HMasterInterface
> > > > > > >> >
> > > > > > >> > ERROR: undefined method `map' for nil:NilClass
> > > > > > >> >
> > > > > > >> > Here is some help for this command:
> > > > > > >> > List all tables in hbase. Optional regular expression parameter could
> > > > > > >> > be used to filter the output. Examples:
> > > > > > >> >
> > > > > > >> >  hbase> list
> > > > > > >> >  hbase> list 'abc.*'
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > hbase(main):002:0>
> > > > > > >> >
> > > > > > >> > I am assuming this is collateral, but why? The UI works but the table is
> > > > > >> gone too.
> > > > > >> >
> > > > > >> > Lars
> > > > > >> >
> > > > > >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > > > > >> >
> > > > > >> >> There is TestMergeTool which tests Merge.
> > > > > >> >>
> > > > > >> >> From the log you provided, I got a little confused as why
> > > > > >> >> 'testtable,row-20,1309613053987.
> 23a35ac696bdf4a8023dcc4c5b8419
> > > e0.'
> > > > > >> didn't
> > > > > >> >> appear in your command line or the output from .META.
> scanning.
> > > > > >> >>
> > > > > >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
> > > > lars.george@gmail.com>
> > > > > >> wrote:
> > > > > >> >>
> > > > > >> >>> Hi,
> > > > > >> >>>
> > > > > >> >>> These two seem both in a bit of a weird state: HMerge is
> > scoped
> > > > > package
> > > > > >> >>> local, therefore no one but the package can call the merge()
> > > > > >> functions...
> > > > > >> >>> and no one does that but the unit test. But it would be good
> > to
> > > > have
> > > > > >> this on
> > > > > >> >>> the CLI and shell as a command (and in the shell maybe with
> a
> > > > > >> confirmation
> > > > > >> >>> message?), but it is not available AFAIK.
> > > > > >> >>>
> > > > > >> >>> HMerge can merge regions of tables that are disabled. It
> also
> > > > merges
> > > > > >> all
> > > > > >> >>> that qualify, i.e. where the merged region is less than or
> > equal
> > > > of
> > > > > >> half the
> > > > > >> >>> configured max file size.
> > > > > >> >>>
> > > > > >> >>> Merge on the other hand does have a main(), so can be
> invoked:
> > > > > >> >>>
> > > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> > > > > >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > > > > >> >>>
> > > > > >> >>> Note how the help insinuates that you can use it as a tool,
> > but
> > > > > that is
> > > > > >> not
> > > > > >> >>> correct. Also, it only merges two given regions, and the
> > cluster
> > > > > must
> > > > > >> be
> > > > > >> >>> shut down (only the HBase daemons). So that is a step back.
> > > > > >> >>>
> > > > > >> >>> What is worse is that I cannot get it to work. I tried in
> the
> > > > shell:
> > > > > >> >>>
> > > > > >> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS
> =>
> > > > > >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> > > > > >> >>> 0 row(s) in 0.2640 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do
> > put
> > > > > >> >>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > > > > >> >>> 0 row(s) in 1.0450 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):003:0> flush 'testtable'
> > > > > >> >>> 0 row(s) in 0.2000 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):004:0> scan '.META.', { COLUMNS =>
> > > > ['info:regioninfo']}
> > > > > >> >>> ROW                                  COLUMN+CELL
> > > > > >> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY
> =>
> > > > > 'row-10'
> > > > > >> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10',
> > ENDKEY
> > > > =>
> > > > > >> >>> 'row-20'
> > > > > >> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20',
> > ENDKEY
> > > > =>
> > > > > >> >>> 'row-30'
> > > > > >> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30',
> > ENDKEY
> > > > =>
> > > > > >> >>> 'row-40'
> > > > > >> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40',
> > ENDKEY
> > > > =>
> > > > > >> >>> 'row-50'
> > > > > >> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo,
> > > > > >> >>> timestamp=130...
> > > > > >> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50',
> > ENDKEY
> > > > =>
> > > > > ''
> > > > > >> >>> 6 row(s) in 0.0440 seconds
> > > > > >> >>>
> > > > > >> >>> hbase(main):005:0> exit
> > > > > >> >>>
> > > > > >> >>> $ ./bin/stop-hbase.sh
> > > > > >> >>>
> > > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > > > > >> >>> testtable,row-20,1309614509041.
> e7c16267eb30e147e5d988c63d40f9
> > > 82.
> > > > \
> > > > > >> >>> testtable,row-30,1309614509041.
> a9cde1cbc7d1a21b1aca2ac7fda30a
> > > d8.
> > > > > >> >>>
> > > > > >> >>> But I get consistently errors:
> > > > > >> >>>
> > > > > >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > > > > >> >>> testtable,row-20,1309613053987.
> 23a35ac696bdf4a8023dcc4c5b8419
> > > e0.
> > > > > and
> > > > > >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6
> > in
> > > > > table
> > > > > >> >>> testtable
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration:
> > > blocksize=32
> > > > > MB,
> > > > > >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=
> > 1000ms
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> > > > > >> >>>
> > > > > >> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_
> 1309616449171/hlog.
> > > > > 1309616449181
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog:
> getNumCurrentReplicas--HDFS-
> > > 826
> > > > > not
> > > > > >> >>> available; hdfs_out=org.apache.hadoop.fs.
> > > > > FSDataOutputStream@25961581,
> > > > > >> >>>
> > > > > >> exception=org.apache.hadoop.fs.ChecksumFileSystem$
> > > > > ChecksumFSOutputSummer.getNumCurrentReplicas()
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > > > tabledescriptor
> > > > > >> >>> config now ...
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > > > >> -ROOT-,,0.70236052;
> > > > > >> >>> next sequenceid=1
> > > > > >> >>> info: null
> > > > > >> >>> region1: [B@48fd918a
> > > > > >> >>> region2: [B@7f5e2075
> > > > > >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > > > > >> >>> java.io.IOException: Could not find meta region for
> > > > > >> >>> testtable,row-20,1309613053987.
> 23a35ac696bdf4a8023dcc4c5b8419
> > > e0.
> > > > > >> >>>       at
> > > > > >> >>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.
> > > > java:211)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.
> > Merge.run(Merge.java:111)
> > > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > > > java:65)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.
> > > Merge.main(Merge.java:386)
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > > > tabledescriptor
> > > > > >> >>> config now ...
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > > > >> .META.,,1.1028785192;
> > > > > >> >>> next sequenceid=1
> > > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
> > > > > -ROOT-,,0.70236052
> > > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > > > > >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > > > > >> >>> java.lang.NullPointerException
> > > > > >> >>>       at
> > > > > >> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > > > MetaUtils.java:229)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > > > MetaUtils.java:258)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.
> > Merge.run(Merge.java:116)
> > > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > > > java:65)
> > > > > >> >>>       at org.apache.hadoop.hbase.util.
> > > Merge.main(Merge.java:386)
> > > > > >> >>>
> > > > > >> >>> After which I most of the times have shot .META. with an
> error
> > > > > >> >>>
> > > > > >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
> > > > > master.HMaster:
> > > > > >> Failed
> > > > > >> >>> getting all descriptors
> > > > > >> >>> java.io.FileNotFoundException: No status for
> > > > > >> >>> hdfs://localhost:8020/hbase/.corrupt
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
> > > > > FSUtils.java:888)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
> > > > > FSTableDescriptors.java:122)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
> > > > > FSTableDescriptors.java:149)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.master.HMaster.
> > > getHTableDescriptors(HMaster.
> > > > > java:1429)
> > > > > >> >>>       at sun.reflect.NativeMethodAccessorImpl.
> invoke0(Native
> > > > > Method)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> sun.reflect.NativeMethodAccessorImpl.invoke(
> > > > > NativeMethodAccessorImpl.java:39)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > > > > DelegatingMethodAccessorImpl.java:25)
> > > > > >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
> > > > > WritableRpcEngine.java:312)
> > > > > >> >>>       at
> > > > > >> >>>
> > > > > >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
> > > > > HBaseServer.java:1065)
> > > > > >> >>>
> > > > > >> >>> Lars
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > -- Appy
> > >
> >
>

Re: Merge and HMerge

Posted by Stephen Jiang <sy...@gmail.com>.
Reviving this thread.

I am in the process of removing the Region Server-side merge (and split)
transaction code in the master branch, since we now have merge (and split)
procedures run by the master that do the same thing.

The Merge tool depends on the RS-side merge code.  I'd like to use this chance
to remove the util.Merge tool.  This is for 2.0 and later releases only.
Deprecation does not work here, as keeping the RS-side merge code would
leave duplicate logic in the source and make the new Assignment Manager
code more complicated.

Please let me know if you have any objections.

Thanks
Stephen

PS.  I could deprecate the HMerge code instead if anyone is really using it.
It has its own logic and is standalone (it is supposed to work offline,
dangerously, and merge more than 2 regions; util.Merge and the shell do not
support this functionality for now).
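
To illustrate the "more than 2 regions" point, here is a rough sketch (Python,
not HBase code) of the kind of pass HMerge is described as making earlier in
this thread: walk the regions in key order and fold a region into its
neighbour while the combined size stays at or below half the configured max
file size. The function name, the (name, size) tuples and the sizes are
assumptions made up for the example; the real class may differ in detail.

```python
def plan_merges(regions, max_file_size):
    """Group adjacent regions that could be merged in one offline pass.

    regions: list of (name, size_in_bytes) tuples in start-key order
             (a hypothetical shape chosen for this sketch).
    max_file_size: the configured max file size, in bytes.
    Returns a list of groups; each group is a list of region names that
    would end up as a single region.
    """
    # A merged region must not exceed half the configured max file size.
    limit = max_file_size // 2
    groups = []
    current, current_size = [], 0
    for name, size in regions:
        if current and current_size + size <= limit:
            # Still small enough: fold this region into the running group.
            current.append(name)
            current_size += size
        else:
            # Too big (or first region): close the group and start a new one.
            if current:
                groups.append(current)
            current, current_size = [name], size
    if current:
        groups.append(current)
    return groups
```

With a max file size of 100, the regions a=10, b=20, c=100, d=5, e=5 would be
grouped as [a, b], [c], [d, e], whereas util.Merge only ever takes two
explicitly named regions.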

On Wed, Nov 16, 2016 at 11:04 AM, Enis Söztutar <en...@gmail.com> wrote:

> @Appy what is not clear from above?
>
> I think we should get rid of both Merge and HMerge.
>
> We should not have any tool which will work in offline mode by going over
> the HDFS data. Seems very brittle to be broken when things get changed.
> Only use case I can think of is that somehow you end up with a lot of
> regions and you cannot bring the cluster back up because of OOMs, etc and
> you have to reduce the number of regions in offline mode. However, we did
> not see this kind of thing in any of our customers for the last couple of
> years so far.
>
> I think we should seriously look into improving normalizer and enabling
> that by default for all the tables. Ideally, normalizer should be running
> much more frequently, and should be configured with higher-level goals and
> heuristics. Like on average how many regions per node, etc and should be
> looking at the global state (like the balancer) to decide on split / merge
> points.
>
> Enis
>
> On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <ap...@cloudera.com>
> wrote:
>
> > bq. HMerge can merge multiple regions by going over the list of
> > regions and checking
> > their sizes.
> > bq. But both of these tools (Merge and HMerge) are very dangerous
> >
> > I came across HMerge and it looks like dead code. Isn't referenced from
> > anywhere except one test. (This is what lars also pointed out in the
> > first email too).
> > It would make perfect sense if it was a tool or was being referenced from
> > somewhere, but with lack of either of that, am a bit confused here.
> > @Enis, you seem to know everything about them, please educate me.
> > Thanks
> > - Appy
> >
> >
> >
> > On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <en...@gmail.com>
> > wrote:
> >
> > > Merge has very limited usability since it can do a single merge and can
> > > only run when HBase is offline.
> > > HMerge can merge multiple regions by going over the list of regions and
> > > checking their sizes.
> > > And of course we have the "supported" online merge which is the shell
> > > command.
> > >
> > > But both of these tools (Merge and HMerge) are very dangerous I think.
> > > I would say we should deprecate both to be replaced by the online merger
> > > tool. We should not allow offline merge at all. I fail to see the
> > > usecase that you have to use an offline merge.
> > >
> > > Enis
> > >
> > > On Wed, Sep 28, 2016 at 7:32 AM, Lars George <la...@gmail.com>
> > > wrote:
> > >
> > > > Hey,
> > > >
> > > > Sorry to resurrect this old thread, but working on the book update, I
> > > > came across the same today, i.e. we have Merge and HMerge. I tried
> > > > and Merge works fine now. It is also the only one of the two flagged as
> > > > being a tool. Should HMerge be removed? At least deprecated?
> > > >
> > > > Cheers,
> > > > Lars
> > > >
> > > >
> > > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
> > > > >>> there is already an issue to do this but not revamp of these
> Merge
> > > > > classes
> > > > > I guess the issue is HBASE-1621
> > > > >
> > > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > >> Yeah, can you file an issue Lars.  This stuff is ancient and needs
> > to
> > > > >> be redone AND redone so we can do merging while table is online
> > (there
> > > > >> is already an issue to do this but not revamp of these Merge
> > classes).
> > > > >>  The unit tests for Merge are also all junit3 and do whacky stuff
> to
> > > > >> put up multiple regions.  This should be redone too (they are
> often
> > > > >> first thing broke when major change and putting them back together
> > is
> > > > >> a headache since they do not follow the usual pattern).
> > > > >>
> > > > >> St.Ack
> > > > >>
> > > > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <
> lars.george@gmail.com
> > >
> > > > >> wrote:
> > > > >> > Hi Ted,
> > > > >> >
> > > > >> > The log is from an earlier attempt, I tried this a few times.
> This
> > > is
> > > > all
> > > > >> local, after rm'ing the /hbase. So the files are all pretty empty,
> > but
> > > > since
> > > > >> I put data in I was assuming it should work. Once you gotten into
> > this
> > > > >> state, you also get funny error messages in the shell:
> > > > >> >
> > > > >> > hbase(main):001:0> list
> > > > >> > TABLE
> > > > >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > > > >> org.apache.hadoop.hbase.ipc.HMasterInterface
> > > > >> >
> > > > >> > ERROR: undefined method `map' for nil:NilClass
> > > > >> >
> > > > >> > Here is some help for this command:
> > > > >> > List all tables in hbase. Optional regular expression parameter
> > > could
> > > > >> > be used to filter the output. Examples:
> > > > >> >
> > > > >> >  hbase> list
> > > > >> >  hbase> list 'abc.*'
> > > > >> >
> > > > >> >
> > > > >> > hbase(main):002:0>
> > > > >> >
> > > > >> > I am assuming this is collateral, but why? The UI works but the
> > > table
> > > > is
> > > > >> gone too.
> > > > >> >
> > > > >> > Lars
> > > > >> >
> > > > >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > > > >> >
> > > > >> >> There is TestMergeTool which tests Merge.
> > > > >> >>
> > > > >> >> From the log you provided, I got a little confused as why
> > > > >> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> > e0.'
> > > > >> didn't
> > > > >> >> appear in your command line or the output from .META. scanning.
> > > > >> >>
> > > > >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
> > > lars.george@gmail.com>
> > > > >> wrote:
> > > > >> >>
> > > > >> >>> Hi,
> > > > >> >>>
> > > > >> >>> These two seem both in a bit of a weird state: HMerge is
> scoped
> > > > package
> > > > >> >>> local, therefore no one but the package can call the merge()
> > > > >> functions...
> > > > >> >>> and no one does that but the unit test. But it would be good
> to
> > > have
> > > > >> this on
> > > > >> >>> the CLI and shell as a command (and in the shell maybe with a
> > > > >> confirmation
> > > > >> >>> message?), but it is not available AFAIK.
> > > > >> >>>
> > > > >> >>> HMerge can merge regions of tables that are disabled. It also
> > > merges
> > > > >> all
> > > > >> >>> that qualify, i.e. where the merged region is less than or
> equal
> > > of
> > > > >> half the
> > > > >> >>> configured max file size.
> > > > >> >>>
> > > > >> >>> Merge on the other hand does have a main(), so can be invoked:
> > > > >> >>>
> > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> > > > >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > > > >> >>>
> > > > >> >>> Note how the help insinuates that you can use it as a tool,
> but
> > > > that is
> > > > >> not
> > > > >> >>> correct. Also, it only merges two given regions, and the
> cluster
> > > > must
> > > > >> be
> > > > >> >>> shut down (only the HBase daemons). So that is a step back.
> > > > >> >>>
> > > > >> >>> What is worse is that I cannot get it to work. I tried in the
> > > shell:
> > > > >> >>>
> > > > >> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS =>
> > > > >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> > > > >> >>> 0 row(s) in 0.2640 seconds
> > > > >> >>>
> > > > >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do
> put
> > > > >> >>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > > > >> >>> 0 row(s) in 1.0450 seconds
> > > > >> >>>
> > > > >> >>> hbase(main):003:0> flush 'testtable'
> > > > >> >>> 0 row(s) in 0.2000 seconds
> > > > >> >>>
> > > > >> >>> hbase(main):004:0> scan '.META.', { COLUMNS =>
> > > ['info:regioninfo']}
> > > > >> >>> ROW                                  COLUMN+CELL
> > > > >> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY =>
> > > > 'row-10'
> > > > >> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10',
> ENDKEY
> > > =>
> > > > >> >>> 'row-20'
> > > > >> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20',
> ENDKEY
> > > =>
> > > > >> >>> 'row-30'
> > > > >> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30',
> ENDKEY
> > > =>
> > > > >> >>> 'row-40'
> > > > >> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40',
> ENDKEY
> > > =>
> > > > >> >>> 'row-50'
> > > > >> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo,
> > > > >> >>> timestamp=130...
> > > > >> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50',
> ENDKEY
> > > =>
> > > > ''
> > > > >> >>> 6 row(s) in 0.0440 seconds
> > > > >> >>>
> > > > >> >>> hbase(main):005:0> exit
> > > > >> >>>
> > > > >> >>> $ ./bin/stop-hbase.sh
> > > > >> >>>
> > > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > > > >> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f9
> > 82.
> > > \
> > > > >> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30a
> > d8.
> > > > >> >>>
> > > > >> >>> But I get consistently errors:
> > > > >> >>>
> > > > >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> > e0.
> > > > and
> > > > >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6
> in
> > > > table
> > > > >> >>> testtable
> > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration:
> > blocksize=32
> > > > MB,
> > > > >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=
> 1000ms
> > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> > > > >> >>>
> > > > >> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.
> > > > 1309616449181
> > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-
> > 826
> > > > not
> > > > >> >>> available; hdfs_out=org.apache.hadoop.fs.
> > > > FSDataOutputStream@25961581,
> > > > >> >>>
> > > > >> exception=org.apache.hadoop.fs.ChecksumFileSystem$
> > > > ChecksumFSOutputSummer.getNumCurrentReplicas()
> > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > > tabledescriptor
> > > > >> >>> config now ...
> > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > > >> -ROOT-,,0.70236052;
> > > > >> >>> next sequenceid=1
> > > > >> >>> info: null
> > > > >> >>> region1: [B@48fd918a
> > > > >> >>> region2: [B@7f5e2075
> > > > >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > > > >> >>> java.io.IOException: Could not find meta region for
> > > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> > e0.
> > > > >> >>>       at
> > > > >> >>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.
> > > java:211)
> > > > >> >>>       at org.apache.hadoop.hbase.util.
> Merge.run(Merge.java:111)
> > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > > java:65)
> > > > >> >>>       at org.apache.hadoop.hbase.util.
> > Merge.main(Merge.java:386)
> > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > > tabledescriptor
> > > > >> >>> config now ...
> > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > > >> .META.,,1.1028785192;
> > > > >> >>> next sequenceid=1
> > > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
> > > > -ROOT-,,0.70236052
> > > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > > > >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > > > >> >>> java.lang.NullPointerException
> > > > >> >>>       at
> > > > >> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > > MetaUtils.java:229)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > > MetaUtils.java:258)
> > > > >> >>>       at org.apache.hadoop.hbase.util.
> Merge.run(Merge.java:116)
> > > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > > java:65)
> > > > >> >>>       at org.apache.hadoop.hbase.util.
> > Merge.main(Merge.java:386)
> > > > >> >>>
> > > > >> >>> After which I most of the times have shot .META. with an error
> > > > >> >>>
> > > > >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
> > > > master.HMaster:
> > > > >> Failed
> > > > >> >>> getting all descriptors
> > > > >> >>> java.io.FileNotFoundException: No status for
> > > > >> >>> hdfs://localhost:8020/hbase/.corrupt
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
> > > > FSUtils.java:888)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
> > > > FSTableDescriptors.java:122)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
> > > > FSTableDescriptors.java:149)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.master.HMaster.
> > getHTableDescriptors(HMaster.
> > > > java:1429)
> > > > >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > > > Method)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> sun.reflect.NativeMethodAccessorImpl.invoke(
> > > > NativeMethodAccessorImpl.java:39)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > > > DelegatingMethodAccessorImpl.java:25)
> > > > >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
> > > > WritableRpcEngine.java:312)
> > > > >> >>>       at
> > > > >> >>>
> > > > >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
> > > > HBaseServer.java:1065)
> > > > >> >>>
> > > > >> >>> Lars
> > > > >> >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> >
> >
> > --
> >
> > -- Appy
> >
>

Re: Merge and HMerge

Posted by Enis Söztutar <en...@gmail.com>.
@Appy what is not clear from above?

I think we should get rid of both Merge and HMerge.

We should not have any tool which works in offline mode by going over
the HDFS data. Such a tool seems very brittle and likely to break when things
change. The only use case I can think of is that you somehow end up with a lot
of regions and cannot bring the cluster back up because of OOMs, etc., so
you have to reduce the number of regions in offline mode. However, we have
not seen this kind of thing at any of our customers in the last couple of
years.

I think we should seriously look into improving the normalizer and enabling
it by default for all tables. Ideally, the normalizer should run
much more frequently, and should be configured with higher-level goals and
heuristics, such as the average number of regions per node. It should also
look at the global state (like the balancer does) to decide on split / merge
points.
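
To make the "higher-level goal" idea a bit more concrete, a toy sketch in
Python (this is not the existing normalizer; the function name, its parameters
and the tie-breaking rules are all assumptions for illustration). Given a
target number of regions per node, it picks the next split or merge by looking
at all regions of a table at once:

```python
def next_action(region_sizes, num_nodes, target_regions_per_node):
    """Pick one normalization step toward a regions-per-node goal.

    region_sizes: sizes of a table's regions, in start-key order.
    Returns ("merge", i, i + 1), ("split", i), or None when the region
    count already matches the goal.
    """
    target = num_nodes * target_regions_per_node
    count = len(region_sizes)
    if count > target and count >= 2:
        # Too many regions: merge the adjacent pair with the smallest
        # combined size (the first such pair on ties).
        i = min(range(count - 1),
                key=lambda k: region_sizes[k] + region_sizes[k + 1])
        return ("merge", i, i + 1)
    if 1 <= count < target:
        # Too few regions: split the largest one.
        i = max(range(count), key=lambda k: region_sizes[k])
        return ("split", i)
    return None
```

For example, with sizes [5, 1, 2, 50], one node and a goal of two regions per
node, this proposes merging regions 1 and 2. A real implementation would also
have to weigh load, locality and in-flight procedures.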

Enis

On Wed, Nov 16, 2016 at 1:17 AM, Apekshit Sharma <ap...@cloudera.com> wrote:

> bq. HMerge can merge multiple regions by going over the list of
> regions and checking
> their sizes.
> bq. But both of these tools (Merge and HMerge) are very dangerous
>
> I came across HMerge and it looks like dead code. Isn't referenced from
> anywhere except one test. (This is what lars also pointed out in the first
> email too).
> It would make perfect sense if it was a tool or was being referenced from
> somewhere, but with lack of either of that, am a bit confused here.
> @Enis, you seem to know everything about them, please educate me.
> Thanks
> - Appy
>
>
>
> On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <en...@gmail.com>
> wrote:
>
> > Merge has very limited usability since it can do a single merge and can
> > only run when HBase is offline.
> > HMerge can merge multiple regions by going over the list of regions and
> > checking their sizes.
> > And of course we have the "supported" online merge which is the shell
> > command.
> >
> > But both of these tools (Merge and HMerge) are very dangerous I think. I
> > would say we should deprecate both to be replaced by the online merger
> > tool. We should not allow offline merge at all. I fail to see the usecase
> > that you have to use an offline merge.
> >
> > Enis
> >
> > On Wed, Sep 28, 2016 at 7:32 AM, Lars George <la...@gmail.com>
> > wrote:
> >
> > > Hey,
> > >
> > > Sorry to resurrect this old thread, but working on the book update, I
> > > came across the same today, i.e. we have Merge and HMerge. I tried and
> > > Merge works fine now. It is also the only one of the two flagged as
> > > being a tool. Should HMerge be removed? At least deprecated?
> > >
> > > Cheers,
> > > Lars
> > >
> > >
> > > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
> > > >>> there is already an issue to do this but not revamp of these Merge
> > > > classes
> > > > I guess the issue is HBASE-1621
> > > >
> > > > On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > >> Yeah, can you file an issue Lars.  This stuff is ancient and needs
> to
> > > >> be redone AND redone so we can do merging while table is online
> (there
> > > >> is already an issue to do this but not revamp of these Merge
> classes).
> > > >>  The unit tests for Merge are also all junit3 and do whacky stuff to
> > > >> put up multiple regions.  This should be redone too (they are often
> > > >> first thing broke when major change and putting them back together
> is
> > > >> a headache since they do not follow the usual pattern).
> > > >>
> > > >> St.Ack
> > > >>
> > > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <lars.george@gmail.com
> >
> > > >> wrote:
> > > >> > Hi Ted,
> > > >> >
> > > >> > The log is from an earlier attempt, I tried this a few times. This
> > is
> > > all
> > > >> local, after rm'ing the /hbase. So the files are all pretty empty,
> but
> > > since
> > > >> I put data in I was assuming it should work. Once you gotten into
> this
> > > >> state, you also get funny error messages in the shell:
> > > >> >
> > > >> > hbase(main):001:0> list
> > > >> > TABLE
> > > >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > > >> org.apache.hadoop.hbase.ipc.HMasterInterface
> > > >> >
> > > >> > ERROR: undefined method `map' for nil:NilClass
> > > >> >
> > > >> > Here is some help for this command:
> > > >> > List all tables in hbase. Optional regular expression parameter
> > could
> > > >> > be used to filter the output. Examples:
> > > >> >
> > > >> >  hbase> list
> > > >> >  hbase> list 'abc.*'
> > > >> >
> > > >> >
> > > >> > hbase(main):002:0>
> > > >> >
> > > >> > I am assuming this is collateral, but why? The UI works but the
> > table
> > > is
> > > >> gone too.
> > > >> >
> > > >> > Lars
> > > >> >
> > > >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > > >> >
> > > >> >> There is TestMergeTool which tests Merge.
> > > >> >>
> > > >> >> From the log you provided, I got a little confused as why
> > > >> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> e0.'
> > > >> didn't
> > > >> >> appear in your command line or the output from .META. scanning.
> > > >> >>
> > > >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
> > lars.george@gmail.com>
> > > >> wrote:
> > > >> >>
> > > >> >>> Hi,
> > > >> >>>
> > > >> >>> These two seem both in a bit of a weird state: HMerge is scoped
> > > package
> > > >> >>> local, therefore no one but the package can call the merge()
> > > >> functions...
> > > >> >>> and no one does that but the unit test. But it would be good to
> > have
> > > >> this on
> > > >> >>> the CLI and shell as a command (and in the shell maybe with a
> > > >> confirmation
> > > >> >>> message?), but it is not available AFAIK.
> > > >> >>>
> > > >> >>> HMerge can merge regions of tables that are disabled. It also
> > merges
> > > >> all
> > > >> >>> that qualify, i.e. where the merged region is less than or equal
> > of
> > > >> half the
> > > >> >>> configured max file size.
> > > >> >>>
> > > >> >>> Merge on the other hand does have a main(), so can be invoked:
> > > >> >>>
> > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> > > >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > > >> >>>
> > > >> >>> Note how the help insinuates that you can use it as a tool, but
> > > that is
> > > >> not
> > > >> >>> correct. Also, it only merges two given regions, and the cluster
> > > must
> > > >> be
> > > >> >>> shut down (only the HBase daemons). So that is a step back.
> > > >> >>>
> > > >> >>> What is worse is that I cannot get it to work. I tried in the
> > shell:
> > > >> >>>
> > > >> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS =>
> > > >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> > > >> >>> 0 row(s) in 0.2640 seconds
> > > >> >>>
> > > >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put
> > > >> >>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > > >> >>> 0 row(s) in 1.0450 seconds
> > > >> >>>
> > > >> >>> hbase(main):003:0> flush 'testtable'
> > > >> >>> 0 row(s) in 0.2000 seconds
> > > >> >>>
> > > >> >>> hbase(main):004:0> scan '.META.', { COLUMNS =>
> > ['info:regioninfo']}
> > > >> >>> ROW                                  COLUMN+CELL
> > > >> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY =>
> > > 'row-10'
> > > >> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY
> > =>
> > > >> >>> 'row-20'
> > > >> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY
> > =>
> > > >> >>> 'row-30'
> > > >> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY
> > =>
> > > >> >>> 'row-40'
> > > >> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY
> > =>
> > > >> >>> 'row-50'
> > > >> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo,
> > > >> >>> timestamp=130...
> > > >> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY
> > =>
> > > ''
> > > >> >>> 6 row(s) in 0.0440 seconds
> > > >> >>>
> > > >> >>> hbase(main):005:0> exit
> > > >> >>>
> > > >> >>> $ ./bin/stop-hbase.sh
> > > >> >>>
> > > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > > >> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f9
> 82.
> > \
> > > >> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30a
> d8.
> > > >> >>>
> > > >> >>> But I get consistently errors:
> > > >> >>>
> > > >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> e0.
> > > and
> > > >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in
> > > table
> > > >> >>> testtable
> > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration:
> blocksize=32
> > > MB,
> > > >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> > > >> >>>
> > > >> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.
> > > 1309616449181
> > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-
> 826
> > > not
> > > >> >>> available; hdfs_out=org.apache.hadoop.fs.
> > > FSDataOutputStream@25961581,
> > > >> >>>
> > > >> exception=org.apache.hadoop.fs.ChecksumFileSystem$
> > > ChecksumFSOutputSummer.getNumCurrentReplicas()
> > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > tabledescriptor
> > > >> >>> config now ...
> > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > >> -ROOT-,,0.70236052;
> > > >> >>> next sequenceid=1
> > > >> >>> info: null
> > > >> >>> region1: [B@48fd918a
> > > >> >>> region2: [B@7f5e2075
> > > >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > > >> >>> java.io.IOException: Could not find meta region for
> > > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419
> e0.
> > > >> >>>       at
> > > >> >>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.
> > java:211)
> > > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > java:65)
> > > >> >>>       at org.apache.hadoop.hbase.util.
> Merge.main(Merge.java:386)
> > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > > tabledescriptor
> > > >> >>> config now ...
> > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > > >> .META.,,1.1028785192;
> > > >> >>> next sequenceid=1
> > > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
> > > -ROOT-,,0.70236052
> > > >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > > >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > > >> >>> java.lang.NullPointerException
> > > >> >>>       at
> > > >> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > MetaUtils.java:229)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > > MetaUtils.java:258)
> > > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> > > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> > java:65)
> > > >> >>>       at org.apache.hadoop.hbase.util.
> Merge.main(Merge.java:386)
> > > >> >>>
> > > >> >>> After which I most of the times have shot .META. with an error
> > > >> >>>
> > > >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
> > > master.HMaster:
> > > >> Failed
> > > >> >>> getting all descriptors
> > > >> >>> java.io.FileNotFoundException: No status for
> > > >> >>> hdfs://localhost:8020/hbase/.corrupt
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
> > > FSUtils.java:888)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
> > > FSTableDescriptors.java:122)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
> > > FSTableDescriptors.java:149)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.master.HMaster.
> getHTableDescriptors(HMaster.
> > > java:1429)
> > > >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > > Method)
> > > >> >>>       at
> > > >> >>>
> > > >> sun.reflect.NativeMethodAccessorImpl.invoke(
> > > NativeMethodAccessorImpl.java:39)
> > > >> >>>       at
> > > >> >>>
> > > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > > DelegatingMethodAccessorImpl.java:25)
> > > >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
> > > WritableRpcEngine.java:312)
> > > >> >>>       at
> > > >> >>>
> > > >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
> > > HBaseServer.java:1065)
> > > >> >>>
> > > >> >>> Lars
> > > >> >
> > > >> >
> > > >>
> > >
> >
>
>
>
> --
>
> -- Appy
>

Re: Merge and HMerge

Posted by Apekshit Sharma <ap...@cloudera.com>.
bq. HMerge can merge multiple regions by going over the list of regions and
checking their sizes.
bq. But both of these tools (Merge and HMerge) are very dangerous

I came across HMerge and it looks like dead code: it isn't referenced from
anywhere except one test. (Lars also pointed this out in the first email.)
It would make perfect sense if it were a tool or were referenced from
somewhere, but since it is neither, I am a bit confused here.
@Enis, you seem to know everything about them, please educate me.
Thanks
- Appy



On Thu, Sep 29, 2016 at 12:43 AM, Enis Söztutar <en...@gmail.com> wrote:

> Merge has very limited usability since it can do only a single merge and
> can only run when HBase is offline.
> HMerge can merge multiple regions by going over the list of regions and
> checking their sizes.
> And of course we have the "supported" online merge which is the shell
> command.
>
> But both of these tools (Merge and HMerge) are very dangerous I think. I
> would say we should deprecate both to be replaced by the online merger
> tool. We should not allow offline merge at all. I fail to see the usecase
> that you have to use an offline merge.
>
> Enis
>
> On Wed, Sep 28, 2016 at 7:32 AM, Lars George <la...@gmail.com>
> wrote:
>
> > Hey,
> >
> > Sorry to resurrect this old thread, but working on the book update, I
> > came across the same today, i.e. we have Merge and HMerge. I tried and
> > Merge works fine now. It is also the only one of the two flagged as
> > being a tool. Should HMerge be removed? At least deprecated?
> >
> > Cheers,
> > Lars
> >
> >
> > On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
> > >>> there is already an issue to do this but not revamp of these Merge
> > > classes
> > > I guess the issue is HBASE-1621
> > >
> > > On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
> > >
> > >> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
> > >> be redone AND redone so we can do merging while table is online (there
> > >> is already an issue to do this but not revamp of these Merge classes).
> > >>  The unit tests for Merge are also all junit3 and do whacky stuff to
> > >> put up multiple regions.  This should be redone too (they are often
> > >> first thing broke when major change and putting them back together is
> > >> a headache since they do not follow the usual pattern).
> > >>
> > >> St.Ack
> > >>
> > >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <la...@gmail.com>
> > >> wrote:
> > >> > Hi Ted,
> > >> >
> > >> > The log is from an earlier attempt, I tried this a few times. This
> is
> > all
> > >> local, after rm'ing the /hbase. So the files are all pretty empty, but
> > since
> > >> I put data in I was assuming it should work. Once you gotten into this
> > >> state, you also get funny error messages in the shell:
> > >> >
> > >> > hbase(main):001:0> list
> > >> > TABLE
> > >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> > >> org.apache.hadoop.hbase.ipc.HMasterInterface
> > >> >
> > >> > ERROR: undefined method `map' for nil:NilClass
> > >> >
> > >> > Here is some help for this command:
> > >> > List all tables in hbase. Optional regular expression parameter
> could
> > >> > be used to filter the output. Examples:
> > >> >
> > >> >  hbase> list
> > >> >  hbase> list 'abc.*'
> > >> >
> > >> >
> > >> > hbase(main):002:0>
> > >> >
> > >> > I am assuming this is collateral, but why? The UI works but the
> table
> > is
> > >> gone too.
> > >> >
> > >> > Lars
> > >> >
> > >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> > >> >
> > >> >> There is TestMergeTool which tests Merge.
> > >> >>
> > >> >> From the log you provided, I got a little confused as why
> > >> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
> > >> didn't
> > >> >> appear in your command line or the output from .META. scanning.
> > >> >>
> > >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <
> lars.george@gmail.com>
> > >> wrote:
> > >> >>
> > >> >>> Hi,
> > >> >>>
> > >> >>> These two seem both in a bit of a weird state: HMerge is scoped
> > package
> > >> >>> local, therefore no one but the package can call the merge()
> > >> functions...
> > >> >>> and no one does that but the unit test. But it would be good to
> have
> > >> this on
> > >> >>> the CLI and shell as a command (and in the shell maybe with a
> > >> confirmation
> > >> >>> message?), but it is not available AFAIK.
> > >> >>>
> > >> >>> HMerge can merge regions of tables that are disabled. It also
> merges
> > >> all
> > >> >>> that qualify, i.e. where the merged region is less than or equal
> of
> > >> half the
> > >> >>> configured max file size.
> > >> >>>
> > >> >>> Merge on the other hand does have a main(), so can be invoked:
> > >> >>>
> > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> > >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> > >> >>>
> > >> >>> Note how the help insinuates that you can use it as a tool, but
> > that is
> > >> not
> > >> >>> correct. Also, it only merges two given regions, and the cluster
> > must
> > >> be
> > >> >>> shut down (only the HBase daemons). So that is a step back.
> > >> >>>
> > >> >>> What is worse is that I cannot get it to work. I tried in the
> shell:
> > >> >>>
> > >> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS =>
> > >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> > >> >>> 0 row(s) in 0.2640 seconds
> > >> >>>
> > >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put
> > >> >>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> > >> >>> 0 row(s) in 1.0450 seconds
> > >> >>>
> > >> >>> hbase(main):003:0> flush 'testtable'
> > >> >>> 0 row(s) in 0.2000 seconds
> > >> >>>
> > >> >>> hbase(main):004:0> scan '.META.', { COLUMNS =>
> ['info:regioninfo']}
> > >> >>> ROW                                  COLUMN+CELL
> > >> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY =>
> > 'row-10'
> > >> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY
> =>
> > >> >>> 'row-20'
> > >> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY
> =>
> > >> >>> 'row-30'
> > >> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY
> =>
> > >> >>> 'row-40'
> > >> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY
> =>
> > >> >>> 'row-50'
> > >> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo,
> > >> >>> timestamp=130...
> > >> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY
> =>
> > ''
> > >> >>> 6 row(s) in 0.0440 seconds
> > >> >>>
> > >> >>> hbase(main):005:0> exit
> > >> >>>
> > >> >>> $ ./bin/stop-hbase.sh
> > >> >>>
> > >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> > >> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.
> \
> > >> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
> > >> >>>
> > >> >>> But I get consistently errors:
> > >> >>>
> > >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> > and
> > >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in
> > table
> > >> >>> testtable
> > >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32
> > MB,
> > >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> > >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> > >> >>>
> > >> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.
> > 1309616449181
> > >> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826
> > not
> > >> >>> available; hdfs_out=org.apache.hadoop.fs.
> > FSDataOutputStream@25961581,
> > >> >>>
> > >> exception=org.apache.hadoop.fs.ChecksumFileSystem$
> > ChecksumFSOutputSummer.getNumCurrentReplicas()
> > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > tabledescriptor
> > >> >>> config now ...
> > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > >> -ROOT-,,0.70236052;
> > >> >>> next sequenceid=1
> > >> >>> info: null
> > >> >>> region1: [B@48fd918a
> > >> >>> region2: [B@7f5e2075
> > >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> > >> >>> java.io.IOException: Could not find meta region for
> > >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> > >> >>>       at
> > >> >>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.
> java:211)
> > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> java:65)
> > >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> > tabledescriptor
> > >> >>> config now ...
> > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> > >> .META.,,1.1028785192;
> > >> >>> next sequenceid=1
> > >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
> > -ROOT-,,0.70236052
> > >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> > >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> > >> >>> java.lang.NullPointerException
> > >> >>>       at
> > >> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > MetaUtils.java:229)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> > MetaUtils.java:258)
> > >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> > >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.
> java:65)
> > >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> > >> >>>
> > >> >>> After which I most of the times have shot .META. with an error
> > >> >>>
> > >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
> > master.HMaster:
> > >> Failed
> > >> >>> getting all descriptors
> > >> >>> java.io.FileNotFoundException: No status for
> > >> >>> hdfs://localhost:8020/hbase/.corrupt
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
> > FSUtils.java:888)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
> > FSTableDescriptors.java:122)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
> > FSTableDescriptors.java:149)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.
> > java:1429)
> > >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > Method)
> > >> >>>       at
> > >> >>>
> > >> sun.reflect.NativeMethodAccessorImpl.invoke(
> > NativeMethodAccessorImpl.java:39)
> > >> >>>       at
> > >> >>>
> > >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > DelegatingMethodAccessorImpl.java:25)
> > >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
> > WritableRpcEngine.java:312)
> > >> >>>       at
> > >> >>>
> > >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
> > HBaseServer.java:1065)
> > >> >>>
> > >> >>> Lars
> > >> >
> > >> >
> > >>
> >
>



-- 

-- Appy

Re: Merge and HMerge

Posted by Enis Söztutar <en...@gmail.com>.
Merge has very limited usability since it can do only a single merge and
can only run when HBase is offline.
HMerge can merge multiple regions by going over the list of regions and
checking their sizes.
And of course we have the "supported" online merge which is the shell
command.
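
For illustration only, here is a minimal Ruby sketch of preparing such an
online merge. It assumes the shell's merge_region command, which takes the
encoded region names rather than the full ones, and it assumes full region
names have the form "<table>,<start key>,<timestamp>.<encoded name>." as in
the .META. scan quoted below. The helper names encoded_region_name and
merge_region_command are hypothetical and not part of HBase.

```ruby
# Extract the encoded region name (the 32-character hex hash) from a full
# region name of the assumed form "<table>,<start key>,<timestamp>.<encoded>."
def encoded_region_name(full_name)
  match = full_name.match(/\.([0-9a-f]{32})\.\z/)
  raise ArgumentError, "unexpected region name: #{full_name}" unless match
  match[1]
end

# Build the shell command line for an online merge of two regions.
# This only formats a string; it does not talk to a cluster.
def merge_region_command(region_a, region_b)
  "merge_region '#{encoded_region_name(region_a)}', " \
    "'#{encoded_region_name(region_b)}'"
end

puts merge_region_command(
  'testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982.',
  'testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.'
)
```

The printed line can be pasted into the HBase shell while the cluster is
running, so no shutdown is needed, unlike with the offline Merge tool.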

But both of these tools (Merge and HMerge) are very dangerous, I think. I
would say we should deprecate both in favor of the online merge tool. We
should not allow offline merge at all. I fail to see a use case that
requires an offline merge.

Enis

On Wed, Sep 28, 2016 at 7:32 AM, Lars George <la...@gmail.com> wrote:

> Hey,
>
> Sorry to resurrect this old thread, but working on the book update, I
> came across the same today, i.e. we have Merge and HMerge. I tried and
> Merge works fine now. It is also the only one of the two flagged as
> being a tool. Should HMerge be removed? At least deprecated?
>
> Cheers,
> Lars
>
>
> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
> >>> there is already an issue to do this but not revamp of these Merge
> > classes
> > I guess the issue is HBASE-1621
> >
> > On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
> >
> >> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
> >> be redone AND redone so we can do merging while table is online (there
> >> is already an issue to do this but not revamp of these Merge classes).
> >>  The unit tests for Merge are also all junit3 and do whacky stuff to
> >> put up multiple regions.  This should be redone too (they are often
> >> first thing broke when major change and putting them back together is
> >> a headache since they do not follow the usual pattern).
> >>
> >> St.Ack
> >>
> >> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <la...@gmail.com>
> >> wrote:
> >> > Hi Ted,
> >> >
> >> > The log is from an earlier attempt, I tried this a few times. This is
> all
> >> local, after rm'ing the /hbase. So the files are all pretty empty, but
> since
> >> I put data in I was assuming it should work. Once you gotten into this
> >> state, you also get funny error messages in the shell:
> >> >
> >> > hbase(main):001:0> list
> >> > TABLE
> >> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using
> >> org.apache.hadoop.hbase.ipc.WritableRpcEngine for
> >> org.apache.hadoop.hbase.ipc.HMasterInterface
> >> >
> >> > ERROR: undefined method `map' for nil:NilClass
> >> >
> >> > Here is some help for this command:
> >> > List all tables in hbase. Optional regular expression parameter could
> >> > be used to filter the output. Examples:
> >> >
> >> >  hbase> list
> >> >  hbase> list 'abc.*'
> >> >
> >> >
> >> > hbase(main):002:0>
> >> >
> >> > I am assuming this is collateral, but why? The UI works but the table
> is
> >> gone too.
> >> >
> >> > Lars
> >> >
> >> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
> >> >
> >> >> There is TestMergeTool which tests Merge.
> >> >>
> >> >> From the log you provided, I got a little confused as why
> >> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.'
> >> didn't
> >> >> appear in your command line or the output from .META. scanning.
> >> >>
> >> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <la...@gmail.com>
> >> wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> These two seem both in a bit of a weird state: HMerge is scoped
> package
> >> >>> local, therefore no one but the package can call the merge()
> >> functions...
> >> >>> and no one does that but the unit test. But it would be good to have
> >> this on
> >> >>> the CLI and shell as a command (and in the shell maybe with a
> >> confirmation
> >> >>> message?), but it is not available AFAIK.
> >> >>>
> >> >>> HMerge can merge regions of tables that are disabled. It also merges
> >> all
> >> >>> that qualify, i.e. where the merged region is less than or equal of
> >> half the
> >> >>> configured max file size.
> >> >>>
> >> >>> Merge on the other hand does have a main(), so can be invoked:
> >> >>>
> >> >>> $ hbase org.apache.hadoop.hbase.util.Merge
> >> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
> >> >>>
> >> >>> Note how the help insinuates that you can use it as a tool, but
> that is
> >> not
> >> >>> correct. Also, it only merges two given regions, and the cluster
> must
> >> be
> >> >>> shut down (only the HBase daemons). So that is a step back.
> >> >>>
> >> >>> What is worse is that I cannot get it to work. I tried in the shell:
> >> >>>
> >> >>> hbase(main):001:0> create 'testtable', 'colfam1',  {SPLITS =>
> >> >>> ['row-10','row-20','row-30','row-40','row-50']}
> >> >>> 0 row(s) in 0.2640 seconds
> >> >>>
> >> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put
> >> >>> 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
> >> >>> 0 row(s) in 1.0450 seconds
> >> >>>
> >> >>> hbase(main):003:0> flush 'testtable'
> >> >>> 0 row(s) in 0.2000 seconds
> >> >>>
> >> >>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
> >> >>> ROW                                  COLUMN+CELL
> >> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY =>
> 'row-10'
> >> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY =>
> >> >>> 'row-20'
> >> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY =>
> >> >>> 'row-30'
> >> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY =>
> >> >>> 'row-40'
> >> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY =>
> >> >>> 'row-50'
> >> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo,
> >> >>> timestamp=130...
> >> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY =>
> ''
> >> >>> 6 row(s) in 0.0440 seconds
> >> >>>
> >> >>> hbase(main):005:0> exit
> >> >>>
> >> >>> $ ./bin/stop-hbase.sh
> >> >>>
> >> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
> >> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
> >> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
> >> >>>
> >> >>> But I get consistently errors:
> >> >>>
> >> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions
> >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> and
> >> >>> testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in
> table
> >> >>> testtable
> >> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32
> MB,
> >> >>> rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
> >> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog
> >> >>>
> >> /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.
> 1309616449181
> >> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826
> not
> >> >>> available; hdfs_out=org.apache.hadoop.fs.
> FSDataOutputStream@25961581,
> >> >>>
> >> exception=org.apache.hadoop.fs.ChecksumFileSystem$
> ChecksumFSOutputSummer.getNumCurrentReplicas()
> >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> tabledescriptor
> >> >>> config now ...
> >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> >> -ROOT-,,0.70236052;
> >> >>> next sequenceid=1
> >> >>> info: null
> >> >>> region1: [B@48fd918a
> >> >>> region2: [B@7f5e2075
> >> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
> >> >>> java.io.IOException: Could not find meta region for
> >> >>> testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
> >> >>>       at
> >> >>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
> >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
> >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up
> tabledescriptor
> >> >>> config now ...
> >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined
> >> .META.,,1.1028785192;
> >> >>> next sequenceid=1
> >> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed
> -ROOT-,,0.70236052
> >> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
> >> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
> >> >>> java.lang.NullPointerException
> >> >>>       at
> >> org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> MetaUtils.java:229)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(
> MetaUtils.java:258)
> >> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
> >> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
> >> >>>
> >> >>> After which I most of the times have shot .META. with an error
> >> >>>
> >> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.
> master.HMaster:
> >> Failed
> >> >>> getting all descriptors
> >> >>> java.io.FileNotFoundException: No status for
> >> >>> hdfs://localhost:8020/hbase/.corrupt
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(
> FSUtils.java:888)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.util.FSTableDescriptors.get(
> FSTableDescriptors.java:122)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(
> FSTableDescriptors.java:149)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.
> java:1429)
> >> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> >> >>>       at
> >> >>>
> >> sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:39)
> >> >>>       at
> >> >>>
> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:25)
> >> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(
> WritableRpcEngine.java:312)
> >> >>>       at
> >> >>>
> >> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(
> HBaseServer.java:1065)
> >> >>>
> >> >>> Lars
> >> >
> >> >
> >>
>

Re: Merge and HMerge

Posted by Lars George <la...@gmail.com>.
Just to save you from searching:
https://issues.apache.org/jira/browse/HBASE-8219?focusedCommentId=13617529&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13617529

No one replied to Enis it seems?

On Wed, Sep 28, 2016 at 4:32 PM, Lars George <la...@gmail.com> wrote:
> Hey,
>
> Sorry to resurrect this old thread, but working on the book update, I
> came across the same today, i.e. we have Merge and HMerge. I tried and
> Merge works fine now. It is also the only one of the two flagged as
> being a tool. Should HMerge be removed? At least deprecated?
>
> Cheers,
> Lars
>
>
> On Thu, Jul 7, 2011 at 2:03 AM, Ted Yu <yu...@gmail.com> wrote:
>>>> there is already an issue to do this but not revamp of these Merge classes
>> I guess the issue is HBASE-1621
>>
>> On Wed, Jul 6, 2011 at 2:28 PM, Stack <st...@duboce.net> wrote:
>>
>>> Yeah, can you file an issue Lars.  This stuff is ancient and needs to
>>> be redone AND redone so we can do merging while table is online (there
>>> is already an issue to do this but not revamp of these Merge classes).
>>>  The unit tests for Merge are also all junit3 and do whacky stuff to
>>> put up multiple regions.  This should be redone too (they are often the
>>> first thing to break on a major change, and putting them back together is
>>> a headache since they do not follow the usual pattern).
>>>
>>> St.Ack
>>>
>>> On Sun, Jul 3, 2011 at 12:38 AM, Lars George <la...@gmail.com>
>>> wrote:
>>> > Hi Ted,
>>> >
>>> > The log is from an earlier attempt, I tried this a few times. This is all
>>> > local, after rm'ing the /hbase. So the files are all pretty empty, but since
>>> > I put data in I was assuming it should work. Once you have gotten into this
>>> > state, you also get funny error messages in the shell:
>>> >
>>> > hbase(main):001:0> list
>>> > TABLE
>>> > 11/07/03 09:36:21 INFO ipc.HBaseRPC: Using org.apache.hadoop.hbase.ipc.WritableRpcEngine for org.apache.hadoop.hbase.ipc.HMasterInterface
>>> >
>>> > ERROR: undefined method `map' for nil:NilClass
>>> >
>>> > Here is some help for this command:
>>> > List all tables in hbase. Optional regular expression parameter could
>>> > be used to filter the output. Examples:
>>> >
>>> >  hbase> list
>>> >  hbase> list 'abc.*'
>>> >
>>> >
>>> > hbase(main):002:0>
>>> >
>>> > I am assuming this is collateral, but why? The UI works but the table is
>>> > gone too.
>>> >
>>> > Lars
>>> >
>>> > On Jul 2, 2011, at 10:55 PM, Ted Yu wrote:
>>> >
>>> >> There is TestMergeTool which tests Merge.
>>> >>
>>> >> From the log you provided, I got a little confused as to why
>>> >> 'testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.' didn't
>>> >> appear in your command line or the output from .META. scanning.
>>> >>
>>> >> On Sat, Jul 2, 2011 at 10:36 AM, Lars George <la...@gmail.com>
>>> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> These two both seem to be in a bit of a weird state: HMerge is scoped
>>> >>> package local, therefore no one but the package can call the merge()
>>> >>> functions... and no one does that but the unit test. It would be good to
>>> >>> have this on the CLI and shell as a command (and in the shell maybe with
>>> >>> a confirmation message?), but it is not available AFAIK.
>>> >>>
>>> >>> HMerge can merge regions of tables that are disabled. It also merges all
>>> >>> that qualify, i.e. where the merged region is less than or equal to half
>>> >>> the configured max file size.
>>> >>>
>>> >>> Merge on the other hand does have a main(), so can be invoked:
>>> >>>
>>> >>> $ hbase org.apache.hadoop.hbase.util.Merge
>>> >>> Usage: bin/hbase merge <table-name> <region-1> <region-2>
>>> >>>
>>> >>> Note how the help insinuates that you can use it as a tool, but that is
>>> >>> not correct. Also, it only merges two given regions, and the cluster must
>>> >>> be shut down (only the HBase daemons). So that is a step back.
>>> >>>
>>> >>> What is worse is that I cannot get it to work. I tried in the shell:
>>> >>>
>>> >>> hbase(main):001:0> create 'testtable', 'colfam1', {SPLITS => ['row-10','row-20','row-30','row-40','row-50']}
>>> >>> 0 row(s) in 0.2640 seconds
>>> >>>
>>> >>> hbase(main):002:0> for i in '0'..'9' do for j in '0'..'9' do put 'testtable', "row-#{i}#{j}", "colfam1:#{j}", "#{j}" end end
>>> >>> 0 row(s) in 1.0450 seconds
>>> >>>
>>> >>> hbase(main):003:0> flush 'testtable'
>>> >>> 0 row(s) in 0.2000 seconds
>>> >>>
>>> >>> hbase(main):004:0> scan '.META.', { COLUMNS => ['info:regioninfo']}
>>> >>> ROW                                  COLUMN+CELL
>>> >>> testtable,,1309614509037.612d1e0112 column=info:regioninfo, timestamp=130...
>>> >>> 406e6c2bb482eeaec57322.             STARTKEY => '', ENDKEY => 'row-10'
>>> >>> testtable,row-10,1309614509040.2fba column=info:regioninfo, timestamp=130...
>>> >>> fcc9bc6afac94c465ce5dcabc5d1.       STARTKEY => 'row-10', ENDKEY => 'row-20'
>>> >>> testtable,row-20,1309614509041.e7c1 column=info:regioninfo, timestamp=130...
>>> >>> 6267eb30e147e5d988c63d40f982.       STARTKEY => 'row-20', ENDKEY => 'row-30'
>>> >>> testtable,row-30,1309614509041.a9cd column=info:regioninfo, timestamp=130...
>>> >>> e1cbc7d1a21b1aca2ac7fda30ad8.       STARTKEY => 'row-30', ENDKEY => 'row-40'
>>> >>> testtable,row-40,1309614509041.d458 column=info:regioninfo, timestamp=130...
>>> >>> 236feae097efcf33477e7acc51d4.       STARTKEY => 'row-40', ENDKEY => 'row-50'
>>> >>> testtable,row-50,1309614509041.74a5 column=info:regioninfo, timestamp=130...
>>> >>> 7dc7e3e9602d9229b15d4c0357d1.       STARTKEY => 'row-50', ENDKEY => ''
>>> >>> 6 row(s) in 0.0440 seconds
>>> >>>
>>> >>> hbase(main):005:0> exit
>>> >>>
>>> >>> $ ./bin/stop-hbase.sh
>>> >>>
>>> >>> $ hbase org.apache.hadoop.hbase.util.Merge testtable \
>>> >>> testtable,row-20,1309614509041.e7c16267eb30e147e5d988c63d40f982. \
>>> >>> testtable,row-30,1309614509041.a9cde1cbc7d1a21b1aca2ac7fda30ad8.
>>> >>>
>>> >>> But I consistently get errors:
>>> >>>
>>> >>> 11/07/02 07:20:49 INFO util.Merge: Merging regions testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0. and testtable,row-30,1309613053987.3664920956c30ac5ff2a7726e4e6 in table testtable
>>> >>> 11/07/02 07:20:49 INFO wal.HLog: HLog configuration: blocksize=32 MB, rollsize=30.4 MB, enabled=true, optionallogflushinternal=1000ms
>>> >>> 11/07/02 07:20:49 INFO wal.HLog: New hlog /Volumes/Macintosh-HD/Users/larsgeorge/.logs_1309616449171/hlog.1309616449181
>>> >>> 11/07/02 07:20:49 INFO wal.HLog: getNumCurrentReplicas--HDFS-826 not available; hdfs_out=org.apache.hadoop.fs.FSDataOutputStream@25961581, exception=org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.getNumCurrentReplicas()
>>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
>>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined -ROOT-,,0.70236052; next sequenceid=1
>>> >>> info: null
>>> >>> region1: [B@48fd918a
>>> >>> region2: [B@7f5e2075
>>> >>> 11/07/02 07:20:49 FATAL util.Merge: Merge failed
>>> >>> java.io.IOException: Could not find meta region for testtable,row-20,1309613053987.23a35ac696bdf4a8023dcc4c5b8419e0.
>>> >>>       at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:211)
>>> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
>>> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Setting up tabledescriptor config now ...
>>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Onlined .META.,,1.1028785192; next sequenceid=1
>>> >>> 11/07/02 07:20:49 INFO regionserver.HRegion: Closed -ROOT-,,0.70236052
>>> >>> 11/07/02 07:20:49 INFO wal.HLog: main.logSyncer exiting
>>> >>> 11/07/02 07:20:49 ERROR util.Merge: exiting due to error
>>> >>> java.lang.NullPointerException
>>> >>>       at org.apache.hadoop.hbase.util.Merge$1.processRow(Merge.java:119)
>>> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:229)
>>> >>>       at org.apache.hadoop.hbase.util.MetaUtils.scanMetaRegion(MetaUtils.java:258)
>>> >>>       at org.apache.hadoop.hbase.util.Merge.run(Merge.java:116)
>>> >>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>> >>>       at org.apache.hadoop.hbase.util.Merge.main(Merge.java:386)
>>> >>>
>>> >>> After which, most of the time, I have shot .META. with an error:
>>> >>>
>>> >>> 2011-07-02 06:42:10,763 WARN org.apache.hadoop.hbase.master.HMaster: Failed getting all descriptors
>>> >>> java.io.FileNotFoundException: No status for hdfs://localhost:8020/hbase/.corrupt
>>> >>>       at org.apache.hadoop.hbase.util.FSUtils.getTableInfoModtime(FSUtils.java:888)
>>> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:122)
>>> >>>       at org.apache.hadoop.hbase.util.FSTableDescriptors.getAll(FSTableDescriptors.java:149)
>>> >>>       at org.apache.hadoop.hbase.master.HMaster.getHTableDescriptors(HMaster.java:1429)
>>> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> >>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>> >>>       at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:312)
>>> >>>       at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1065)
>>> >>>
>>> >>> Lars
>>> >
>>> >
>>>
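
For reference, the HMerge qualification rule described in the quoted 2011
mail (a merge qualifies when the merged region would be at most half the
configured max file size) could be sketched roughly as below. This is an
illustration only, not the HMerge source: the class name MergeCandidates,
the method findCandidates, and the idea of passing region sizes in as a
plain array are all assumptions made for the example, as is the 256 MB
max file size used in main().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "merge all that qualify" rule; not HBase code.
public class MergeCandidates {

    /**
     * Walks regions in key order and returns index pairs (i, i + 1) of adjacent
     * regions whose combined size is at most half of maxFileSize.
     * Sizes and maxFileSize use the same unit (bytes in main() below).
     */
    public static List<int[]> findCandidates(long[] regionSizes, long maxFileSize) {
        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        while (i + 1 < regionSizes.length) {
            long combined = regionSizes[i] + regionSizes[i + 1];
            if (combined <= maxFileSize / 2) {
                pairs.add(new int[] {i, i + 1});
                i += 2; // both regions are consumed by this merge
            } else {
                i += 1; // pair is too large; try the next adjacent pair
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        long maxFileSize = 256L * 1024 * 1024; // assumed value, for illustration only
        long[] sizes = {10L << 20, 20L << 20, 200L << 20, 5L << 20, 30L << 20, 40L << 20};
        for (int[] p : findCandidates(sizes, maxFileSize)) {
            System.out.println("merge regions " + p[0] + " and " + p[1]);
        }
    }
}
```

The sketch only pairs direct neighbours and never re-merges an already
merged region in the same pass; whether HMerge itself behaves that way is
not stated in this thread.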