Posted to dev@hbase.apache.org by Vladimir Rodionov <vr...@carrieriq.com> on 2013/07/28 01:21:05 UTC

Master aborts on start up - URGENT

This may be related to:

https://issues.apache.org/jira/browse/HBASE-8912


It started when I tried to install and run YCSB. I created 'usertable' and then tried to modify it a couple of times (added COMPRESSION);
after that, HBase (0.94.6) stopped working (the Master could not finish initialization).

I stopped the cluster and physically removed the /hbase/usertable directory as well as all ZK local stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. The FATAL error from the Master's log file is below.

For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.



2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. has been deleted.
2013-07-27 23:08:56,504 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. that was online on sjc1-eng-perf-g1-grid06.carrieriq.com,60020,1374966494222
2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,46,1360181215846.6f2a2eb3924ba5cb6ed22f966e6356e8. has been deleted.
2013-07-27 23:08:56,505 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
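
A note on the FATAL entry above: the Master refuses to move a region that is in PENDING_OPEN to OFFLINE and then aborts. The following is a minimal Java sketch of that kind of guard, not the actual AssignmentManager code. The class name RegionStateSketch, the reduced list of states, and the assumption that only CLOSED and OFFLINE regions may be forced OFFLINE are illustrative; only the wording of the exception message is taken from the log.

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified, hypothetical model of the guard suggested by the log; not HBase source. */
class RegionStateSketch {

    /** A reduced set of region states, enough to illustrate the check. */
    enum State { OFFLINE, PENDING_OPEN, OPEN, CLOSED }

    // Assumption: only these states may be moved to OFFLINE before reassignment.
    private static final Set<State> MAY_GO_OFFLINE = EnumSet.of(State.CLOSED, State.OFFLINE);

    static boolean canTransitionToOffline(State current) {
        return MAY_GO_OFFLINE.contains(current);
    }

    /** Returns OFFLINE, or throws with a message in the style of the log entry. */
    static State forceOffline(String regionName, State current) {
        if (!canTransitionToOffline(current)) {
            throw new IllegalStateException("Unexpected state : " + regionName
                    + " state=" + current + " .. Cannot transit it to OFFLINE.");
        }
        return State.OFFLINE;
    }

    public static void main(String[] args) {
        // A CLOSED region can be taken OFFLINE in this model.
        System.out.println(forceOffline("exampleRegionA", State.CLOSED));
        // A PENDING_OPEN region is rejected, as in the FATAL log entry.
        try {
            forceOffline("exampleRegionB", State.PENDING_OPEN);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model the failed check surfaces as an exception that a caller could catch and handle; in the log above the same condition ends with the Master aborting.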


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: stack (JIRA) [jira@apache.org]
Sent: Saturday, July 27, 2013 3:21 PM
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-9063) TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails

stack created HBASE-9063:
----------------------------

             Summary: TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails
                 Key: HBASE-9063
                 URL: https://issues.apache.org/jira/browse/HBASE-9063
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: stack
            Assignee: Jimmy Xiang


https://builds.apache.org/job/hbase-0.95-on-hadoop2/200/testReport/org.apache.hadoop.hbase.master/TestAssignmentManagerOnCluster/testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState/

{code}java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1314)
        at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState(TestAssignmentManagerOnCluster.java:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}

Hope you don't mind my assigning it to you Jimmy.  Thought you might be interested.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

RE: Master aborts on start up - URGENT

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
OK, that was my issue.

All RS failed to create the table because we do not have SNAPPY support.

Still, even when the RS fail to create a table, the Master should not abort.

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: dev@hbase.apache.org
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because there were inconsistencies in one of the tables,
and the tool refused to do the META repair. I had to physically remove this table from HDFS; I then re-ran the tool
and successfully repaired META.



Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

Re: Master aborts on start up - URGENT

Posted by Ted Yu <yu...@gmail.com>.
Can you collect the region server log from sjc1-eng-perf-g1-grid03.carrieriq.com?

You can pastebin the portion of the region server log related to
usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. after
anonymization.
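
One way to prepare such an excerpt is sketched below in Java: keep only the log lines that mention the region's encoded name and mask host names before sharing. The class RegionLogFilterSketch, the host-matching pattern, and the placeholder host.example.com are assumptions made for illustration; they are not part of any HBase tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: filter log lines by region and mask host names. */
class RegionLogFilterSketch {

    // Placeholder that replaces every host name under the given domain.
    private static final String MASK = "host.example.com";

    static List<String> extract(List<String> lines, String encodedRegionName, String domain) {
        // Assumption: a host name is a run of letters, digits, dots, and dashes ending in the domain.
        Pattern host = Pattern.compile("[A-Za-z0-9.-]+\\." + Pattern.quote(domain));
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(encodedRegionName)) {
                kept.add(host.matcher(line).replaceAll(Matcher.quoteReplacement(MASK)));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample input: two shortened lines from the Master log quoted in this thread.
        List<String> sample = new ArrayList<>();
        sample.add("Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57");
        sample.add("Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48."
                + " state=PENDING_OPEN, ts=1374971740749,"
                + " server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445");
        for (String line : extract(sample, "73e01b52a570febc16833d7cc4f7ca48", "carrieriq.com")) {
            System.out.println(line);
        }
    }
}
```

Only the second sample line mentions the requested region, so only that line is printed, with the server name masked.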

Cheers

On Sat, Jul 27, 2013 at 5:47 PM, Vladimir Rodionov
<vr...@carrieriq.com> wrote:

> Nope. this seems to be very serious issue

RE: Master aborts on start up - URGENT

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Nope. This seems to be a very serious issue.

When I tried to recreate 'usertable' I got the same issue again:


2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for a386becc8860c810e33bb9c9d81482bc with OFFLINE state
2013-07-28 00:35:40,747 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2013-07-28 00:35:40,747 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. destination server is sjc1-eng-perf-g1-grid04.carrieriq.com,60020,1374969681440
2013-07-28 00:35:40,748 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67. so generated a random one; hri=usertable,user,1374971740436.cf772fe9e49bc911024b442914a15f67., src=, dest=sjc1-eng-perf-g1-grid19.carrieriq.com,60020,1374969681450; 20 (online=20, available=19) available servers
2013-07-28 00:35:40,748 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for 16938dcb9c3bb52a46ffb7b10fab3c57
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=usertable,user7,1374971740436.16938dcb9c3bb52a46ffb7b10fab3c57. state=CLOSED, ts=1374971740713, server=sjc1-eng-perf-g1-grid01.carrieriq.com,60020,1374969681434
2013-07-28 00:35:40,749 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x54022944d180000 Creating (or updating) unassigned node for 16938dcb9c3bb52a46ffb7b10fab3c57 with OFFLINE state
2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,user6,1374971740436.73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-28 00:35:40,749 INFO org.apache.hadoop.hbase.master.HMaster: Aborting


Master aborted.
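
For triage, the FATAL line above can be picked apart mechanically. What follows is a minimal Python sketch, not part of HBase. The regular expression and the name parse_unexpected_state are assumptions based only on the line format visible in this log: the region name is taken to be "table,start key,region id.encoded name." with no comma inside the start key.

```python
import re

# Assumed layout, taken from the log line above:
#   <table>,<start key>,<region id>.<32-hex encoded name>. state=..., ts=..., server=<host>,<port>,<start code>
UNEXPECTED_STATE = re.compile(
    r"Unexpected state : (?P<table>[^,]+),(?P<start_key>[^,]*),(?P<region_id>\d+)"
    r"\.(?P<encoded>[0-9a-f]{32})\. state=(?P<state>\w+), ts=(?P<ts>\d+), "
    r"server=(?P<host>[^,]+),(?P<port>\d+),(?P<start_code>\d+)"
)


def parse_unexpected_state(line):
    """Return the fields of an 'Unexpected state' log line as a dict, or None."""
    match = UNEXPECTED_STATE.search(line)
    return match.groupdict() if match else None


# The FATAL line from the log above, split only for readability.
sample = (
    "2013-07-28 00:35:40,749 FATAL org.apache.hadoop.hbase.master.HMaster: "
    "Unexpected state : usertable,user6,1374971740436."
    "73e01b52a570febc16833d7cc4f7ca48. state=PENDING_OPEN, ts=1374971740749, "
    "server=sjc1-eng-perf-g1-grid03.carrieriq.com,60020,1374969681445 "
    ".. Cannot transit it to OFFLINE."
)

info = parse_unexpected_state(sample)
print(info["table"], info["start_key"], info["encoded"], info["state"], info["host"])
```

Run over a whole master log, this would list every region the Master refused to move to OFFLINE, together with the region server it was pending on.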

This is what I ran:

create 'usertable', { NAME=>'cf', VERSIONS=> 1, COMPRESSION => 'SNAPPY', BLOCKCACHE => true}, { SPLITS => ['user', 'user05', 'user1','user15','user2','user25','user3','user35','user4','user45','user5','user55','user6','user65','user7','user75','user8','user85','user9','user95' ]} 
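
The split points in the statement above follow a regular pattern, so they can be generated rather than typed by hand. This is a small Python sketch under that assumption. The helper names ycsb_split_points and create_statement are hypothetical, and the script only builds the shell statement as a string; it does not talk to HBase.

```python
def ycsb_split_points():
    """Rebuild the 20 split points used in the create statement above."""
    points = ["user"]
    for step in range(1, 20):
        if step % 2:
            # Odd steps give user05, user15, ..., user95.
            points.append("user%02d" % (step * 5))
        else:
            # Even steps give user1, user2, ..., user9.
            points.append("user%d" % (step // 2))
    return points


def create_statement(table="usertable", family="cf"):
    """Format an HBase shell 'create' statement with the split points."""
    splits = ", ".join("'%s'" % point for point in ycsb_split_points())
    return (
        "create '%s', {NAME => '%s', VERSIONS => 1, COMPRESSION => 'SNAPPY', "
        "BLOCKCACHE => true}, {SPLITS => [%s]}" % (table, family, splits)
    )


print(create_statement())
```

The points come out in byte-lexicographic order ('user' before 'user05' before 'user1', and so on), which matches the order in which they are listed in the statement I ran.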

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 5:08 PM
To: dev@hbase.apache.org
Subject: RE: Master aborts on start up - URGENT

OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because one of the tables had inconsistencies
and the tool refused to repair META. I had to physically remove that table in HDFS; I then re-ran the tool
and successfully repaired META.



Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Vladimir Rodionov
Sent: Saturday, July 27, 2013 4:21 PM
To: dev@hbase.apache.org
Subject: Master aborts on start up - URGENT

This may be related to :

https://issues.apache.org/jira/browse/HBASE-8912


It started when I tried to install and run YCSB. I created 'usertable' and then tried to modify it a couple of times (added COMPRESSION);
HBase (0.94.6) stopped working (Master could not finish initialization).

I stopped the cluster and physically removed the /hbase/usertable directory as well as all ZK local stores. Restarted. No success.

I manually ran OfflineMetaRepair. Restarted. No success. This is the FATAL error in the Master's log file.

For some reason, OfflineMetaRepair did not fix the missing 'usertable'.

Please advise. This is a development cluster with a large volume of data.



2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. has been deleted.
2013-07-27 23:08:56,504 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region TMO_NOV_INDEX-UPLOADS,38,1360181215845.2553b53773e3cb9030c3248768a3b0ca. that was online on sjc1-eng-perf-g1-grid06.carrieriq.com,60020,1374966494222
2013-07-27 23:08:56,504 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state : usertable,,1374962208806.249881162b6ad6d084b30507283f98b8. state=PENDING_OPEN, ts=1374966536502, server=sjc1-eng-perf-g1-grid14.carrieriq.com,60020,1374966494232 .. Cannot transit it to OFFLINE.
        at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1820)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1659)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
        at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2013-07-27 23:08:56,504 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region TMO_NOV_INDEX-UPLOADS,46,1360181215846.6f2a2eb3924ba5cb6ed22f966e6356e8. has been deleted.
2013-07-27 23:08:56,505 INFO org.apache.hadoop.hbase.master.HMaster: Aborting


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: stack (JIRA) [jira@apache.org]
Sent: Saturday, July 27, 2013 3:21 PM
To: dev@hbase.apache.org
Subject: [jira] [Created] (HBASE-9063) TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails

stack created HBASE-9063:
----------------------------

             Summary: TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState fails
                 Key: HBASE-9063
                 URL: https://issues.apache.org/jira/browse/HBASE-9063
             Project: HBase
          Issue Type: Bug
          Components: test
            Reporter: stack
            Assignee: Jimmy Xiang


https://builds.apache.org/job/hbase-0.95-on-hadoop2/200/testReport/org.apache.hadoop.hbase.master/TestAssignmentManagerOnCluster/testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState/

{code}java.lang.NullPointerException
        at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:1314)
        at org.apache.hadoop.hbase.master.TestAssignmentManagerOnCluster.testSSHWhenDisablingTableRegionsInOpeningOrPendingOpenState(TestAssignmentManagerOnCluster.java:482)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74){code}

Hope you don't mind my assigning it to you Jimmy.  Thought you might be interested.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Confidentiality Notice:  The information contained in this message, including any attachments hereto, may be confidential and is intended to be read only by the individual or entity to whom this message is addressed. If the reader of this message is not the intended recipient or an agent or designee of the intended recipient, please note that any review, use, disclosure or distribution of this message or its attachments, in any form, is strictly prohibited.  If you have received this message in error, please immediately notify the sender and/or Notifications@carrieriq.com and delete or destroy any copy of this message and its attachments.

RE: Master aborts on start up - URGENT

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
OK, I managed to fix the issue and minimize the damage.

OfflineMetaRepair failed to fix .META. because one of the tables had inconsistencies
and the tool refused to repair META. I had to physically remove that table in HDFS; I then re-ran the tool
and successfully repaired META.
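
The sequence described above can be written down as a dry-run plan. This is a hedged Python sketch that only builds the command lines and executes nothing. Several things in it are assumptions rather than what was actually run: moving the table directory aside with "hadoop fs -mv" instead of deleting it, the sideline path prefix, the table name "some_table" (the inconsistent table is not named in this thread), and the final "hbase hbck" check. The /hbase/<table> layout is the one used earlier in this thread for 0.94.

```python
def offline_meta_repair_plan(table, hbase_root="/hbase", sideline_prefix="/hbase-sideline-"):
    """Return the commands, in order, as argument lists. Nothing is executed."""
    return [
        # 1. With the cluster stopped, move the inconsistent table out of the
        #    HBase root directory (assumption: sideline it instead of deleting).
        ["hadoop", "fs", "-mv", "%s/%s" % (hbase_root, table), sideline_prefix + table],
        # 2. Rebuild .META. offline from the region directories left in HDFS.
        ["hbase", "org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair"],
        # 3. After restarting the cluster, check consistency (assumed extra step).
        ["hbase", "hbck"],
    ]


# "some_table" is a placeholder, not the table that was removed here.
for command in offline_meta_repair_plan("some_table"):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the exact commands before running any of them by hand on a cluster that holds a large volume of data.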
 


Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


RE: Master aborts on start up - URGENT

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
There is a parallel thread about a similar issue on the user mailing list:
http://mail-archives.apache.org/mod_mbox/hbase-user/201307.mbox/%3CEE3F98CB-A4E8-4BFF-8C5F-AC50E164EB0D%40gmail.com%3E

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
