Posted to user@hbase.apache.org by David Koch <og...@googlemail.com> on 2014/03/06 20:13:34 UTC

Distributed log splitting failing after cluster outage.

Hello,

Our HBase cluster had an unexpected shutdown, and while trying to bring it
back up the Master gets stuck with the following message:

Failed splitting of [ list of <host_name>,<port>,<tmst> ]
java.io.IOException: error or interrupted while splitting logs in [ list of <host_name>,<port>,<tmst> ] Task = installed = 10 done = 0 error = 10
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:282)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:300)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:242)
	at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:661)
	at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:580)
	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
	at java.lang.Thread.run(Thread.java:724)

What can I do to get the cluster operational again? No data ingestion had
been running for several hours before the crash, so maybe clearing out
/hbase/.logs/ could be an option.

Thanks,

/David

Re: Distributed log splitting failing after cluster outage.

Posted by Bharath Vissapragada <bh...@cloudera.com>.
Glad to know everything is up. We faced this issue too; I'm not really sure
what the exact cause is.


-- 
Bharath Vissapragada
<http://www.cloudera.com>

Re: Distributed log splitting failing after cluster outage.

Posted by David Koch <og...@googlemail.com>.
Actually, all the files were 0-sized, so in the end we deleted those
files and HBase started up.



Re: Distributed log splitting failing after cluster outage.

Posted by Bharath Vissapragada <bh...@cloudera.com>.
Check whether there are any 0-sized WALs in /hbase/.logs, sideline them, and
restart. That could help. As Ted mentioned, the names of the problematic logs
are in the logs of the region servers (RS) that were assigned the split tasks.
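
To illustrate the check for 0-sized WALs, here is a minimal Python sketch. It assumes you have captured the output of a recursive listing such as `hdfs dfs -ls -R /hbase/.logs` and that each line has the usual eight columns (permissions, replication, owner, group, size, date, time, path). The host names and file names in SAMPLE are made up for the example and do not come from this thread.

```python
def zero_length_files(ls_output):
    """Return paths of regular files whose reported size is 0.

    Assumes the usual column layout of a recursive HDFS listing:
    perms, replication, owner, group, size, date, time, path.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skip headers such as "Found N items" and blank lines
        perms, size, path = fields[0], fields[4], fields[7]
        # Directories are also listed with size 0, so keep regular files only.
        if perms.startswith("-") and size == "0":
            paths.append(path)
    return paths


# Hypothetical listing, for illustration only.
SAMPLE = """\
drwxr-xr-x   - hbase hbase          0 2014-03-06 10:12 /hbase/.logs/rs1.example.com,60020,1394000000000-splitting
-rw-r--r--   3 hbase hbase          0 2014-03-06 10:12 /hbase/.logs/rs1.example.com,60020,1394000000000-splitting/wal.1
-rw-r--r--   3 hbase hbase   67108864 2014-03-06 09:58 /hbase/.logs/rs1.example.com,60020,1394000000000-splitting/wal.2
"""

if __name__ == "__main__":
    for path in zero_length_files(SAMPLE):
        print(path)
```

Each path reported this way is a candidate for sidelining, for example by moving it with `hdfs dfs -mv` to a directory outside /hbase before restarting the Master. Moving rather than deleting keeps the files available in case they turn out to matter.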



-- 
Bharath Vissapragada
<http://www.cloudera.com>

Re: Distributed log splitting failing after cluster outage.

Posted by Alok Singh <al...@urbanairship.com>.
We ran into this a few weeks ago while adding new nodes to an
existing cluster. Due to a misconfiguration, the new nodes were assigned the
wrong ZooKeeper quorum and ended up forming a new cluster.
We saw a similar error in our logs:

2014-01-30 16:47:19,196 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for xxxxx.xxx.urbanairship.com,60020,1385165871751, will retry
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:182)
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: error or interrupted while splitting logs in [maprfs:/......./xxxx.xxxx.urbanairship.com,60020,1385165871751-splitting] Task = installed = 1 done = 0 error = 1
	at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:272)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:284)
	at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:252)
	at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:175)


We fixed it by shutting the new nodes down, moving aside the offending logs,
and restarting the master. Later, we fixed the ZooKeeper configuration and then
brought the new nodes back into the cluster.

Alok



Re: Distributed log splitting failing after cluster outage.

Posted by Ted Yu <yu...@gmail.com>.
> error or interrupted while splitting logs in

The list following the above message should reveal which log directories
had problems.
You can check the logs on the corresponding region server(s) to see what
caused the issue.
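
As a rough illustration of reading that list, the Python sketch below pulls the log directories, the host names, and the task counters out of a Master exception message. It assumes the directories are printed as a bracketed list separated by ", " and that each directory name has the form <host_name>,<port>,<tmst> with an optional "-splitting" suffix. The sample message and its host names are hypothetical.

```python
import re

# Matches the bracketed directory list and the counters that follow it.
MESSAGE_RE = re.compile(
    r"splitting logs in \[(?P<dirs>.*?)\]\s*"
    r"Task = installed = (?P<installed>\d+) "
    r"done = (?P<done>\d+) error = (?P<error>\d+)"
)


def parse_split_failure(message):
    """Extract log dirs, host names and task counters, or None if no match."""
    m = MESSAGE_RE.search(message)
    if m is None:
        return None
    # Server names contain commas without spaces, so split on ", " only.
    dirs = [d.strip() for d in m.group("dirs").split(", ") if d.strip()]
    hosts = []
    for d in dirs:
        name = d.rsplit("/", 1)[-1]
        if name.endswith("-splitting"):
            name = name[: -len("-splitting")]
        hosts.append(name.split(",", 1)[0])
    return {
        "dirs": dirs,
        "hosts": hosts,
        "installed": int(m.group("installed")),
        "done": int(m.group("done")),
        "error": int(m.group("error")),
    }


# Hypothetical message, for illustration only.
SAMPLE_MESSAGE = (
    "java.io.IOException: error or interrupted while splitting logs in "
    "[hdfs://nn.example.com/hbase/.logs/rs1.example.com,60020,1394000000000-splitting, "
    "hdfs://nn.example.com/hbase/.logs/rs2.example.com,60020,1394000000001-splitting] "
    "Task = installed = 2 done = 0 error = 2"
)

if __name__ == "__main__":
    info = parse_split_failure(SAMPLE_MESSAGE)
    print(info["hosts"], info["error"], "of", info["installed"], "tasks failed")
```

The directory names only identify the servers whose logs are being split. The region servers that actually worked on the split tasks may be different machines, so their logs are the place to look for the underlying error.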

BTW, which HBase release are you using?

Cheers

