Posted to user@hbase.apache.org by Gaojinchao <ga...@huawei.com> on 2011/04/01 03:14:51 UTC

RE: A lot of data is lost when name node crashed

Thanks, please submit a patch and I can try to test it.
Jira is :
https://issues.apache.org/jira/browse/HBASE-3722

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: April 1, 2011 1:20
To: Gaojinchao; user@hbase.apache.org
Subject: Re: A lot of data is lost when name node crashed

(sending this back to the list; please don't reply directly to the
sender, always reply to the mailing list)

MasterFileSystem holds most of the DFS interactions. It seems that
checkFileSystem is never called (it should be), and splitLog catches
the ERROR when splitting but doesn't abort.
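To illustrate the gap, here is a minimal, self-contained sketch of the
pattern being discussed: on an IOException during log splitting, re-check
the filesystem and abort instead of just logging. The class and method
names below are hypothetical stand-ins, not the actual HBase 0.90 code.

```java
import java.io.IOException;

public class SplitLogSketch {
    // Stands in for the master's aborted state; in real code an abort
    // would shut the master down.
    static boolean aborted = false;

    // Simulates a filesystem availability probe: if HDFS is unreachable,
    // flag the master for abort instead of carrying on.
    static void checkFileSystem(boolean hdfsUp) {
        if (!hdfsUp) {
            aborted = true; // real code would call something like abort(...)
        }
    }

    // Simulates splitLog: on IOException, verify the filesystem rather
    // than swallowing the error and losing the unsplit edits.
    static void splitLog(boolean hdfsUp) {
        try {
            if (!hdfsUp) {
                throw new IOException("Connection refused");
            }
            // ... actual log splitting would happen here ...
        } catch (IOException e) {
            // Previously: log the ERROR and return. Proposed: also check
            // the filesystem so the master aborts when HDFS is gone.
            checkFileSystem(hdfsUp);
        }
    }

    public static void main(String[] args) {
        splitLog(false);
        System.out.println(aborted); // prints "true": the master would abort
    }
}
```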

Would you mind opening a jira about this issue and perhaps submit a patch?

Thx,

J-D

On Thu, Mar 31, 2011 at 5:40 AM, Gaojinchao <ga...@huawei.com> wrote:
> Thanks, I will try it again because the INFO log level wasn't turned on last time.
> I have a question:
> Where in the code does the Master kill itself when it finds the NameNode has crashed?
>
> if (isCarryingRoot()) { // -ROOT-
>   try {
>     this.services.getAssignmentManager().assignRoot();
>   } catch (KeeperException e) {
>     this.server.abort("In server shutdown processing, assigning root", e);
>     throw new IOException("Aborting", e);
>   }
> }
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: March 30, 2011 1:39
> To: user@hbase.apache.org
> Cc: Gaojinchao; Chenjian
> Subject: Re: A lot of data is lost when name node crashed
>
> I was expecting it would die; strange that it didn't. Could you provide a
> bigger log? This one basically tells us the NN is gone, but that's
> about it. Please put it on a web server or somewhere else that's
> easily reachable for anyone (e.g. don't post the full thing here).
>
> Thx,
>
>
> J-D
>
> On Tue, Mar 29, 2011 at 4:28 AM, Gaojinchao <ga...@huawei.com> wrote:
>> I ran some performance tests on HBase 0.90.1.
>> When the name node crashed, I found that some data was lost.
>> I'm not sure exactly what caused it. It seems like log splitting failed.
>> I think the master should shut itself down when HDFS crashes.
>>
>>
>> The logs are:
>> 2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getListing(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getListing(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>>         at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>         at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 13 more
>> 2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 18 more
>> 2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invok
>>
>