You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "gaojinchao (JIRA)" <ji...@apache.org> on 2011/04/06 03:15:06 UTC

[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016208#comment-13016208 ] 

gaojinchao commented on HBASE-3722:
-----------------------------------

Jean-Daniel answers in mail as:

MasterFileSystem has most of DFS interactions, it seems that checkFileSystem is never called (it should be) and splitLog catches the ERROR when splitting but doesn't abort.

Would you mind opening a jira about this issue and perhaps submit a patch?

when will it submit a patch and I want to test it?
thanks.

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>
> I'm not sure exactly what arose it. there is some split failed logs .
> the master should shutdown itself when the HDFS is crashed.
>  The logs is :
>  2011-03-22 13:21:55,056 WARN 
>  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
>  logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>          at $Proxy5.getListing(Unknown Source)
>          at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>          at $Proxy5.getListing(Unknown Source)
>          at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>          at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>          at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>          at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>          at 
>  org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>  Caused by: java.net.ConnectException: Connection refused
>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>          at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>          at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>          at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>          at org.apache.hadoop.ipc.Client.call(Client.java:788)
>          ... 13 more
>  2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:05,060 ERROR 
>  org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting 
>  hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>          at $Proxy5.getFileInfo(Unknown Source)
>          at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>          at $Proxy5.getFileInfo(Unknown Source)
>          at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>          at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>          at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>          at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>          at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>          at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>          at java.lang.Thread.run(Thread.java:662)
>  Caused by: java.net.ConnectException: Connection refused
>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>          at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>          at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>          at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>          at org.apache.hadoop.ipc.Client.call(Client.java:788)
>          ... 18 more
>  2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:54,603 WARN 
>  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
>  logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invok

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira