Posted to issues@hbase.apache.org by "gaojinchao (JIRA)" <ji...@apache.org> on 2011/04/01 03:08:06 UTC

[jira] [Created] (HBASE-3722) A lot of data is lost when name node crashed

 A lot of data is lost when name node crashed
---------------------------------------------

                 Key: HBASE-3722
                 URL: https://issues.apache.org/jira/browse/HBASE-3722
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.90.1
            Reporter: gaojinchao


I'm not sure exactly what caused it, but the master log shows several failed log splits.
The master should shut itself down when HDFS has crashed.
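
To illustrate the behaviour requested above, here is a minimal sketch and not the attached patch. It assumes the master can probe the file system before doing work such as log splitting, and abort itself if the probe fails. FsProbe, AbortableServer and FsGuardSketch are hypothetical names made up for this example; they are not HBase or Hadoop classes.

```java
import java.io.IOException;

/** Hypothetical stand-in for a file system health probe (not an HBase or Hadoop API). */
interface FsProbe {
    /** Throws IOException if the file system cannot be reached. */
    void check() throws IOException;
}

/** Hypothetical stand-in for a server that can stop itself, such as the master. */
interface AbortableServer {
    void abort(String why, Throwable cause);
}

final class FsGuardSketch {
    private FsGuardSketch() {}

    /**
     * Probes the file system once. If the probe fails, asks the server to abort
     * and returns false so the caller can skip any further file system work.
     */
    static boolean checkFileSystem(FsProbe probe, AbortableServer server) {
        try {
            probe.check();
            return true;
        } catch (IOException e) {
            server.abort("File system is unavailable; shutting down", e);
            return false;
        }
    }

    public static void main(String[] args) {
        // A fake master that only reports the abort request.
        AbortableServer master = (why, cause) ->
            System.out.println("ABORT: " + why + " (" + cause.getMessage() + ")");
        // A fake probe that behaves like a crashed name node.
        FsProbe down = () -> { throw new IOException("Connection refused"); };
        System.out.println(checkFileSystem(down, master));
    }
}
```

In this sketch a caller such as the log-splitting path would stop as soon as checkFileSystem returns false, instead of logging the error and carrying on.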

 The logs are:
 2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
 java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
         at org.apache.hadoop.ipc.Client.call(Client.java:820)
         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
         at $Proxy5.getListing(Unknown Source)
         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
         at $Proxy5.getListing(Unknown Source)
         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
         at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
         at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
 Caused by: java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
         at org.apache.hadoop.ipc.Client.call(Client.java:788)
         ... 13 more
 2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
 2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
 2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
 2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
 2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
 2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
 2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
 2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
 2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
 2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
 2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
 java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
         at org.apache.hadoop.ipc.Client.call(Client.java:820)
         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
         at $Proxy5.getFileInfo(Unknown Source)
         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
         at $Proxy5.getFileInfo(Unknown Source)
         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)
 Caused by: java.net.ConnectException: Connection refused
         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
         at org.apache.hadoop.ipc.Client.call(Client.java:788)
         ... 18 more
 2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
 2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
 2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
 2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
 2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
 2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
 2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
 2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
 2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
 2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
 2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
 java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
         at org.apache.hadoop.ipc.Client.call(Client.java:820)
         at org.apache.hadoop.ipc.RPC$Invok



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gaojinchao updated HBASE-3722:
------------------------------

    Attachment: HmasterFilesystem_PatchV1.patch


[jira] [Assigned] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-3722:
-----------------------------------------

    Assignee: gaojinchao

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>            Assignee: gaojinchao
>             Fix For: 0.90.3
>
>         Attachments: HmasterFilesystem_PatchV1.patch
>

[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016350#comment-13016350 ] 

gaojinchao commented on HBASE-3722:
-----------------------------------

I have tried to fix this bug. Could someone review the patch?
Thanks

>          ... 18 more
>  2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:54,603 WARN 
>  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
>  logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invok

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019520#comment-13019520 ] 

Hudson commented on HBASE-3722:
-------------------------------

Integrated in HBase-TRUNK #1850 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])
    

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>             Fix For: 0.90.3
>
>         Attachments: HmasterFilesystem_PatchV1.patch


[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016208#comment-13016208 ] 

gaojinchao commented on HBASE-3722:
-----------------------------------

Jean-Daniel answered by mail:

MasterFileSystem handles most of the DFS interactions. It seems that checkFileSystem is never called (it should be), and splitLog catches the ERROR when splitting but doesn't abort.

Would you mind opening a jira about this issue and perhaps submitting a patch?

When will a patch be submitted? I would like to test it.
Thanks.
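
To make the behaviour described above concrete, here is a minimal Java sketch, not the attached patch and not the real HBase classes. It assumes a splitLog that, on an IOException, calls a checkFileSystem that probes the filesystem and aborts the owning process if the probe also fails. The names SplitAbortSketch, Abortable, FileSystemProbe and LogSplitTask are hypothetical stand-ins introduced only for this illustration.

```java
import java.io.IOException;
import java.net.ConnectException;

public class SplitAbortSketch {

    /** Hypothetical stand-in for the process that should shut down (the master). */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /** Hypothetical stand-in for a cheap "is the filesystem reachable?" probe. */
    interface FileSystemProbe {
        void check() throws IOException;
    }

    /** Hypothetical stand-in for the actual log-splitting work. */
    interface LogSplitTask {
        void split(String logDir) throws IOException;
    }

    private final Abortable master;
    private final FileSystemProbe probe;
    private volatile boolean fsOk = true;

    SplitAbortSketch(Abortable master, FileSystemProbe probe) {
        this.master = master;
        this.probe = probe;
    }

    /** Probes the filesystem; on failure, records it and asks the master to abort. */
    boolean checkFileSystem() {
        if (fsOk) {
            try {
                probe.check();
            } catch (IOException e) {
                fsOk = false;
                master.abort("Shutting down: file system not available", e);
            }
        }
        return fsOk;
    }

    /** Runs the split; a failure is logged and then followed by a filesystem check. */
    void splitLog(String logDir, LogSplitTask task) {
        try {
            task.split(logDir);
        } catch (IOException e) {
            System.err.println("Failed splitting " + logDir + ": " + e.getMessage());
            // The step reported as missing: do not just log, verify the filesystem.
            checkFileSystem();
        }
    }

    public static void main(String[] args) {
        final boolean[] aborted = {false};
        SplitAbortSketch sketch = new SplitAbortSketch(
            (why, cause) -> { aborted[0] = true; System.out.println("ABORT: " + why); },
            () -> { throw new ConnectException("Connection refused"); });

        // Simulates a split attempted while the name node refuses connections.
        sketch.splitLog("hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398",
            dir -> { throw new ConnectException("Connection refused"); });

        System.out.println("aborted = " + aborted[0]);
    }
}
```

In this sketch a split failure alone does not abort: the abort only happens when the follow-up probe also fails, so a transient error on a healthy filesystem leaves the process running.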



[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016696#comment-13016696 ] 

stack commented on HBASE-3722:
------------------------------

That seems like a harmless addition.  Do you think it would help with the issue you saw, Gao Jinchao?  If so, I can commit.



[jira] [Resolved] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3722.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.90.3
     Hadoop Flags: [Reviewed]

Applied to branch and trunk.  Makes sense.  Thanks for patch and substantiating evidence gaojinchao.

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>             Fix For: 0.90.3
>
>         Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what arose it. there is some split failed logs .
> the master should shutdown itself when the HDFS is crashed.


[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016768#comment-13016768 ] 

gaojinchao commented on HBASE-3722:
-----------------------------------

Yes, it is important for me. Thanks.
Some explanation of our application:
1. I have a babysitter process that controls the start and stop of all HBase processes.
   When the NN crashes, HBase can protect itself.
   When the NN recovers, I hope HBase can automatically recover service.
   If the HMaster doesn't shut itself down, it will skip the log split and wait to assign the META table or ROOT table.
   When the NN recovers and the region servers start up, a lot of data is lost, especially in the META table.

2. HBase + hadoop-append should ensure that no data is lost unless Hadoop itself loses data.
   Reliability is important for my application. I read the HLog code and did some DFX tests.
   The issue is serious, although an NN crash has a low probability.
   I find that the region server also restarts when the NN crashes.

Please review the modification. I am afraid of making a mistake.

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>         Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what arose it. there is some split failed logs .
> the master should shutdown itself when the HDFS is crashed.


[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

Posted by "gaojinchao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018791#comment-13018791 ] 

gaojinchao commented on HBASE-3722:
-----------------------------------

In my cluster:
1. The HDFS cluster has HA namenodes (ANN and BNN).
2. HBase version 0.90.1:
  Active HMaster: C4C1
  Backup HMaster: C4C2
  Region servers: C4C3, C4C4, C4C5, ...

Operation:
1. The ANN crashed and the BNN became active (that takes some time).
2. Some region servers crashed (e.g. C4C3, which hosted the META table) while an HBase client was putting data; other region servers were OK.
3. The HMaster failed to split the HLog and skipped it.
4. The BNN became active and the HMaster finished processing the shutdown event.
5. A lot of the data on the crashed region servers was lost.


Log:
At 14:57:58, C4C3 shut itself down because the ANN crashed.
The HMaster skipped the log split and reassigned the META table.

2011-04-12 14:57:58,782 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for C4C3.site,60020,1302590910433
2011-04-12 14:57:59,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
....
2011-04-12 14:58:08,793 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
2011-04-12 14:58:08,795 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C3.site,60020,1302590910433
java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
2011-04-12 14:58:08,805 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
2011-04-12 14:58:08,880 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Failed verification of .META.,,1 at address=C4C3.site:60020; java.net.ConnectException: Connection refused
2011-04-12 14:58:08,880 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: Current cached META location is not valid, resetting

The HMaster finished processing the shutdown event once the BNN became active, and the META table was reassigned:

2011-04-12 15:00:31,681 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
2011-04-12 15:00:32,682 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
2011-04-12 15:00:40,698 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, ts=1302591600701
2011-04-12 15:00:40,699 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2011-04-12 15:00:40,709 INFO org.apache.hadoop.hbase.master.AssignmentManager: Successfully transitioned region=.META.,,1.1028785192 into OFFLINE and forcing a new assignment
2011-04-12 15:00:40,712 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  -ROOT-,,0.70236052 state=OPENING, ts=1302591600718
2011-04-12 15:00:40,712 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=-ROOT-,,0.70236052
2011-04-12 15:00:40,725 INFO org.apache.hadoop.hbase.master.AssignmentManager: Successfully transitioned region=-ROOT-,,0.70236052 into OFFLINE and forcing a new assignment
2011-04-12 15:00:40,892 INFO org.apache.hadoop.hbase.zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
2011-04-12 15:00:45,870 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 0 region(s) that C4C3.site,60020,1302590910433 was carrying (skipping 0 regions(s) that are already in transition)
2011-04-12 15:00:45,870 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished processing of shutdown of C4C3.site,60020,1302590910433



The skipped HLog is lost if the HMaster doesn't restart when the NN recovers,
so I think the HMaster should shut itself down when the NN crashes,
just as a region server that is rolling its HLog shuts itself down when it catches any IO exception.
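
The proposal above can be illustrated with a minimal sketch. This is not the attached patch and it does not use HBase classes: `LogStore`, `FakeStore` and `Master` are hypothetical stand-ins, assumed only for illustration. The idea shown is that when a log split fails and the file system is unavailable, the master marks itself aborted instead of continuing to region reassignment.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Sketch only: every type here is a hypothetical stand-in, not an HBase class. */
class SplitFailureSketch {

    /** The file system calls the master depends on (hypothetical). */
    interface LogStore {
        void splitLog(String serverName) throws IOException;
        boolean isAvailable();
    }

    /** A fake store whose NameNode can be switched off, for demonstration. */
    static class FakeStore implements LogStore {
        boolean nameNodeUp = true;

        @Override
        public void splitLog(String serverName) throws IOException {
            if (!nameNodeUp) {
                throw new ConnectException("Connection refused");
            }
        }

        @Override
        public boolean isAvailable() {
            return nameNodeUp;
        }
    }

    /** A master that stops itself rather than skipping a failed split. */
    static class Master {
        private final LogStore store;
        private String abortReason; // null while the master is running

        Master(LogStore store) {
            this.store = store;
        }

        boolean isAborted() {
            return abortReason != null;
        }

        /**
         * Handles the death of a region server.
         * Returns true only if its logs were split, so regions may be reassigned.
         */
        boolean handleServerShutdown(String serverName) {
            try {
                store.splitLog(serverName);
                return true;
            } catch (IOException e) {
                System.err.println("Failed splitting logs for " + serverName
                        + ": " + e.getMessage());
                if (!store.isAvailable()) {
                    // Do not go on to reassign regions: the unsplit edits would be lost.
                    abortReason = "file system not available";
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Master master = new Master(store);

        System.out.println(master.handleServerShutdown("C4C4")); // NameNode up
        store.nameNodeUp = false;
        System.out.println(master.handleServerShutdown("C4C3")); // NameNode down
        System.out.println(master.isAborted());
    }
}
```

In this sketch a supervising process, such as the babysitter mentioned earlier, would be responsible for restarting the aborted master once the NN is reachable again.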

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>         Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what arose it. there is some split failed logs .
> the master should shutdown itself when the HDFS is crashed.
