You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/03/06 06:04:43 UTC
[jira] [Commented] (HBASE-10476) HBase Master log grows very fast after stopped hadoop (due to connection exception)

    [ https://issues.apache.org/jira/browse/HBASE-10476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922014#comment-13922014 ] 

Hadoop QA commented on HBASE-10476:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12632982/HBASE-10476-trunk-v0.patch
  against trunk revision .
  ATTACHMENT ID: 12632982

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 1.1 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

     {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8908//console

This message is automatically generated.

> HBase Master log grows very fast after stopped hadoop (due to connection exception)
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-10476
>                 URL: https://issues.apache.org/jira/browse/HBASE-10476
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Demai Ni
>            Assignee: Demai Ni
>             Fix For: 0.96.2, 0.98.1, 0.99.0
>
>         Attachments: HBASE-10476-trunk-v0.patch
>
>
> hbase 96.0(probably the same issue on 94.x) on single node cluster. At some point, we stopped Hadoop, but keep hbase running. As expected, hbase began to throw connection errors. 
> For the first hour, the regionserver log grows by ~10MB, and master log doesn't grow much,  which is ok. 
> {code:title=log size after one hour}
> -rw-rw-r-- 1 biadmin biadmin  497959 Feb  5 10:36 hbase-biadmin-master-hdtest014.svl.ibm.com.log
> ...
> -rw-rw-r-- 1 biadmin biadmin 8865371 Feb  5 10:37 hbase-biadmin-regionserver-hdtest014.svl.ibm.com.log
> {code}
> However, within 4 hours, the Master log grows to 13GB. And it only stops due to out of disk space. 
> {code:title=log size after 4 hour}
> -rw-rw-r-- 1 biadmin biadmin  3521880064 Feb  5 14:10 hbase-biadmin-master-hdtest014.svl.ibm.com.log
> -rw-rw-r-- 1 biadmin biadmin 10737418582 Feb  5 11:25 hbase-biadmin-master-hdtest014.svl.ibm.com.log.1
> ...
> -rw-rw-r-- 1 biadmin biadmin    11222365 Feb  5 10:49 hbase-biadmin-regionserver-hdtest014.svl.ibm.com.log
> {code}
> The exception/error message filled out Master log is 
> {code:title=Error message filling up Master log}
> 2014-02-05 11:37:48,688 INFO org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Splitting hbase:meta logs for hdtest014.svl.ibm.com,60020,1391622549030
> 2014-02-05 11:37:48,689 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
> java.io.IOException: failed log splitting for hdtest014.svl.ibm.com,60020,1391622549030, will retry
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:70)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929)
>     at java.lang.Thread.run(Thread.java:738)
> Caused by: java.net.ConnectException: Call From hdtest014.svl.ibm.com/9.30.194.23 to hdtest014.svl.ibm.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.GeneratedConstructorAccessor5.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1351)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1300)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>     at java.lang.reflect.Method.invoke(Method.java:611)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>     at java.lang.reflect.Method.invoke(Method.java:611)
>     at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
>     at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1406)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.getLogDirs(MasterFileSystem.java:317)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:405)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
>     at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:63)
>     ... 4 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:614)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
>     at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1318)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)