You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Umesh Agashe (JIRA)" <ji...@apache.org> on 2018/04/03 21:21:00 UTC

[jira] [Created] (HBASE-20338) WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential backoff for retrying rollWriter()

Umesh Agashe created HBASE-20338:
------------------------------------

             Summary: WALProcedureStore#recoverLease() should have fixed sleeps and/ or exponential backoff for retrying rollWriter()
                 Key: HBASE-20338
                 URL: https://issues.apache.org/jira/browse/HBASE-20338
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0-beta-2
            Reporter: Umesh Agashe


In our internal testing we observed that logs are getting flooded due to continuous loop in WALProcedureStore#recoverLease():
{code}
      while (isRunning()) {
        // Get Log-MaxID and recover lease on old logs
        try {
          flushLogId = initOldLogs(oldLogs);
        } catch (FileNotFoundException e) {
          LOG.warn("Someone else is active and deleted logs. retrying.", e);
          oldLogs = getLogFiles();
          continue;
        }

        // Create new state-log
        if (!rollWriter(flushLogId + 1)) {
          // someone else has already created this log
          LOG.debug("Someone else has already created log " + flushLogId);
          continue;
        }
{code}

rollWriter() fails to create a new file. Error messages in HDFS namenode logs around same time:
{code}
INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.create from 172.31.121.196:38508 Call#3141 Retry#0
java.io.IOException: Exeption while contacting value generator
        at org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:389)
        at org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:291)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.generateEncryptedKey(KMSClientProvider.java:724)
        at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:511)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2680)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$2.run(FSNamesystem.java:2676)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.security.SecurityUtil.doAsUser(SecurityUtil.java:477)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUser(SecurityUtil.java:458)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.generateEncryptedDataEncryptionKey(FSNamesystem.java:2675)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2815)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2712)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:604)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:115)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:503)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:488)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.access$200(KMSClientProvider.java:94)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider$EncryptedQueueRefiller.fillQueueForKey(KMSClientProvider.java:149)
        at org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:378)
        ... 25 more
{code}

Both HDFS NameNode and HBase Master logs are filling up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)