Posted to issues@hbase.apache.org by "zhuobin zheng (Jira)" <ji...@apache.org> on 2021/06/24 03:12:00 UTC
[jira] [Commented] (HBASE-26022) DNS jitter causes hbase client to get stuck
[ https://issues.apache.org/jira/browse/HBASE-26022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368577#comment-17368577 ]
zhuobin zheng commented on HBASE-26022:
---------------------------------------
In the *master branch*, it seems that RpcClient dynamically generates the server principal every time before it creates the saslClient, so this is not a problem there.
But it seems to be a problem in branch-1 as well. I will try to fix it later.
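A minimal sketch of the difference described above, not the actual HBase code: one connection fixes the principal at construction (the branch-1 behavior reported in the issue), the other rebuilds it before every SASL attempt (the behavior that master appears to have). All class and method names here are hypothetical, and the resolver is a simulated stand-in for InetAddress.getCanonicalHostName() whose first lookup fails and yields the IP.

```java
import java.util.function.Supplier;

final class PrincipalRefreshSketch {
    /** Hypothetical helper: builds "hbase/<host>@<realm>" from whatever the resolver returns. */
    static String buildPrincipal(Supplier<String> resolver, String realm) {
        return "hbase/" + resolver.get() + "@" + realm;
    }

    /** Principal is resolved once in the constructor and never changes afterwards. */
    static final class CachedConnection {
        private final String serverPrincipal;

        CachedConnection(Supplier<String> resolver, String realm) {
            this.serverPrincipal = buildPrincipal(resolver, realm);
        }

        String principalForSasl() {
            return serverPrincipal;
        }
    }

    /** Principal is rebuilt on every call, so a later successful lookup is picked up. */
    static final class RefreshingConnection {
        private final Supplier<String> resolver;
        private final String realm;

        RefreshingConnection(Supplier<String> resolver, String realm) {
            this.resolver = resolver;
            this.realm = realm;
        }

        String principalForSasl() {
            return buildPrincipal(resolver, realm);
        }
    }

    /** Simulated DNS jitter: the first lookup returns the IP, later lookups return the hostname. */
    static Supplier<String> flakyResolver() {
        boolean[] first = {true};
        return () -> {
            if (first[0]) {
                first[0] = false;
                return "10.0.0.5";
            }
            return "rs1.example.com";
        };
    }

    public static void main(String[] args) {
        CachedConnection cached = new CachedConnection(flakyResolver(), "EXAMPLE.COM");
        RefreshingConnection refreshing = new RefreshingConnection(flakyResolver(), "EXAMPLE.COM");
        for (int attempt = 1; attempt <= 2; attempt++) {
            System.out.println("attempt " + attempt
                + " cached=" + cached.principalForSasl()
                + " refreshing=" + refreshing.principalForSasl());
        }
    }
}
```

In this simulation the cached connection keeps the IP-based principal on every attempt, while the refreshing one switches to the hostname-based principal on the second attempt.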
> DNS jitter causes hbase client to get stuck
> -------------------------------------------
>
> Key: HBASE-26022
> URL: https://issues.apache.org/jira/browse/HBASE-26022
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: zhuobin zheng
> Assignee: zhuobin zheng
> Priority: Major
>
> In our production HBase cluster, we occasionally encounter the errors below, and the HBase client gets stuck for a long time. After that, HBase requests to this machine fail forever.
> {code:java}
> WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:${user@realm} (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]
> WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:${user@realm} (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for ${user@realm} to hbase/${ip}@realm
> {code}
> The main problem is that the actual server principal we generated in the KDC is hbase/*${hostname}*@realm, so hbase/*${ip}*@realm cannot be found in the KDC.
> When an RpcClientImpl#Connection is constructed, its serverPrincipal field, which never changes afterwards, is generated from InetAddress.getCanonicalHostName(), which returns the IP address when it fails to get the hostname.
> Therefore, if DNS jitter occurs while an RpcClientImpl#Connection is being constructed, that connection will never be able to set up the SASL environment. I also do not see any logic that abandons the connection in the SASL failure code path.
> I can think of two solutions to this problem:
> # Abandon the connection when SASL fails, so that the next request constructs a new connection and generates a new server principal.
> # Refresh the serverPrincipal field when SASL fails, so that the next retry uses the new server principal.
> HBase Version: 1.2.0-cdh5.14.4
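The first proposed solution (abandon the connection when SASL fails) could look roughly like the sketch below. This is an illustration under stated assumptions, not a patch for RpcClientImpl: the Conn class, the connection map, the principal source, and the simulated KDC check are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class AbandonOnSaslFailureSketch {
    /** Hypothetical connection whose principal is fixed at construction. */
    static final class Conn {
        final String serverPrincipal;

        Conn(String serverPrincipal) {
            this.serverPrincipal = serverPrincipal;
        }

        /** Simulated SASL setup: succeeds only if the KDC knows this principal. */
        boolean setupSasl(String principalKnownToKdc) {
            return serverPrincipal.equals(principalKnownToKdc);
        }
    }

    private final Map<String, Conn> connections = new HashMap<>();
    private final Function<String, String> principalFor; // server address -> principal

    AbandonOnSaslFailureSketch(Function<String, String> principalFor) {
        this.principalFor = principalFor;
    }

    /** Returns true on success; on SASL failure the cached connection is dropped. */
    boolean call(String address, String principalKnownToKdc) {
        Conn conn = connections.computeIfAbsent(address, a -> new Conn(principalFor.apply(a)));
        if (!conn.setupSasl(principalKnownToKdc)) {
            // Abandon the connection so the next request rebuilds it with a fresh principal.
            connections.remove(address);
            return false;
        }
        return true;
    }

    /** Simulated DNS jitter: the first principal is IP-based, later ones are hostname-based. */
    static Function<String, String> flakyPrincipalSource() {
        boolean[] first = {true};
        return address -> {
            if (first[0]) {
                first[0] = false;
                return "hbase/10.0.0.5@EXAMPLE.COM";
            }
            return "hbase/rs1.example.com@EXAMPLE.COM";
        };
    }

    public static void main(String[] args) {
        AbandonOnSaslFailureSketch client = new AbandonOnSaslFailureSketch(flakyPrincipalSource());
        String kdcPrincipal = "hbase/rs1.example.com@EXAMPLE.COM";
        System.out.println("first call ok: " + client.call("10.0.0.5:16020", kdcPrincipal));
        System.out.println("second call ok: " + client.call("10.0.0.5:16020", kdcPrincipal));
    }
}
```

In this simulation the first call fails and evicts the connection, and the second call succeeds because the rebuilt connection gets the hostname-based principal. Without the eviction step, every later call would reuse the IP-based principal and keep failing, which matches the behavior reported above.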
--
This message was sent by Atlassian Jira
(v8.3.4#803005)