You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Prabhu Joseph (Jira)" <ji...@apache.org> on 2021/11/15 18:18:00 UTC

[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

    [ https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444021#comment-17444021 ] 

Prabhu Joseph edited comment on HADOOP-17996 at 11/15/21, 6:17 PM:
-------------------------------------------------------------------

Thanks [~brahmareddy] for reviewing the patch.
{quote}this was just to track the re-login attempt so that so many retries can be avoided.?
{quote}
There are two issues the patch tries to address

1. When IPC#Client fails during {{{}saslConnect{}}}, it does re-login from {{{}handleSaslConnectionFailure{}}}. The re-login sets the last login time to current time irrespective of the login status, followed by logout and then login. When login fails for some reason like intermittent issue in connecting to AD, then all subsequent Client and Server operations will fail with GSS Initiate Failed for next configured {{kerberosMinSecondsBeforeLogin}} (60 seconds).
{code:java}
    // try re-login
              if (UserGroupInformation.isLoginKeytabBased()) {
                UserGroupInformation.getLoginUser().reloginFromKeytab();
              } else if (UserGroupInformation.isLoginTicketBased()) {
                UserGroupInformation.getLoginUser().reloginFromTicketCache();
              }
{code}
This issue is addressed by setting the last login time to current time after the login succeeds. 

2. Currently the re-login happens only from IPC#Client during {{{}handleSaslConnectionFailure(){}}}. Have observed cases where Client has logged out and have failed to login back leading to all IPC#Server operations failing in {{processSaslMessage}} with below error.
{code:java}
2021-11-02 13:28:08,750 WARN  ipc.Server - Auth failed for 10.25.35.45:37849:null (GSS initiate failed) with true cause: (GSS initiate failed)
2021-11-02 13:28:08,767 WARN  ipc.Server - Auth failed for 10.25.35.46:35919:null (GSS initiate failed) with true cause: (GSS initiate failed)
{code}
This patch adds re-login from Server side as well during any Authentication Failure.

bq. Configuring kerberosMinSecondsBeforeRelogin with low value will not work here if it's needed.?
This will workaround the first issue.
 

bq. After this fix , on failure it will continuously retry..?

IPC#Client does re-login during Connection Failure. This patch adds at IPC#Server side as well. Retries are based on the retry mechanism of IPC#Client and IPC#Server. The real kerberos login will happen for every retry from IPC#Client and IPC#Server till the login succeeds.


was (Author: prabhu joseph):
Thanks [~brahmareddy] for reviewing the patch.
{quote}this was just to track the re-login attempt so that so many retries can be avoided.?
{quote}
There are two issues the patch tries to address

1. When IPC#Client fails during {{{}saslConnect{}}}, it does re-login from {{{}handleSaslConnectionFailure{}}}. The re-login sets the last login time to current time irrespective of the login status, followed by logout and then login. When login fails for some reason like intermittent issue in connecting to AD, then all subsequent Client and Server operations will fail with GSS Initiate Failed for next configured {{kerberosMinSecondsBeforeLogin}} (60 seconds).
{code:java}
    // try re-login
              if (UserGroupInformation.isLoginKeytabBased()) {
                UserGroupInformation.getLoginUser().reloginFromKeytab();
              } else if (UserGroupInformation.isLoginTicketBased()) {
                UserGroupInformation.getLoginUser().reloginFromTicketCache();
              }
{code}
This issue is addressed by setting the last login time to current time after the login succeeds. 

2. Currently the re-login happens only from IPC#Client during {{{}handleSaslConnectionFailure(){}}}. Have observed cases where Client has logged out and have failed to login back leading to all IPC#Server operations failing in {{processSaslMessage}} with below error.
{code:java}
2021-11-02 13:28:08,750 WARN  ipc.Server - Auth failed for 10.25.35.45:37849:null (GSS initiate failed) with true cause: (GSS initiate failed)
2021-11-02 13:28:08,767 WARN  ipc.Server - Auth failed for 10.25.35.46:35919:null (GSS initiate failed) with true cause: (GSS initiate failed)
{code}
This patch adds re-login from Server side as well during any Authentication Failure.

bq. Configuring kerberosMinSecondsBeforeRelogin with low value will not work here if it's needed.?
This will workaround the first issue.
 
{quote}
{quote}After this fix , on failure it will continuously retry..?
{quote}

IPC#Client does re-login during Connection Failure. This patch adds at IPC#Server side as well. Retries are based on the retry mechanism of IPC#Client and IPC#Server. The real kerberos login will happen for every retry from IPC#Client and IPC#Server till the login succeeds.

> UserGroupInformation#unprotectedRelogin sets the last login time before logging in
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-17996
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17996
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 3.3.1
>            Reporter: Prabhu Joseph
>            Assignee: Ravuri Sushma sree
>            Priority: Major
>         Attachments: HADOOP-17996.001.patch
>
>
> UserGroupInformation#unprotectedRelogin sets the last login time before logging in. IPC#Client does reloginFromKeytab when there is a connection reset failure from AD which does logout and set the last login time to now and then tries to login. The login also fails as not able to connect to AD. Then the reattempts does not happen as kerberosMinSecondsBeforeRelogin check fails. All Client and Server operations fails with *GSS initiate failed*
> {code}
> 2021-10-31 09:50:53,546 WARN  ha.EditLogTailer - Unable to trigger a roll of the active NN
> java.util.concurrent.ExecutionException: org.apache.hadoop.security.KerberosAuthException:  DestHost:destPort namenode0:8020 , LocalHost:localPort namenode1/1.2.3.4:0. Failed on local exception: org.apache.hadoop.security.KerberosAuthException: Login failure for user: nn/namenode1@EXAMPLE.COM javax.security.auth.login.LoginException: Connection reset
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:382)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:441)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:360)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1712)
> 	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> Caused by: org.apache.hadoop.security.KerberosAuthException:  DestHost:destPort namenode0:8020 , LocalHost:localPort namenode1/1.2.3.4:0. Failed on local exception: org.apache.hadoop.security.KerberosAuthException: Login failure for user: nn/namenode1@EXAMPLE.COM javax.security.auth.login.LoginException: Connection reset
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
> 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1443)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1353)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> 	at com.sun.proxy.$Proxy21.rollEditLog(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:150)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:367)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:364)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:514)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.security.KerberosAuthException: Login failure for user: nn/namenode1@EXAMPLE.COM javax.security.auth.login.LoginException: Connection reset
> 	at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1193)
> 	at org.apache.hadoop.security.UserGroupInformation.relogin(UserGroupInformation.java:1159)
> 	at org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1128)
> 	at org.apache.hadoop.security.UserGroupInformation.reloginFromKeytab(UserGroupInformation.java:1110)
> 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:734)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1732)
> 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
> 	at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:410)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1558)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1389)
> 	... 12 more
> Caused by: javax.security.auth.login.LoginException: Connection reset
> 	at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:812)
> 	at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618)
> 	at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
> 	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
> 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
> 	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
> 	at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
> 	at org.apache.hadoop.security.UserGroupInformation$HadoopLoginContext.login(UserGroupInformation.java:1928)
> 	at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1187)
> 	... 24 more
> Caused by: java.net.SocketException: Connection reset
> 	at java.net.SocketInputStream.read(SocketInputStream.java:210)
> 	at java.net.SocketInputStream.read(SocketInputStream.java:141)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> 	at sun.security.krb5.internal.TCPClient.readFully(NetClient.java:130)
> 	at sun.security.krb5.internal.TCPClient.receive(NetClient.java:82)
> 	at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:404)
> 	at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.security.krb5.KdcComm.send(KdcComm.java:348)
> 	at sun.security.krb5.KdcComm.sendIfPossible(KdcComm.java:253)
> 	at sun.security.krb5.KdcComm.send(KdcComm.java:229)
> 	at sun.security.krb5.KdcComm.send(KdcComm.java:200)
> 	at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:345)
> 	at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:498)
> 	at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:780)
> 	... 37 more
> 2021-10-31 09:50:53,576 WARN  security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1635673853525
> 2021-10-31 09:50:53,576 WARN  security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1635673853525
> 2021-10-31 09:50:53,576 WARN  security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1635673853525
> 2021-10-31 09:50:56,085 WARN  security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1635673853525
> 2021-11-02 13:28:08,750 WARN  ipc.Server - Auth failed for 10.25.35.45:37849:null (GSS initiate failed) with true cause: (GSS initiate failed)
> 2021-11-02 13:28:08,767 WARN  ipc.Server - Auth failed for 10.25.35.46:35919:null (GSS initiate failed) with true cause: (GSS initiate failed)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org