Posted to dev@uniffle.apache.org by GitBox <gi...@apache.org> on 2022/09/23 10:49:23 UTC

[GitHub] [incubator-uniffle] jerqi commented on issue #196: Flaky test ShuffleFlushManagerOnKerberizedHdfsTest

jerqi commented on issue #196:
URL: https://github.com/apache/incubator-uniffle/issues/196#issuecomment-1256062111

   > The stack trace is as follows:
   > 
   > ```
   > 2022-09-01 07:31:40,131 ERROR [FlushEventThreadPool] server.ShuffleFlushManager (ShuffleFlushManager.java:flushToFile(209)) - Exception happened when process flush shuffle data for ShuffleDataFlushEvent: eventId=0, appId=complexWriteTest_appId1, shuffleId=1, startPartition=0, endPartition=1
   > java.lang.RuntimeException: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]; Host Details : local host is: "fv-az489-314/10.1.1.91"; destination host is: "localhost":37279; 
   > 	at org.apache.uniffle.storage.common.HdfsStorage.newWriteHandler(HdfsStorage.java:113)
   > 	at org.apache.uniffle.storage.common.AbstractStorage.lambda$getOrCreateWriteHandler$2(AbstractStorage.java:50)
   > 	at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
   > 	at org.apache.uniffle.storage.common.AbstractStorage.getOrCreateWriteHandler(AbstractStorage.java:50)
   > 	at org.apache.uniffle.server.ShuffleFlushManager.flushToFile(ShuffleFlushManager.java:168)
   > 	at org.apache.uniffle.server.ShuffleFlushManager.lambda$null$0(ShuffleFlushManager.java:100)
   > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   > 	at java.lang.Thread.run(Thread.java:750)
   > Caused by: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]; Host Details : local host is: "fv-az489-314/10.1.1.91"; destination host is: "localhost":37279; 
   > 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:782)
   > 	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1493)
   > 	at org.apache.hadoop.ipc.Client.call(Client.java:1435)
   > 	at org.apache.hadoop.ipc.Client.call(Client.java:1345)
   > 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
   > 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
   > 	at com.sun.proxy.$Proxy68.getFileInfo(Unknown Source)
   > 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
   > 	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
   > 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > 	at java.lang.reflect.Method.invoke(Method.java:498)
   > 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
   > 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
   > 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
   > 	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
   > 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
   > 	at com.sun.proxy.$Proxy69.getFileInfo(Unknown Source)
   > 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1649)
   > 	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1440)
   > 	at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
   > 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   > 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1437)
   > 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1437)
   > 	at org.apache.uniffle.storage.handler.impl.HdfsShuffleWriteHandler.initialize(HdfsShuffleWriteHandler.java:89)
   > 	at org.apache.uniffle.storage.handler.impl.HdfsShuffleWriteHandler.<init>(HdfsShuffleWriteHandler.java:81)
   > 	at org.apache.uniffle.storage.common.HdfsStorage.newWriteHandler(HdfsStorage.java:108)
   > 	... 8 more
   > Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]
   > 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:755)
   > 	at java.security.AccessController.doPrivileged(Native Method)
   > 	at javax.security.auth.Subject.doAs(Subject.java:422)
   > 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   > 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:718)
   > 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:811)
   > 	at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
   > 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
   > 	at org.apache.hadoop.ipc.Client.call(Client.java:1381)
   > 	... 31 more
   > Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]
   > 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
   > 	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:406)
   > 	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:614)
   > 	at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:410)
   > 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:798)
   > 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:794)
   > 	at java.security.AccessController.doPrivileged(Native Method)
   > 	at javax.security.auth.Subject.doAs(Subject.java:422)
   > 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   > 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:793)
   > 	... 34 more
   > Caused by: GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)
   > 	at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:772)
   > 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
   > 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
   > 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
   > 	... 43 more
   > Caused by: KrbException: Server not found in Kerberos database (7) - Server not found in Kerberos database
   > 	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:73)
   > 	at sun.security.krb5.KrbTgsReq.getReply(KrbTgsReq.java:226)
   > 	at sun.security.krb5.KrbTgsReq.sendAndGetCreds(KrbTgsReq.java:237)
   > 	at sun.security.krb5.internal.CredentialsUtil.serviceCredsSingle(CredentialsUtil.java:477)
   > 	at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:340)
   > 	at sun.security.krb5.internal.CredentialsUtil.serviceCreds(CredentialsUtil.java:314)
   > 	at sun.security.krb5.internal.CredentialsUtil.acquireServiceCreds(CredentialsUtil.java:169)
   > 	at sun.security.krb5.Credentials.acquireServiceCreds(Credentials.java:490)
   > 	at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:695)
   > 	... 46 more
   > Caused by: KrbException: Identifier doesn't match expected value (906)
   > 	at sun.security.krb5.internal.KDCRep.init(KDCRep.java:140)
   > 	at sun.security.krb5.internal.TGSRep.init(TGSRep.java:65)
   > 	at sun.security.krb5.internal.TGSRep.<init>(TGSRep.java:60)
   > 	at sun.security.krb5.KrbTgsRep.<init>(KrbTgsRep.java:55)
   > 	... 54 more
   > ```
   
   It occurred again:
   https://github.com/apache/incubator-uniffle/actions/runs/3111709653/jobs/5044304701
   This test is still flaky.

