Posted to issues@flink.apache.org by "Tom Lee (Jira)" <ji...@apache.org> on 2019/12/02 01:16:00 UTC

[jira] [Commented] (FLINK-11259) Bump Zookeeper dependency to 3.4.13

    [ https://issues.apache.org/jira/browse/FLINK-11259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985739#comment-16985739 ] 

Tom Lee commented on FLINK-11259:
---------------------------------

[~fokko], continuing some investigation I started over in https://github.com/apache/flink/pull/7406.

I reproduced the failure locally and now think HADOOP-15974 is unrelated.

With 3.4.14 and some fiddling with log4j config + the krb debug system prop I get this:

{code}
01:47:22,408 INFO  org.I0Itec.zkclient.ZkClient                                  - zookeeper state changed (SyncConnected)
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes128CtsHmacSha1EType
>>> KrbKdcReq send: kdc=localhost TCP:59943, timeout=30000, number of retries =3, #bytes=582
>>> KDCCommunication: kdc=localhost TCP:59943, timeout=30000,Attempt =1, #bytes=582
01:47:22,453 WARN  org.apache.directory.server.protocol.shared.kerberos.StoreUtils  - No server entry found for kerberos principal name zookeeper/localhost@EXAMPLE.COM
01:47:22,453 WARN  org.apache.directory.server.KERBEROS_LOG                      - No server entry found for kerberos principal name zookeeper/localhost@EXAMPLE.COM
01:47:22,453 WARN  org.apache.directory.server.kerberos.protocol.KerberosProtocolHandler  - Server not found in Kerberos database (7)
01:47:22,453 WARN  org.apache.directory.server.KERBEROS_LOG                      - Server not found in Kerberos database (7)
>>>DEBUG: TCPClient reading 135 bytes
>>> KrbKdcReq send: #bytes read=135
>>> KdcAccessibility: remove localhost:59943
>>> KDCRep: init() encoding tag is 126 req type is 13
>>>KRBError:
	 sTime is Sun Dec 01 01:47:22 PST 2019 1575193642000
	 suSec is 0
	 error code is 7
	 error Message is Server not found in Kerberos database
	 sname is krbtgt/EXAMPLE.COM@EXAMPLE.COM
	 msgType is 30
01:47:22,455 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient               - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
01:47:22,455 ERROR org.apache.zookeeper.ClientCnxn                               - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
01:47:22,455 INFO  org.I0Itec.zkclient.ZkClient                                  - zookeeper state changed (AuthFailed)
{code}
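
For anyone trying to reproduce the output above: the comment does not name the exact property used, so this is only a sketch that assumes the standard JDK switch {{sun.security.krb5.debug}}, which is what normally produces the ">>>" lines. The class and method names are hypothetical. Passing {{-Dsun.security.krb5.debug=true}} on the JVM command line should have the same effect.

```java
public class KrbDebugSketch {

    // Assumption: the "krb debug system prop" is the JDK's sun.security.krb5.debug.
    // It has to be set before the JDK Kerberos classes are first loaded,
    // otherwise the setting may not be picked up.
    static void enableKrbDebug() {
        System.setProperty("sun.security.krb5.debug", "true");
    }

    public static void main(String[] args) {
        enableKrbDebug();
        // Print the value back to confirm the property is set in this JVM.
        System.out.println("sun.security.krb5.debug=" + System.getProperty("sun.security.krb5.debug"));
    }
}
```

In a test run the property would more likely be set through the build's JVM arguments than in code; the programmatic form is shown only because it is self-contained.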


The same run with 3.4.10 (same logging changes and krb debug) shows no references to *zookeeper/localhost@EXAMPLE.COM*; instead we see *zookeeper/127.0.0.1@EXAMPLE.COM*. So the 3.4.14 client asks the KDC for a principal built from the host name, while the test KDC only has one registered for the IP address. Modifying {{SecureTestEnvironment.prepare()}} to use "localhost" (well, {{hostName}}) instead of "127.0.0.1" seems to fix the broken tests in PR #7406 for me.
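
To make the mismatch concrete, here is a minimal sketch of why the KDC answers "Server not found in Kerberos database (7)". It is not Flink or ZooKeeper code: the class and both helper methods are hypothetical, and the KDC lookup is modelled as a plain exact-match on the principal string.

```java
import java.util.Collections;
import java.util.Set;

public class PrincipalLookupSketch {

    // Builds a Kerberos service principal of the form service/host@REALM.
    static String servicePrincipal(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    // Stand-in for the KDC database lookup: an exact string match on the principal.
    static boolean kdcKnows(Set<String> registered, String principal) {
        return registered.contains(principal);
    }

    public static void main(String[] args) {
        String realm = "EXAMPLE.COM";

        // What the test environment registers when it is set up with "127.0.0.1".
        Set<String> registered =
                Collections.singleton(servicePrincipal("zookeeper", "127.0.0.1", realm));

        // What the 3.4.14 client asks for, according to the log above.
        String requested = servicePrincipal("zookeeper", "localhost", realm);

        System.out.println(requested + " known to KDC: " + kdcKnows(registered, requested));
    }
}
```

Registering the principal with the host name instead of the IP address makes the two strings agree, which is the effect of the {{SecureTestEnvironment.prepare()}} change described above.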

I'm happy to put together a patch for this but it may take me a week or two to clear some red tape. I'll get the wheels moving, but I'm far more interested in seeing this upgrade happen: the DNS improvements in 3.4.13+ are a big deal to us, so my feelings won't be hurt if somebody beats me to it. :)

Also suggest we rename the ticket to "Bump Zookeeper dependency to 3.4.14" since it's the latest + greatest -- or we could open another ticket I suppose.

> Bump Zookeeper dependency to 3.4.13
> -----------------------------------
>
>                 Key: FLINK-11259
>                 URL: https://issues.apache.org/jira/browse/FLINK-11259
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.7.1
>            Reporter: Fokko Driesprong
>            Assignee: Fokko Driesprong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.3
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Bump Zookeeper to 3.4.13
> https://zookeeper.apache.org/doc/r3.4.13/releasenotes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)