Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/03/29 15:40:00 UTC

[jira] [Commented] (FLINK-9103) SSL verification on TaskManager when parallelism > 1

    [ https://issues.apache.org/jira/browse/FLINK-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419213#comment-16419213 ] 

ASF GitHub Bot commented on FLINK-9103:
---------------------------------------

GitHub user EAlexRojas opened a pull request:

    https://github.com/apache/flink/pull/5789

    [FLINK-9103] Using CanonicalHostName instead of IP for SSL connection on NettyClient

    ## What is the purpose of the change
    
    This pull request makes the NettyClient identify the server by its CanonicalHostName instead of its IP address when setting up SSL connections. Dynamic environments such as Kubernetes, where IP addresses can change, can then be fully supported using certificates with wildcard DNS names.
    
    
    ## Brief change log
    
    - Use CanonicalHostName instead of HostNameAddress to identify the server on the NettyClient
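To illustrate the change described above, here is a minimal sketch using only standard JDK classes. It is not the code from this pull request: the class `PeerHostSketch`, the method `clientEngine`, and the `useCanonicalName` flag are hypothetical names, and the sketch assumes hostname verification is enabled through the standard "HTTPS" endpoint identification algorithm. The point it shows is that the peer host string handed to `createSSLEngine` is what the server certificate is later checked against.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class PeerHostSketch {

    /**
     * Builds a client-mode SSLEngine for the given server address.
     * Hypothetical helper for illustration only.
     */
    static SSLEngine clientEngine(SSLContext ctx, InetSocketAddress server, boolean useCanonicalName) {
        // Assumes the address is resolved; getAddress() returns null otherwise.
        InetAddress addr = server.getAddress();

        // getHostAddress() yields the textual IP; getCanonicalHostName() asks the
        // resolver for the fully qualified domain name (it may fall back to the
        // textual IP if no name can be determined).
        String peerHost = useCanonicalName ? addr.getCanonicalHostName() : addr.getHostAddress();

        SSLEngine engine = ctx.createSSLEngine(peerHost, server.getPort());
        engine.setUseClientMode(true);

        // With "HTTPS" endpoint identification, the trust manager compares the
        // server certificate against peerHost during the handshake.
        SSLParameters params = engine.getSSLParameters();
        params.setEndpointIdentificationAlgorithm("HTTPS");
        engine.setSSLParameters(params);
        return engine;
    }
}
```

When `peerHost` is an IP literal, the certificate has to carry that IP as a subject alternative name; when it is a DNS name, a wildcard DNS entry can match it.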
    
    
    ## Verifying this change
    
    This change is already covered by existing tests, such as:
    
    NettyClientServerSslTest (org.apache.flink.runtime.io.network.netty)
       - testValidSslConnection
       - testSslHandshakeError 
    
    Also manually verified the change by running a 4-node Kubernetes cluster with 1 JobManager and 3 TaskManagers, using wildcard DNS certificates, executing a stateful streaming program with parallelism set to 2, and verifying that all nodes are able to communicate with each other successfully.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency):  no
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
      - The serializers: no
      - The runtime per-record code paths (performance sensitive): no
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
      - The S3 file system connector: no
    
    ## Documentation
    
      - Does this pull request introduce a new feature? no
      - If yes, how is the feature documented? not applicable


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/EAlexRojas/flink release-1.4

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5789.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5789
    
----
commit 202672da7901fe7df912e6a057d6d0c29ccaf0fd
Author: EAlexRojas <al...@...>
Date:   2018-03-29T14:01:24Z

    Using CanonicalHostName instead of IP for SSL connection on NettyClient

----


> SSL verification on TaskManager when parallelism > 1
> ----------------------------------------------------
>
>                 Key: FLINK-9103
>                 URL: https://issues.apache.org/jira/browse/FLINK-9103
>             Project: Flink
>          Issue Type: Bug
>          Components: Docker, Security
>    Affects Versions: 1.4.0
>            Reporter: Edward Rojas
>            Priority: Major
>         Attachments: job.log, task0.log
>
>
> In dynamic environments like Kubernetes, SSL certificates can be generated so that only DNS names are used to validate server identity, since IP addresses can change over time.
>  
> In these cases, when executing jobs with parallelism set to 1, SSL validation succeeds and the JobManager and TaskManager can communicate with each other.
>  
> But with parallelism set to more than 1, SSL validation fails when TaskManagers communicate with each other, apparently because validation is attempted against the IP address:
> Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.xx.xxx.xxx found 
> at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168) 
> at sun.security.util.HostnameChecker.match(HostnameChecker.java:94) 
> at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455) 
> at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436) 
> at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252) 
> at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136) 
> at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601) 
> ... 21 more 
>  
> From the logs, it seems the TaskManagers successfully register their full addresses with Netty, but the IP is still used.
>  
> Attached are the pertinent logs from the JobManager and a TaskManager.
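
The failure in the stack trace can be modelled with a deliberately simplified sketch. This is not the JDK's `HostnameChecker`: `SanMatchSketch` and its methods are hypothetical, the IPv4 check is crude, and the host names are made up. It only mirrors the behaviour visible in the trace, where an IP peer host is compared against IP subject alternative names and never against DNS entries, so a certificate carrying only a wildcard DNS name cannot match.

```java
import java.util.List;
import java.util.Locale;

public class SanMatchSketch {

    // Crude IPv4-literal test; good enough for this illustration only.
    static boolean isIpv4Literal(String host) {
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    // Matches a DNS name against a SAN entry; "*." covers exactly one leftmost label.
    static boolean dnsMatches(String sanEntry, String host) {
        String pattern = sanEntry.toLowerCase(Locale.ROOT);
        String name = host.toLowerCase(Locale.ROOT);
        if (pattern.startsWith("*.")) {
            int dot = name.indexOf('.');
            return dot > 0 && name.substring(dot).equals(pattern.substring(1));
        }
        return pattern.equals(name);
    }

    // IP peer hosts are checked only against IP SANs, DNS names only against DNS SANs.
    static boolean identityMatches(String peerHost, List<String> dnsSans, List<String> ipSans) {
        if (isIpv4Literal(peerHost)) {
            return ipSans.contains(peerHost);
        }
        return dnsSans.stream().anyMatch(entry -> dnsMatches(entry, peerHost));
    }
}
```

With a certificate whose only SAN is a wildcard such as `*.flink.example.internal` (a made-up name), a peer host like `taskmanager-1.flink.example.internal` matches in this model, while an IP peer host does not.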



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)