Posted to dev@flink.apache.org by Edward Alexander Rojas Clavijo <ed...@gmail.com> on 2018/03/27 12:48:37 UTC

SSL config on Kubernetes - Dynamic IP

Hi all,

Currently I have a Flink 1.4 cluster running on Kubernetes, with SSL
configured as described in
https://ci.apache.org/projects/flink/flink-docs-master/ops/security-ssl.html

However, as the IPs of the nodes are dynamic (due to the nature of
Kubernetes), we rely only on DNS names, which we can control using
Kubernetes services. So we add to the Subject Alternative Name (SAN) the
flink-jobmanager DNS name and also the DNS name for the task managers,
*.flink-taskmanager-svc (each task manager has a DNS name of the form
flink-taskmanager-0.flink-taskmanager-svc).

Additionally we set the jobmanager.rpc.address property on all the nodes
and each task manager sets the taskmanager.host property, all matching the
ones on the certificate.

This works well when running a job with parallelism set to 1. The SSL
validation succeeds and the Jobmanager can communicate with the Task
manager and vice versa.

But when we set the parallelism to more than 1, we get SSL validation
exceptions like this:

Caused by: java.security.cert.CertificateException: No subject alternative names matching IP address 172.30.247.163 found
    at sun.security.util.HostnameChecker.matchIP(HostnameChecker.java:168)
    at sun.security.util.HostnameChecker.match(HostnameChecker.java:94)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:455)
    at sun.security.ssl.X509TrustManagerImpl.checkIdentity(X509TrustManagerImpl.java:436)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:252)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:136)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1601)
    ... 21 more
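
To make the failure mode concrete, here is a minimal sketch of how I understand the identity check to behave. It is not the JDK's HostnameChecker code: the class SanMatchSketch and its methods are hypothetical, the IPv4 test is deliberately crude, and the SAN values are assumed from the certificate setup described above. The point is only that a peer identified by an IP literal is compared against IP-type SAN entries, so DNS-type entries (including wildcards) never apply to it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; the real check lives in
// sun.security.util.HostnameChecker (matchIP / matchDNS).
class SanMatchSketch {

    // Crude IPv4-literal test, good enough for this illustration.
    static boolean isIpv4Literal(String s) {
        return s.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    // Case-insensitive DNS match; "*." covers exactly one leftmost label.
    static boolean dnsMatches(String pattern, String host) {
        String p = pattern.toLowerCase(Locale.ROOT);
        String h = host.toLowerCase(Locale.ROOT);
        if (p.startsWith("*.")) {
            int dot = h.indexOf('.');
            return dot > 0 && h.substring(dot).equals(p.substring(1));
        }
        return p.equals(h);
    }

    // An IP peer is checked against IP SANs only, a name against DNS SANs only.
    static boolean matches(String peer, List<String> dnsSans, List<String> ipSans) {
        if (isIpv4Literal(peer)) {
            return ipSans.contains(peer);
        }
        for (String san : dnsSans) {
            if (dnsMatches(san, peer)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed SAN entries, taken from the setup described in this mail.
        List<String> dns = Arrays.asList("flink-jobmanager", "*.flink-taskmanager-svc");
        List<String> ips = Collections.emptyList();

        // Peer addressed by name: the wildcard DNS entry applies.
        System.out.println(matches("flink-taskmanager-0.flink-taskmanager-svc", dns, ips));
        // Peer addressed by IP: no IP-type SAN exists, so the check fails.
        System.out.println(matches("172.30.247.163", dns, ips));
    }
}
```

With these assumed entries the first check succeeds and the second fails, which is consistent with the "No subject alternative names matching IP address" message in the trace.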


From the logs I see the Jobmanager is correctly registering the
taskmanagers:

org.apache.flink.runtime.instance.InstanceManager   - Registered TaskManager at flink-taskmanager-1 (akka.ssl.tcp://flink@taiga-flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122/user/taskmanager) as 1a3f59693cec8b3929ed8898edcc2700. Current number of registered hosts is 3. Current number of alive task slots is 6.

And also each taskmanager is correctly registered to use the hostname for
communication:

org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager will use hostname/address 'flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local' (172.30.247.163) for communication.
...
akka.remote.Remoting   - Remoting started; listening on addresses :[akka.ssl.tcp://flink@flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local:6122]
...
org.apache.flink.runtime.io.network.netty.NettyConfig   - NettyConfig [server address: flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local/172.30.247.163, server port: 6121, ssl enabled: true, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 2 (manual), number of client threads: 2 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
...
org.apache.flink.runtime.taskmanager.TaskManager   - TaskManager data connection information: bf4a9b50e57c99c17049adb66d65f685 @ flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local (dataPort=6121)



But even with that, it seems the taskmanagers are using the IP address to
communicate with each other, so the SSL validation fails.
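
As a rough illustration of how this could happen in plain Java (this is my assumption, not Flink's actual code; PeerHostSketch and its methods are hypothetical), the sketch below shows that a socket address built from raw IP bytes reports the IP literal as its host string, while one built with a hostname keeps the name. If the client-side SSLEngine is created with that host string and endpoint identification is enabled, the certificate is then validated against whichever value was passed in.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Hypothetical illustration only; not taken from Flink.
class PeerHostSketch {

    // The taskmanager IP from the logs above.
    static final byte[] TM_IP = {(byte) 172, 30, (byte) 247, (byte) 163};

    // Address built from raw bytes: no hostname is attached, no DNS lookup.
    static InetSocketAddress byIpOnly(int port) {
        try {
            return new InetSocketAddress(InetAddress.getByAddress(TM_IP), port);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // Same bytes, but the hostname travels along with the address.
    static InetSocketAddress byHostAndIp(String host, int port) {
        try {
            return new InetSocketAddress(InetAddress.getByAddress(host, TM_IP), port);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client engine whose peer identity is whatever getHostString() returns.
    static SSLEngine clientEngineFor(InetSocketAddress peer) {
        try {
            SSLEngine engine = SSLContext.getDefault()
                    .createSSLEngine(peer.getHostString(), peer.getPort());
            engine.setUseClientMode(true);
            SSLParameters params = engine.getSSLParameters();
            // Enables hostname verification against the peer host above.
            params.setEndpointIdentificationAlgorithm("HTTPS");
            engine.setSSLParameters(params);
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String host = "flink-taskmanager-1.flink-taskmanager-svc.default.svc.cluster.local";
        // Peer identity is the IP literal; it would need an IP-type SAN.
        System.out.println(clientEngineFor(byIpOnly(6121)).getPeerHost());
        // Peer identity is the DNS name; DNS-type SANs can match it.
        System.out.println(clientEngineFor(byHostAndIp(host, 6121)).getPeerHost());
    }
}
```

If something along these lines is going on in the data connections between taskmanagers, passing the hostname through to the point where the SSL engine is created would be the kind of change I am looking for.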

Do you know if it's possible to make the taskmanagers use the hostname
instead of the IP to communicate?
Or do you have any advice on getting the SSL configuration to work in this
environment?

Thanks in advance.

Regards,
Edward