Posted to issues@flink.apache.org by "Moshe Elisha (Jira)" <ji...@apache.org> on 2022/07/18 13:01:00 UTC
[jira] [Commented] (FLINK-28171) Adjust Job and Task manager port definitions to work with Istio+mTLS
[ https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567980#comment-17567980 ]
Moshe Elisha commented on FLINK-28171:
--------------------------------------
Hi,
We would appreciate a reply. Istio mTLS is a hard requirement for us. We are happy to propose a pull request, but we would like to implement the solution you think is best.
From what I saw, the ports are generated in one or more of these code segments: [InitJobManagerDecorator.java#L166|https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/InitJobManagerDecorator.java#L166], [HeadlessClusterIPService.java#L43|https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/HeadlessClusterIPService.java#L43], and [KubernetesResourceManagerDriver.java#L251|https://github.com/apache/flink/blob/189f88485d75821fe285e61bbf6623e88aec24d3/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesResourceManagerDriver.java#L251].
One possible solution that is *very easy*, *still backward compatible*, and *configurable even when we use the Flink operator* is to add an environment variable, FLINK_ADD_APP_PROTOCOL_TO_PORTS. If it is set to true, the code adds "appProtocol" to the port definitions.
(Env var name is open for discussion :))
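To make the proposal concrete, here is a minimal sketch of the opt-in behavior. It is not Flink code: the class and method names are hypothetical, the port is modeled as a plain map rather than a fabric8 object, and the "tcp" value for appProtocol is an assumption. The real change would presumably set the property through the fabric8 client in the code segments linked above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink.
final class AppProtocolSketch {
    // Env var name proposed in this comment; still open for discussion.
    static final String ENV_VAR = "FLINK_ADD_APP_PROTOCOL_TO_PORTS";

    // Returns true only when the variable is set to "true" (case-insensitive).
    // An unset variable yields false, so the default behavior is unchanged.
    public static boolean appProtocolEnabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.get(ENV_VAR));
    }

    // Builds a simplified service port entry and adds appProtocol only on opt-in.
    public static Map<String, Object> servicePort(String name, int port, boolean addAppProtocol) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("name", name);
        entry.put("port", port);
        entry.put("protocol", "TCP");
        entry.put("targetPort", port);
        if (addAppProtocol) {
            entry.put("appProtocol", "tcp"); // assumed value for plain TCP traffic
        }
        return entry;
    }

    public static void main(String[] args) {
        boolean enabled = appProtocolEnabled(System.getenv());
        System.out.println(servicePort("jobmanager-rpc", 6123, enabled));
        System.out.println(servicePort("blobserver", 6124, enabled));
    }
}
```

Because the flag defaults to false, existing deployments would keep generating exactly the same port definitions.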
What do you think?
> Adjust Job and Task manager port definitions to work with Istio+mTLS
> --------------------------------------------------------------------
>
> Key: FLINK-28171
> URL: https://issues.apache.org/jira/browse/FLINK-28171
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.4
> Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
> Reporter: Moshe Elisha
> Priority: Major
>
> Hello,
>
> We are launching Flink deployments using the [Flink Kubernetes Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/] on a Kubernetes cluster with Istio and mTLS enabled.
>
> We found that the TaskManager is unable to communicate with the JobManager on the jobmanager-rpc port:
>
> {{2022-06-15 15:25:40,508 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>
> The reason for the issue is that the JobManager service port definitions do not follow the Istio guidelines [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/] (see example below).
>
> There was also an email discussion on this topic on the user mailing list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions".
> With the help of the community, we were able to work around the issue, but it was very hard and forced us to bypass the Istio proxy, which is not ideal.
>
> We would like you to consider changing the default port definitions in one of these ways:
> # Rename the ports: I understand this is an Istio-specific guideline, but it may be better to align with at least one (popular) vendor guideline than with none at all.
> # Add the "appProtocol" property [1], which is not specific to any vendor but requires Kubernetes >= 1.19, where it was introduced as beta (it moved to stable in >= 1.20). The option to set appProtocol was only added to the fabric8 client in [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
> # Provide a way to override the defaults.
>
> [1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>
>
> {{# k get service inference-results-to-analytics-engine -o yaml}}
> {{apiVersion: v1}}
> {{kind: Service}}
> {{...}}
> {{spec:}}
> {{  clusterIP: None}}
> {{  ports:}}
> {{  - name: jobmanager-rpc *# should start with "tcp-" or add "appProtocol" property*}}
> {{    port: 6123}}
> {{    protocol: TCP}}
> {{    targetPort: 6123}}
> {{  - name: blobserver *# should start with "tcp-" or add "appProtocol" property*}}
> {{    port: 6124}}
> {{    protocol: TCP}}
> {{    targetPort: 6124}}
> {{...}}
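
The annotations in the quoted service definition amount to a simple rule: a port declares its protocol either through a name of the form <protocol>[-<suffix>] or through an appProtocol value. The sketch below encodes that rule as a check. The class name is hypothetical, and the protocol list is an assumed subset taken from the Istio protocol-selection guide linked in the issue, so treat it as illustrative rather than authoritative.

```java
import java.util.Set;

// Hypothetical helper; not part of Flink or Istio.
final class IstioPortNameCheck {
    // Assumed subset of the protocol prefixes Istio recognizes in port names.
    static final Set<String> PROTOCOLS = Set.of(
            "http", "http2", "https", "tcp", "tls", "grpc", "grpc-web", "mongo", "mysql", "redis");

    // Returns true when the protocol is declared explicitly, either through
    // appProtocol or through a port name of the form <protocol>[-<suffix>].
    public static boolean hasExplicitProtocol(String portName, String appProtocol) {
        if (appProtocol != null && !appProtocol.isEmpty()) {
            return true;
        }
        for (String protocol : PROTOCOLS) {
            if (portName.equals(protocol) || portName.startsWith(protocol + "-")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasExplicitProtocol("jobmanager-rpc", null));     // current name
        System.out.println(hasExplicitProtocol("tcp-jobmanager-rpc", null)); // renamed port
        System.out.println(hasExplicitProtocol("blobserver", "tcp"));        // appProtocol set
    }
}
```

Under this rule, the current names jobmanager-rpc and blobserver declare no protocol, while either proposed fix (the "tcp-" prefix or an appProtocol value) does.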