Posted to issues@flink.apache.org by "Moshe Elisha (Jira)" <ji...@apache.org> on 2022/07/18 13:01:00 UTC

[jira] [Commented] (FLINK-28171) Adjust Job and Task manager port definitions to work with Istio+mTLS

    [ https://issues.apache.org/jira/browse/FLINK-28171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567980#comment-17567980 ] 

Moshe Elisha commented on FLINK-28171:
--------------------------------------

Hi,

We would appreciate a reply. Istio mTLS is a hard requirement for us. We would be happy to submit a pull request, but we would like to implement the solution you think is best.

From what I can see, the ports are defined in one of these code segments, or in all of them: [InitJobManagerDecorator.java#L166|https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/InitJobManagerDecorator.java#L166], [HeadlessClusterIPService.java#L43|https://github.com/apache/flink/blob/master/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/services/HeadlessClusterIPService.java#L43] and [KubernetesResourceManagerDriver.java#L251|https://github.com/apache/flink/blob/189f88485d75821fe285e61bbf6623e88aec24d3/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/KubernetesResourceManagerDriver.java#L251].

One possible solution that is *very easy*, *still backward compatible*, and can be *configured even when we use the Flink operator* is an environment variable, for example FLINK_ADD_APP_PROTOCOL_TO_PORTS: when it is set to true, the code would add the "appProtocol" property to the port definitions.

(The env var name is open for discussion :))
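A minimal sketch of how such a toggle could behave, in plain Java with no fabric8 dependency. FLINK_ADD_APP_PROTOCOL_TO_PORTS is only the placeholder name proposed above, not an existing Flink option; the PortSpec class and the buildJobManagerPorts method are hypothetical stand-ins for the Kubernetes ServicePort objects Flink actually builds, and 6123/6124 are the default ports from the service example in the issue description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Kubernetes ServicePort; not a Flink or fabric8 class.
final class PortSpec {
    final String name;
    final int port;
    final String protocol;
    final String appProtocol; // null means the property is omitted, as today

    PortSpec(String name, int port, String protocol, String appProtocol) {
        this.name = name;
        this.port = port;
        this.protocol = protocol;
        this.appProtocol = appProtocol;
    }
}

final class AppProtocolToggle {
    // Proposed (not existing) environment variable; the name is open for discussion.
    static final String ENV_VAR = "FLINK_ADD_APP_PROTOCOL_TO_PORTS";

    // Boolean.parseBoolean(null) returns false, so an unset variable keeps the current behavior.
    static boolean isEnabled() {
        return Boolean.parseBoolean(System.getenv(ENV_VAR));
    }

    // Builds the JobManager service ports. Port names stay unchanged (backward compatible);
    // appProtocol is only set when the toggle is on.
    static List<PortSpec> buildJobManagerPorts(boolean addAppProtocol, int rpcPort, int blobPort) {
        String appProtocol = addAppProtocol ? "tcp" : null;
        List<PortSpec> ports = new ArrayList<>();
        ports.add(new PortSpec("jobmanager-rpc", rpcPort, "TCP", appProtocol));
        ports.add(new PortSpec("blobserver", blobPort, "TCP", appProtocol));
        return ports;
    }
}
```

A caller would use AppProtocolToggle.buildJobManagerPorts(AppProtocolToggle.isEnabled(), 6123, 6124). Because the flag defaults to false, existing deployments would see exactly the same service definitions as before.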


What do you think?

> Adjust Job and Task manager port definitions to work with Istio+mTLS
> --------------------------------------------------------------------
>
>                 Key: FLINK-28171
>                 URL: https://issues.apache.org/jira/browse/FLINK-28171
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.14.4
>         Environment: flink-kubernetes-operator 1.0.0
> Flink 1.14-java11
> Kubernetes v1.19.5
> Istio 1.7.6
>            Reporter: Moshe Elisha
>            Priority: Major
>
> Hello,
>  
> We are launching Flink deployments using the [Flink Kubernetes Operator|https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-stable/] on a Kubernetes cluster with Istio and mTLS enabled.
>  
> We found that the TaskManager is unable to communicate with the JobManager on the jobmanager-rpc port:
>  
> {{2022-06-15 15:25:40,508 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@amf-events-to-inference-and-central.nwdaf-edge:6123]] Caused by: [The remote system explicitly disassociated (reason unknown).]}}
>  
> The cause of the issue is that the JobManager service port definitions do not follow the Istio protocol selection guidelines [https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/] (see the example below).
>  
> There was also an email discussion about this topic on the user mailing list under the subject "Flink Kubernetes Operator with K8S + Istio + mTLS - port definitions".
> With the help of the community we were able to work around the issue, but it was very hard and forced us to bypass the Istio proxy, which is not ideal.
>  
> We would like you to consider changing the default port definitions in one of the following ways:
>  # Rename the ports (for example, with a "tcp-" prefix). I understand this is an Istio-specific guideline, but it may be better to align with at least one popular vendor's guideline than with none at all.
>  # Add the "appProtocol" property [1], which is not vendor-specific but requires Kubernetes 1.19 or later, where it was introduced as beta (it became stable in 1.20). Support for setting appProtocol was only added to the fabric8 Kubernetes client in [https://github.com/fabric8io/kubernetes-client/releases/tag/v5.10.0] with [#3570|https://github.com/fabric8io/kubernetes-client/issues/3570].
>  # Provide a way to override the defaults.
>  
> [1] [https://kubernetes.io/docs/concepts/services-networking/_print/#application-protocol]
>  
>  
> {noformat}
> # k get service inference-results-to-analytics-engine -o yaml
> apiVersion: v1
> kind: Service
> ...
> spec:
>   clusterIP: None
>   ports:
>   - name: jobmanager-rpc # should start with "tcp-" or add an "appProtocol" property
>     port: 6123
>     protocol: TCP
>     targetPort: 6123
>   - name: blobserver # should start with "tcp-" or add an "appProtocol" property
>     port: 6124
>     protocol: TCP
>     targetPort: 6124
> ...
> {noformat}
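To make the failure in the service example above concrete, here is a simplified sketch of how we read Istio's explicit protocol selection: a port counts as explicitly declared when its appProtocol is a recognized protocol or its name has the form <protocol>[-<suffix>], with appProtocol taking precedence. The class name and the reduced protocol set are assumptions for illustration; this is not Istio code, and protocol auto-detection is not modeled.

```java
import java.util.Set;

// Simplified model of Istio explicit protocol selection, based on our reading of
// https://istio.io/latest/docs/ops/configuration/traffic-management/protocol-selection/
// Not Istio's implementation: only a subset of protocol names is listed.
final class IstioProtocolCheck {
    // Assumed subset of the protocol names Istio recognizes; the real list is longer.
    private static final Set<String> KNOWN = Set.of("tcp", "tls", "http", "http2", "https");

    // Returns the explicitly declared protocol, or null if nothing explicit is declared.
    static String explicitProtocol(String portName, String appProtocol) {
        // A recognized appProtocol takes precedence over the port name.
        if (appProtocol != null && KNOWN.contains(appProtocol)) {
            return appProtocol;
        }
        // Port names follow <protocol>[-<suffix>]; only the prefix before the first dash is inspected.
        int dash = portName.indexOf('-');
        String prefix = dash < 0 ? portName : portName.substring(0, dash);
        return KNOWN.contains(prefix) ? prefix : null;
    }
}
```

Under this model the default name gives explicitProtocol("jobmanager-rpc", null) == null, while renaming the port to "tcp-jobmanager-rpc" (option 1) or setting appProtocol to "tcp" (option 2) both yield "tcp".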



--
This message was sent by Atlassian Jira
(v8.20.10#820010)