Posted to user@flink.apache.org by KristoffSC <kr...@gmail.com> on 2020/03/02 07:25:26 UTC

Re: How JobManager and TaskManager find each other?

Thanks for the clarification about NAT.

Setting the NAT issue aside for a moment:

Is the process of sending the "task deployment descriptor" that you mentioned
in "Feb 26, 2020; 4:18pm", and specifically the process of notifying a
TaskManager about the IPs of the other TaskManagers participating in a job,
described somewhere? I'm familiar with [1] and [2], but they contain no
information about how the IP information of TaskManagers is sent.


Another question is how this all works for a Kubernetes Job/Session Cluster
deployment, where nodes are deployed across many physical machines inside
the Kubernetes cluster, if I'm using Kubernetes as described in [3].

The final question: do I have to modify jobmanager.rpc.address and the
flink/conf/slaves file when running a Docker JobCluster on Kubernetes? The
default values are localhost.
Or will just following [3] be fine?

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/concepts/runtime.html
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html
[3]
https://github.com/apache/flink/tree/release-1.10/flink-container/kubernetes




Re: How JobManager and TaskManager find each other?

Posted by Yang Wang <da...@gmail.com>.

Hi KristoffSC,

Please find my answers to your questions inline.

> 1. task deployment descriptor
The `TaskDeploymentDescriptor` is used by the JobMaster to submit a task to
a TaskManager.
Since the JobMaster knows all the operators and their locations, it puts
the locations of the upstream operators in the `TaskDeploymentDescriptor`.
So when the task is running, it always knows how to communicate with the
others.
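
To make the idea concrete, here is a minimal sketch of how a scheduler that
knows every task's location could attach the upstream locations to each
task's descriptor. It is a simplified model, not Flink's actual code: the
names `Location`, `Descriptor` and `build_descriptors`, as well as the hosts
and ports, are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Location:
    """Where a task runs (hypothetical model, not a Flink class)."""
    host: str
    data_port: int


@dataclass
class Descriptor:
    """Simplified stand-in for a task deployment descriptor."""
    task: str
    # Maps each upstream task name to the location it runs at.
    upstream: Dict[str, Location] = field(default_factory=dict)


def build_descriptors(
    placement: Dict[str, Location],
    edges: List[Tuple[str, str]],
) -> Dict[str, Descriptor]:
    """Build one descriptor per task from the global placement.

    `placement` maps task name -> location; `edges` lists
    (producer, consumer) pairs of the job graph.
    """
    descriptors = {task: Descriptor(task) for task in placement}
    for producer, consumer in edges:
        # The consumer learns where its producer runs.
        descriptors[consumer].upstream[producer] = placement[producer]
    return descriptors


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    placement = {
        "source": Location("10.0.0.1", 6121),
        "map": Location("10.0.0.2", 6121),
        "sink": Location("10.0.0.3", 6121),
    }
    edges = [("source", "map"), ("map", "sink")]
    for d in build_descriptors(placement, edges).values():
        print(d.task, "reads from", d.upstream)
```

The point of the sketch is only that the task itself never has to discover
its peers: everything it needs arrives with the deployment request.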

> 2. Kubernetes job cluster
Deploying on Kubernetes is very different from running behind NAT in a
PaaS. Kubernetes provides a cluster-wide Pod network (commonly an overlay
network). Each JobManager/TaskManager (i.e. each Kubernetes Pod) is
assigned a unique hostname and IP [1], so they can talk to each other
directly. You therefore do not need to set any bind-host or bind-port.

> 3. Modify jobmanager.rpc.address
You need to create a Kubernetes Service and set `jobmanager.rpc.address`
to the service name.
This is needed for JobManager fault tolerance: when the JobManager fails
and is relaunched, the TaskManagers can still use the service name to
re-register with the JobManager.
You do not need to update conf/slaves; just follow the guide [2].
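
The benefit of pointing `jobmanager.rpc.address` at a service name rather
than at a Pod IP can be illustrated with a toy simulation. This is only a
sketch under stated assumptions: `ServiceRegistry` and `TaskManager` are
hypothetical classes standing in for a Kubernetes Service and a Flink
TaskManager, and the service name and IPs are invented.

```python
class ServiceRegistry:
    """Toy stand-in for a Kubernetes Service: a stable name that is
    routed to whichever Pod currently backs it."""

    def __init__(self):
        self._backends = {}

    def set_backend(self, service_name, pod_ip):
        # Called whenever the Pod behind the service is (re)created.
        self._backends[service_name] = pod_ip

    def route(self, service_name):
        return self._backends[service_name]


class TaskManager:
    """Toy TaskManager that only knows the configured address."""

    def __init__(self, registry, jobmanager_rpc_address):
        self.registry = registry
        self.jobmanager_rpc_address = jobmanager_rpc_address

    def register(self):
        # Returns the Pod that the registration request ends up at.
        return self.registry.route(self.jobmanager_rpc_address)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.set_backend("flink-jobmanager", "10.1.0.5")  # made-up values
    tm = TaskManager(registry, "flink-jobmanager")
    print("registered with", tm.register())

    # The JobManager Pod fails and is relaunched with a new IP.
    registry.set_backend("flink-jobmanager", "10.1.0.9")
    print("re-registered with", tm.register())
```

The TaskManager's configuration never changes across the restart; only the
backend behind the stable name does, which is why a hard-coded Pod IP would
not survive a JobManager relaunch.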


[1].
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
[2].
https://github.com/apache/flink/tree/release-1.10/flink-container/kubernetes


Best,
Yang
