Posted to user@flink.apache.org by art <Su...@163.com> on 2020/09/02 07:49:43 UTC

Fail to deploy Flink on minikube

Hi, I’m going to deploy Flink on Minikube, referring to https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml

But I got this

2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms]. 

And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find that I cannot ping flink-jobmanager from the TaskManager.

I am new to K8s; can anyone point me to a tutorial? Thanks a lot!
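Not something suggested in the thread itself, but a generic first step for an UnknownHostException like the one above is to test name resolution from inside the TaskManager pod and confirm the Service exists. The pod name below is the one from the log excerpt; substitute your own (and note that nslookup may not be installed in the Flink image, in which case `getent hosts` usually works):

```shell
# Check whether the service name resolves from inside the TaskManager pod.
kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- getent hosts flink-jobmanager

# Verify the Service exists and has endpoints backing it.
kubectl get service flink-jobmanager
kubectl get endpoints flink-jobmanager

# Check that the cluster DNS (CoreDNS/kube-dns) pods are healthy.
kubectl get pods -n kube-system -l k8s-app=kube-dns
```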

Re: Fail to deploy Flink on minikube

Posted by Till Rohrmann <tr...@apache.org>.
Great to hear that it works on K8s, and thanks for letting us know that the
problem is likely caused by Minikube.

Cheers,
Till

On Fri, Sep 4, 2020 at 8:53 AM superainbower <su...@163.com> wrote:

> Hi Till & Yang,
> I can deploy Flink on Kubernetes (not Minikube), and it works well.
> So there is some problem with my Minikube, but I can’t find and fix it.
> Anyway, I can deploy on K8s now.
> Thanks for your help!
> superainbower
> superainbower@163.com
>
>
> On 09/3/2020 15:47, Till Rohrmann <tr...@apache.org> wrote:
>
> In order to exclude a Minikube problem, you could also try to run Flink on
> an older Minikube and an older K8s version. Our end-to-end tests use
> Minikube v1.8.2, for example.
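Till's suggestion can be tried by recreating the cluster with a pinned Kubernetes version; the `--kubernetes-version` flag exists in minikube, though the exact version string below is only an illustrative assumption:

```shell
# Tear down the current cluster and start one pinned to an older
# Kubernetes version (version string is an example, not a recommendation).
minikube delete
minikube start --kubernetes-version=v1.15.12

# Confirm the server version the cluster is actually running.
kubectl version
```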
>
> Cheers,
> Till
>
> On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <da...@gmail.com> wrote:
>
>> Sorry, I forgot that the JobManager binds its RPC address to
>> flink-jobmanager, not to its IP address.
>> So you also need to update jobmanager-session-deployment.yaml with the
>> following changes.
>>
>> ...
>>       containers:
>>       - name: jobmanager
>>         env:
>>         - name: JM_IP
>>           valueFrom:
>>             fieldRef:
>>               apiVersion: v1
>>               fieldPath: status.podIP
>>         image: flink:1.11
>>         args: ["jobmanager", "$(JM_IP)"]
>> ...
>>
>> After that, the JobManager binds its RPC address to its pod IP.
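One way to confirm the effect of Yang's change (a sketch only; the exact log line format varies by Flink version) is to re-apply the patched deployment and grep the JobManager log for the address its Akka actor system bound to:

```shell
# Re-apply the patched deployment, then watch which address the
# JobManager's RPC (Akka) actor system reports binding to.
kubectl apply -f jobmanager-session-deployment.yaml
kubectl logs deployment/flink-jobmanager | grep -i "akka.tcp://flink@"
```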
>>
>> Best,
>> Yang
>>
>>
>> superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 11:38 AM:
>>
>>> Hi Yang,
>>> I updated taskmanager-session-deployment.yaml like this:
>>>
>>> apiVersion: apps/v1
>>> kind: Deployment
>>> metadata:
>>>   name: flink-taskmanager
>>> spec:
>>>   replicas: 1
>>>   selector:
>>>     matchLabels:
>>>       app: flink
>>>       component: taskmanager
>>>   template:
>>>     metadata:
>>>       labels:
>>>         app: flink
>>>         component: taskmanager
>>>     spec:
>>>       containers:
>>>       - name: taskmanager
>>>         image:
>>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>         args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
>>>         ports:
>>>         - containerPort: 6122
>>>           name: rpc
>>>         - containerPort: 6125
>>>           name: query-state
>>>         livenessProbe:
>>>           tcpSocket:
>>>             port: 6122
>>>           initialDelaySeconds: 30
>>>           periodSeconds: 60
>>>         volumeMounts:
>>>         - name: flink-config-volume
>>>           mountPath: /opt/flink/conf/
>>>         securityContext:
>>>           runAsUser: 9999  # refers to user _flink_ from official flink
>>> image, change if necessary
>>>       volumes:
>>>       - name: flink-config-volume
>>>         configMap:
>>>           name: flink-config
>>>           items:
>>>           - key: flink-conf.yaml
>>>             path: flink-conf.yaml
>>>           - key: log4j-console.properties
>>>             path: log4j-console.properties
>>>       imagePullSecrets:
>>>         - name: regcred
>>>
>>> Then I deleted the TaskManager pod and let it restart, but the logs print this:
>>>
>>> Could not resolve ResourceManager address
>>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in
>>> 10000 ms: Could not connect to rpc endpoint under address
>>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*
>>>
>>> It changed flink-jobmanager to 172.18.0.5.
>>> superainbower
>>> superainbower@163.com
>>>
>>>
>>> On 09/3/2020 11:09, Yang Wang <da...@gmail.com> wrote:
>>>
>>> I guess something is wrong with your kube-proxy, which causes the
>>> TaskManager to fail to connect to the JobManager.
>>> You could verify this by directly using the JobManager pod IP instead of
>>> the service name.
>>>
>>> Please do as follows:
>>> * Edit the TaskManager deployment (via kubectl edit deployment
>>> flink-taskmanager) and update the args field to the following,
>>> given that "172.18.0.5" is the JobManager pod IP:
>>>    args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
>>> * Delete the current TaskManager pod and let it restart.
>>> * Now check the TaskManager logs to see whether it registered
>>> successfully.
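The pod IP used in Yang's steps can be looked up from the command line instead of reading it off a dashboard. This jsonpath query assumes the `app=flink,component=jobmanager` labels from the manifests shown later in the thread:

```shell
# Look up the JobManager pod IP using the labels from
# jobmanager-session-deployment.yaml.
kubectl get pod -l app=flink,component=jobmanager \
  -o jsonpath='{.items[0].status.podIP}'
```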
>>>
>>>
>>>
>>> Best,
>>> Yang
>>>
>>> superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:
>>>
>>>> Hi Till,
>>>> I found something that may be helpful.
>>>> The Kubernetes Dashboard shows the JobManager IP as 172.18.0.5 and the
>>>> TaskManager IP as 172.18.0.6.
>>>> When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn
>>>> -- /bin/bash' and then 'ping 172.18.0.5',
>>>> I can get a response.
>>>> But when I ping flink-jobmanager, there is no response.
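Ping by IP succeeding while ping by name fails points at cluster DNS. The checks below are a generic sketch, not something suggested in the thread; `getent` is used because ping and nslookup may be missing from the image:

```shell
# The service name is resolved by the cluster DNS, so test it directly
# from inside the TaskManager pod.
kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- \
  getent hosts flink-jobmanager

# If that fails, inspect the DNS addon that minikube runs.
minikube addons list | grep -i dns
kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20
```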
>>>>
>>>> superainbower
>>>> superainbower@163.com
>>>>
>>>>
>>>> On 09/3/2020 09:03, superainbower <su...@163.com> wrote:
>>>>
>>>> Hi Till,
>>>> This is the TaskManager log.
>>>> As you can see, the log prints 'line 92 -- Could not connect to
>>>> flink-jobmanager:6123',
>>>> then prints 'line 128 -- Could not resolve ResourceManager address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.'
>>>> and repeats printing this.
>>>>
>>>> A few minutes later, the TaskManager shuts down and restarts.
>>>>
>>>> These are my yaml files; could you help me confirm whether I
>>>> omitted something? Thanks a lot!
>>>> ---------------------------------------------------
>>>> flink-configuration-configmap.yaml
>>>> apiVersion: v1
>>>> kind: ConfigMap
>>>> metadata:
>>>>   name: flink-config
>>>>   labels:
>>>>     app: flink
>>>> data:
>>>>   flink-conf.yaml: |+
>>>>     jobmanager.rpc.address: flink-jobmanager
>>>>     taskmanager.numberOfTaskSlots: 1
>>>>     blob.server.port: 6124
>>>>     jobmanager.rpc.port: 6123
>>>>     taskmanager.rpc.port: 6122
>>>>     queryable-state.proxy.ports: 6125
>>>>     jobmanager.memory.process.size: 1024m
>>>>     taskmanager.memory.process.size: 1024m
>>>>     parallelism.default: 1
>>>>   log4j-console.properties: |+
>>>>     rootLogger.level = INFO
>>>>     rootLogger.appenderRef.console.ref = ConsoleAppender
>>>>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>>>>     logger.akka.name = akka
>>>>     logger.akka.level = INFO
>>>>     logger.kafka.name= org.apache.kafka
>>>>     logger.kafka.level = INFO
>>>>     logger.hadoop.name = org.apache.hadoop
>>>>     logger.hadoop.level = INFO
>>>>     logger.zookeeper.name = org.apache.zookeeper
>>>>     logger.zookeeper.level = INFO
>>>>     appender.console.name = ConsoleAppender
>>>>     appender.console.type = CONSOLE
>>>>     appender.console.layout.type = PatternLayout
>>>>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>>>> %-60c %x - %m%n
>>>>     appender.rolling.name = RollingFileAppender
>>>>     appender.rolling.type = RollingFile
>>>>     appender.rolling.append = false
>>>>     appender.rolling.fileName = ${sys:log.file}
>>>>     appender.rolling.filePattern = ${sys:log.file}.%i
>>>>     appender.rolling.layout.type = PatternLayout
>>>>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>>>> %-60c %x - %m%n
>>>>     appender.rolling.policies.type = Policies
>>>>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>>>>     appender.rolling.policies.size.size=100MB
>>>>     appender.rolling.strategy.type = DefaultRolloverStrategy
>>>>     appender.rolling.strategy.max = 10
>>>>     logger.netty.name =
>>>> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>>>>     logger.netty.level = OFF
>>>> ---------------------------------------------------
>>>> jobmanager-service.yaml
>>>> apiVersion: v1
>>>> kind: Service
>>>> metadata:
>>>>   name: flink-jobmanager
>>>> spec:
>>>>   type: ClusterIP
>>>>   ports:
>>>>   - name: rpc
>>>>     port: 6123
>>>>   - name: blob-server
>>>>     port: 6124
>>>>   - name: webui
>>>>     port: 8081
>>>>   selector:
>>>>     app: flink
>>>>     component: jobmanager
>>>> --------------------------------------------------
>>>> jobmanager-session-deployment.yaml
>>>> apiVersion: apps/v1
>>>> kind: Deployment
>>>> metadata:
>>>>   name: flink-jobmanager
>>>> spec:
>>>>   replicas: 1
>>>>   selector:
>>>>     matchLabels:
>>>>       app: flink
>>>>       component: jobmanager
>>>>   template:
>>>>     metadata:
>>>>       labels:
>>>>         app: flink
>>>>         component: jobmanager
>>>>     spec:
>>>>       containers:
>>>>       - name: jobmanager
>>>>         image:
>>>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>>         args: ["jobmanager"]
>>>>         ports:
>>>>         - containerPort: 6123
>>>>           name: rpc
>>>>         - containerPort: 6124
>>>>           name: blob-server
>>>>         - containerPort: 8081
>>>>           name: webui
>>>>         livenessProbe:
>>>>           tcpSocket:
>>>>             port: 6123
>>>>           initialDelaySeconds: 30
>>>>           periodSeconds: 60
>>>>         volumeMounts:
>>>>         - name: flink-config-volume
>>>>           mountPath: /opt/flink/conf
>>>>         securityContext:
>>>>           runAsUser: 9999  # refers to user _flink_ from official flink
>>>> image, change if necessary
>>>>       volumes:
>>>>       - name: flink-config-volume
>>>>         configMap:
>>>>           name: flink-config
>>>>           items:
>>>>           - key: flink-conf.yaml
>>>>             path: flink-conf.yaml
>>>>           - key: log4j-console.properties
>>>>             path: log4j-console.properties
>>>>       imagePullSecrets:
>>>>         - name: regcred
>>>> ---------------------------------------------------
>>>> taskmanager-session-deployment.yaml
>>>> apiVersion: apps/v1
>>>> kind: Deployment
>>>> metadata:
>>>>   name: flink-taskmanager
>>>> spec:
>>>>   replicas: 1
>>>>   selector:
>>>>     matchLabels:
>>>>       app: flink
>>>>       component: taskmanager
>>>>   template:
>>>>     metadata:
>>>>       labels:
>>>>         app: flink
>>>>         component: taskmanager
>>>>     spec:
>>>>       containers:
>>>>       - name: taskmanager
>>>>         image:
>>>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>>         args: ["taskmanager"]
>>>>         ports:
>>>>         - containerPort: 6122
>>>>           name: rpc
>>>>         - containerPort: 6125
>>>>           name: query-state
>>>>         livenessProbe:
>>>>           tcpSocket:
>>>>             port: 6122
>>>>           initialDelaySeconds: 30
>>>>           periodSeconds: 60
>>>>         volumeMounts:
>>>>         - name: flink-config-volume
>>>>           mountPath: /opt/flink/conf/
>>>>         securityContext:
>>>>           runAsUser: 9999  # refers to user _flink_ from official flink
>>>> image, change if necessary
>>>>       volumes:
>>>>       - name: flink-config-volume
>>>>         configMap:
>>>>           name: flink-config
>>>>           items:
>>>>           - key: flink-conf.yaml
>>>>             path: flink-conf.yaml
>>>>           - key: log4j-console.properties
>>>>             path: log4j-console.properties
>>>>       imagePullSecrets:
>>>>         - name: regcred
>>>>
>>>>
>>>> superainbower
>>>> superainbower@163.com
>>>>
>>>>
>>>> On 09/2/2020 20:38, Till Rohrmann <tr...@apache.org> wrote:
>>>>
>>>> Hmm, this is indeed strange. Could you share the logs of the
>>>> TaskManager with us? Ideally you set the log level to debug. Thanks a lot.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:
>>>>
>>>>> Hi Till,
>>>>>
>>>>> The full output of 'kubectl get all' looks like this:
>>>>>
>>>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>>>
>>>>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
>>>>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
>>>>> service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h
>>>>>
>>>>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>>>>> deployment.apps/flink-jobmanager    1/1     1            1           2m34s
>>>>> deployment.apps/flink-taskmanager   1/1     1            1           2m34s
>>>>>
>>>>> NAME                                           DESIRED   CURRENT   READY   AGE
>>>>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
>>>>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s
>>>>>
>>>>> And I can open the Flink UI, but the number of TaskManagers is 0, so
>>>>> the JobManager works well.
>>>>> I think the problem is that the TaskManager cannot register itself
>>>>> with the JobManager. Did I miss some configuration?
>>>>>
>>>>>
>>>>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>>>
>>>>> Hi art,
>>>>>
>>>>> could you check what `kubectl get services` returns? Usually if you
>>>>> run `kubectl get all` you should also see the services, but in your
>>>>> case there are no services listed. You should see something like
>>>>> service/flink-jobmanager; otherwise the flink-jobmanager service (K8s
>>>>> Service) is not running.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>>
>>>>>> I’m sure the jobmanager-service is started; I can find it in the
>>>>>> Kubernetes Dashboard.
>>>>>>
>>>>>> When I run 'kubectl get deployment' I get this:
>>>>>> flink-jobmanager    1/1     1            1           33s
>>>>>> flink-taskmanager   1/1     1            1           33s
>>>>>>
>>>>>> When I run 'kubectl get all' I get this:
>>>>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
>>>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>>>>>>
>>>>>> So I think flink-jobmanager works well, but the TaskManager is
>>>>>> restarted every few minutes.
>>>>>>
>>>>>> My minikube version: v1.12.3
>>>>>> Flink version:v1.11.1
>>>>>>
>>>>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>>>>
>>>>>> Hi art,
>>>>>>
>>>>>> could you verify that the jobmanager-service has been started? It
>>>>>> looks as if the name flink-jobmanager is not resolvable. It could also help
>>>>>> to know the Minikube and K8s version you are using.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:
>>>>>>
>>>>>>> Hi, I’m going to deploy Flink on Minikube, referring to
>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>>>>> kubectl create -f flink-configuration-configmap.yaml
>>>>>>> kubectl create -f jobmanager-service.yaml
>>>>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>>>>
>>>>>>> But I got this
>>>>>>>
>>>>>>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>>>>>>>                       [] - Association with remote system [
>>>>>>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now
>>>>>>> gated for [50] ms. Reason: [Association failed with [
>>>>>>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>>>>>>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
>>>>>>> resolution]
>>>>>>> 2020-09-02 06:45:42,691 INFO
>>>>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>>>>> not resolve ResourceManager address
>>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>>>> 2020-09-02 06:46:02,731 INFO
>>>>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>>>>> not resolve ResourceManager address
>>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>>>> 2020-09-02 06:46:12,731 INFO
>>>>>>>  akka.remote.transport.ProtocolStateActor                     [] - No
>>>>>>> response from remote for outbound association. Associate timed out after
>>>>>>> [20000 ms].
>>>>>>>
>>>>>>> And when I run 'kubectl exec -ti
>>>>>>> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping
>>>>>>> flink-jobmanager', I find that I cannot ping flink-jobmanager from
>>>>>>> the TaskManager.
>>>>>>>
>>>>>>> I am new to K8s; can anyone point me to a tutorial? Thanks a lot!
>>>>>>>
>>>>>>
>>>>>>
>>>>>

Re: Fail to deploy Flink on minikube

Posted by superainbower <su...@163.com>.
Hi Till & Yang,
I can deploy Flink on kubernetes(not minikube), it works well
So there are some problem about my minikube but I can’t find and fix it
Anyway I can deploy on k8s now
Thanks for your help!
| |
superainbower
|
|
superainbower@163.com
|
签名由网易邮箱大师定制


On 09/3/2020 15:47,Till Rohrmann<tr...@apache.org> wrote:
In order to exclude a Minikube problem, you could also try to run Flink on an older Minikube and an older K8s version. Our end-to-end tests use Minikube v1.8.2, for example.


Cheers,
Till


On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <da...@gmail.com> wrote:

Sorry i forget that the JobManager is binding its rpc address to flink-jobmanager, not the ip address.
So you need to also update the jobmanager-session-deployment.yaml with following changes.



...
      containers:
      - name: jobmanager
        env:
        - name: JM_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: flink:1.11
        args: ["jobmanager", "$(JM_IP)"]
...


After then the JobManager is binding the rpc address with its ip.


Best,
Yang





superainbower <su...@163.com> 于2020年9月3日周四 上午11:38写道:

HI Yang,
I update taskmanager-session-deployment.yaml like this:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred


And Delete the TaskManager pod and restart it , but the logs print this


Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*


It change flink-jobmanager to 172.18.0.5 
| |
superainbower
|
|
superainbower@163.com
|
签名由网易邮箱大师定制


On 09/3/2020 11:09,Yang Wang<da...@gmail.com> wrote:
I guess something is wrong with your kube proxy, which causes TaskManager could not connect to JobManager.
You could verify this by directly using JobManager Pod ip instead of service name.


Please do as follows.
* Edit the TaskManager deployment(via kubectl edit flink-taskmanager) and update the args field to the following.
   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]    Given that "172.18.0.5" is the JobManager pod ip.
* Delete the current TaskManager pod and let restart again
* Now check the TaskManager logs to check whether it could register successfully






Best,
Yang


superainbower <su...@163.com> 于2020年9月3日周四 上午9:35写道:

Hi Till,
I find something may be helpful.
The kubernetes Dashboard show job-manager ip 172.18.0.5, task-manager ip 172.18.0.6
When I run command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash’ && ‘ping 172.18.0.5’ 
I can get response
But when I ping flink-jobmanager ,there is no response


| |
superainbower
|
|
superainbower@163.com
|
签名由网易邮箱大师定制


On 09/3/2020 09:03,superainbower<su...@163.com> wrote:
Hi Till,
This is the taskManager log
As you see, the logs print  ‘line 92 -- Could not connect to flink-jobmanager:6123’
then print ‘line 128 --Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.’   And repeat print this


A few minutes later, the taskmanger shut down and restart


This is my yaml files, could u help me to confirm did I omitted something? Thanks a lot!
---------------------------------------------------
flink-configuration-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size=100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF
---------------------------------------------------
jobmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager
--------------------------------------------------
jobmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
---------------------------------------------------
taskmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
       


| |
superainbower
|
|
superainbower@163.com
|
签名由网易邮箱大师定制


On 09/2/2020 20:38,Till Rohrmann<tr...@apache.org> wrote:
Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot.


Cheers,
Till


On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:

Hi Till,
  
The full information when I run command ' kubectl get all’  like this:


NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h


NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s


NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s


And I can open flink ui but the task manger is 0 ,so the job manger is work well
I think the problem is taksmanger can not register itself to jobmanger,  did I miss some configure?




在 2020年9月2日,下午5:24,Till Rohrmann <tr...@apache.org> 写道:


Hi art,


could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services. But in your case there are no services listed. You have see something like service/flink-jobmanager otherwise the flink-jobmanager service (K8s service) is not running.


Cheers,
Till


On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:

Hi Till,


I’m sure the jobmanager-service is started; I can find it in the Kubernetes Dashboard


When I run the command 'kubectl get deployment' I get this:
flink-jobmanager    1/1     1            1           33s
flink-taskmanager   1/1     1            1           33s


When I run the command 'kubectl get all' I get this:
NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


So, I think flink-jobmanager works well, but the taskmanager is restarted every few minutes


My minikube version: v1.12.3
Flink version:v1.11.1



On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:


Hi art,


could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using.


Cheers,
Till


On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:

Hi, I’m going to deploy Flink on Minikube referring to https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html;
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml


But I got this


2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms]. 


And when I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find I cannot ping flink-jobmanager from the taskmanager


I am new to k8s, can anyone give me some tutorial? Thanks a lot !
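The name-resolution failure above can usually be narrowed down with a few standard checks. This is a generic debugging sketch, not part of the original thread; the resource names assume the manifests from the linked documentation:

```shell
# Is the Service actually there, and does it have endpoints (a matching pod)?
kubectl get svc flink-jobmanager
kubectl get endpoints flink-jobmanager

# Is cluster DNS itself healthy?
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Can an arbitrary pod resolve the service name?
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- nslookup flink-jobmanager
```

If the `nslookup` fails here too, the problem is cluster DNS (or the node networking) rather than the Flink deployment itself.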




Re: Fail to deploy Flink on minikube

Posted by Till Rohrmann <tr...@apache.org>.
In order to exclude a Minikube problem, you could also try to run Flink on
an older Minikube and an older K8s version. Our end-to-end tests use
Minikube v1.8.2, for example.

Cheers,
Till

On Thu, Sep 3, 2020 at 8:44 AM Yang Wang <da...@gmail.com> wrote:

> Sorry, I forgot that the JobManager binds its RPC address to
> flink-jobmanager, not to the IP address.
> So you also need to update the jobmanager-session-deployment.yaml with the
> following changes.
>
> ...
>       containers:
>       - name: jobmanager
>         env:
>         - name: JM_IP
>           valueFrom:
>             fieldRef:
>               apiVersion: v1
>               fieldPath: status.podIP
>         image: flink:1.11
>         args: ["jobmanager", "$(JM_IP)"]
> ...
>
> After that, the JobManager binds its RPC address to its IP.
>
> Best,
> Yang
>
>
superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 11:38 AM:
>
>> Hi Yang,
>> I updated taskmanager-session-deployment.yaml like this:
>>
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-taskmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: taskmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: taskmanager
>>     spec:
>>       containers:
>>       - name: taskmanager
>>         image:
>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
>>         ports:
>>         - containerPort: 6122
>>           name: rpc
>>         - containerPort: 6125
>>           name: query-state
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6122
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf/
>>         securityContext:
>>           runAsUser: 9999  # refers to user _flink_ from official flink
>> image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>         - name: regcred
>>
>> And I deleted the TaskManager pod and restarted it, but the logs print this:
>>
>> Could not resolve ResourceManager address
>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in
>> 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*
>>
>> It changed flink-jobmanager to 172.18.0.5
>>
>> On 09/3/2020 11:09, Yang Wang <da...@gmail.com> wrote:
>>
>> I guess something is wrong with your kube proxy, which means the TaskManager
>> could not connect to the JobManager.
>> You could verify this by directly using the JobManager pod IP instead of the
>> service name.
>>
>> Please do as follows.
>> * Edit the TaskManager deployment (via kubectl edit deployment flink-taskmanager)
>> and update the args field to the following.
>>    args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
>>    (given that "172.18.0.5" is the JobManager pod IP)
>> * Delete the current TaskManager pod and let it restart again
>> * Now check the TaskManager logs to see whether it could register
>> successfully
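The three steps above can be scripted roughly as follows. This is a sketch, not from the thread: the label selectors assume the `app: flink` / `component: ...` labels from the manifests posted later in the thread, and the patch path assumes a single container per pod.

```shell
# Step 0: look up the JobManager pod IP so it does not have to be hard-coded
JM_IP=$(kubectl get pod -l app=flink,component=jobmanager \
  -o jsonpath='{.items[0].status.podIP}')

# 1. Rewrite the TaskManager args to point at the pod IP directly
kubectl patch deployment flink-taskmanager --type=json -p \
  "[{\"op\":\"replace\",\"path\":\"/spec/template/spec/containers/0/args\",\"value\":[\"taskmanager\",\"-Djobmanager.rpc.address=${JM_IP}\"]}]"

# 2. Delete the pod; the Deployment controller recreates it with the new args
kubectl delete pod -l app=flink,component=taskmanager

# 3. Follow the new pod's logs to see whether registration succeeds
kubectl logs -f -l app=flink,component=taskmanager
```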
>>
>>
>>
>> Best,
>> Yang
>>
>> superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:
>>
>>> Hi Till,
>>> I found something that may be helpful.
>>> The Kubernetes Dashboard shows the jobmanager IP as 172.18.0.5 and the
>>> taskmanager IP as 172.18.0.6.
>>> When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn
>>> -- /bin/bash' and then 'ping 172.18.0.5',
>>> I get a response.
>>> But when I ping flink-jobmanager, there is no response.
>>>
>>>
>>> On 09/3/2020 09:03, superainbower <su...@163.com> wrote:
>>>
>>> Hi Till,
>>> This is the taskmanager log.
>>> As you can see, the logs print 'line 92 -- Could not connect to
>>> flink-jobmanager:6123',
>>> then print 'line 128 -- Could not resolve ResourceManager address
>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.',
>>> and repeat printing this.
>>>
>>> A few minutes later, the taskmanager shuts down and restarts.
>>>
>>> These are my yaml files; could you help me confirm whether I
>>> omitted something? Thanks a lot!
>>> ---------------------------------------------------
>>> flink-configuration-configmap.yaml
>>> apiVersion: v1
>>> kind: ConfigMap
>>> metadata:
>>>   name: flink-config
>>>   labels:
>>>     app: flink
>>> data:
>>>   flink-conf.yaml: |+
>>>     jobmanager.rpc.address: flink-jobmanager
>>>     taskmanager.numberOfTaskSlots: 1
>>>     blob.server.port: 6124
>>>     jobmanager.rpc.port: 6123
>>>     taskmanager.rpc.port: 6122
>>>     queryable-state.proxy.ports: 6125
>>>     jobmanager.memory.process.size: 1024m
>>>     taskmanager.memory.process.size: 1024m
>>>     parallelism.default: 1
>>>   log4j-console.properties: |+
>>>     rootLogger.level = INFO
>>>     rootLogger.appenderRef.console.ref = ConsoleAppender
>>>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>>>     logger.akka.name = akka
>>>     logger.akka.level = INFO
>>>     logger.kafka.name= org.apache.kafka
>>>     logger.kafka.level = INFO
>>>     logger.hadoop.name = org.apache.hadoop
>>>     logger.hadoop.level = INFO
>>>     logger.zookeeper.name = org.apache.zookeeper
>>>     logger.zookeeper.level = INFO
>>>     appender.console.name = ConsoleAppender
>>>     appender.console.type = CONSOLE
>>>     appender.console.layout.type = PatternLayout
>>>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>>> %-60c %x - %m%n
>>>     appender.rolling.name = RollingFileAppender
>>>     appender.rolling.type = RollingFile
>>>     appender.rolling.append = false
>>>     appender.rolling.fileName = ${sys:log.file}
>>>     appender.rolling.filePattern = ${sys:log.file}.%i
>>>     appender.rolling.layout.type = PatternLayout
>>>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>>> %-60c %x - %m%n
>>>     appender.rolling.policies.type = Policies
>>>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>>>     appender.rolling.policies.size.size=100MB
>>>     appender.rolling.strategy.type = DefaultRolloverStrategy
>>>     appender.rolling.strategy.max = 10
>>>     logger.netty.name =
>>> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>>>     logger.netty.level = OFF
>>> ---------------------------------------------------
>>> jobmanager-service.yaml
>>> apiVersion: v1
>>> kind: Service
>>> metadata:
>>>   name: flink-jobmanager
>>> spec:
>>>   type: ClusterIP
>>>   ports:
>>>   - name: rpc
>>>     port: 6123
>>>   - name: blob-server
>>>     port: 6124
>>>   - name: webui
>>>     port: 8081
>>>   selector:
>>>     app: flink
>>>     component: jobmanager
>>> --------------------------------------------------
>>> jobmanager-session-deployment.yaml
>>> apiVersion: apps/v1
>>> kind: Deployment
>>> metadata:
>>>   name: flink-jobmanager
>>> spec:
>>>   replicas: 1
>>>   selector:
>>>     matchLabels:
>>>       app: flink
>>>       component: jobmanager
>>>   template:
>>>     metadata:
>>>       labels:
>>>         app: flink
>>>         component: jobmanager
>>>     spec:
>>>       containers:
>>>       - name: jobmanager
>>>         image:
>>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>         args: ["jobmanager"]
>>>         ports:
>>>         - containerPort: 6123
>>>           name: rpc
>>>         - containerPort: 6124
>>>           name: blob-server
>>>         - containerPort: 8081
>>>           name: webui
>>>         livenessProbe:
>>>           tcpSocket:
>>>             port: 6123
>>>           initialDelaySeconds: 30
>>>           periodSeconds: 60
>>>         volumeMounts:
>>>         - name: flink-config-volume
>>>           mountPath: /opt/flink/conf
>>>         securityContext:
>>>           runAsUser: 9999  # refers to user _flink_ from official flink
>>> image, change if necessary
>>>       volumes:
>>>       - name: flink-config-volume
>>>         configMap:
>>>           name: flink-config
>>>           items:
>>>           - key: flink-conf.yaml
>>>             path: flink-conf.yaml
>>>           - key: log4j-console.properties
>>>             path: log4j-console.properties
>>>       imagePullSecrets:
>>>         - name: regcred
>>> ---------------------------------------------------
>>> taskmanager-session-deployment.yaml
>>> apiVersion: apps/v1
>>> kind: Deployment
>>> metadata:
>>>   name: flink-taskmanager
>>> spec:
>>>   replicas: 1
>>>   selector:
>>>     matchLabels:
>>>       app: flink
>>>       component: taskmanager
>>>   template:
>>>     metadata:
>>>       labels:
>>>         app: flink
>>>         component: taskmanager
>>>     spec:
>>>       containers:
>>>       - name: taskmanager
>>>         image:
>>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>>         args: ["taskmanager"]
>>>         ports:
>>>         - containerPort: 6122
>>>           name: rpc
>>>         - containerPort: 6125
>>>           name: query-state
>>>         livenessProbe:
>>>           tcpSocket:
>>>             port: 6122
>>>           initialDelaySeconds: 30
>>>           periodSeconds: 60
>>>         volumeMounts:
>>>         - name: flink-config-volume
>>>           mountPath: /opt/flink/conf/
>>>         securityContext:
>>>           runAsUser: 9999  # refers to user _flink_ from official flink
>>> image, change if necessary
>>>       volumes:
>>>       - name: flink-config-volume
>>>         configMap:
>>>           name: flink-config
>>>           items:
>>>           - key: flink-conf.yaml
>>>             path: flink-conf.yaml
>>>           - key: log4j-console.properties
>>>             path: log4j-console.properties
>>>       imagePullSecrets:
>>>         - name: regcred
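One configuration-level workaround worth noting (an assumption on my part, not something suggested in the thread): if bare service names fail to resolve but fully qualified names do, the ConfigMap above can pin the JobManager address to the Service's FQDN instead.

```yaml
# Hypothetical flink-conf.yaml fragment; assumes everything runs in the
# "default" namespace, so the Service FQDN follows the standard
# <service>.<namespace>.svc.cluster.local pattern.
jobmanager.rpc.address: flink-jobmanager.default.svc.cluster.local
```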
>>>
>>>
>>>

Re: Fail to deploy Flink on minikube

Posted by Yang Wang <da...@gmail.com>.
Sorry, I forgot that the JobManager binds its RPC address to
flink-jobmanager, not to the IP address.
So you also need to update the jobmanager-session-deployment.yaml with the
following changes.

...
      containers:
      - name: jobmanager
        env:
        - name: JM_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: flink:1.11
        args: ["jobmanager", "$(JM_IP)"]
...

After that, the JobManager binds its RPC address to its IP.
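To confirm the new binding took effect, the JobManager's startup logs can be inspected (a generic check, not from the thread; the deployment name follows the manifests above):

```shell
# The Akka/RPC startup lines show which address the JobManager bound to
kubectl logs deployment/flink-jobmanager | grep -i -E "rpc|akka"
```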

Best,
Yang


superainbower <su...@163.com> 于2020年9月3日周四 上午11:38写道:

> HI Yang,
> I update taskmanager-session-deployment.yaml like this:
>
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-taskmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: taskmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: taskmanager
>     spec:
>       containers:
>       - name: taskmanager
>         image:
> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
>         ports:
>         - containerPort: 6122
>           name: rpc
>         - containerPort: 6125
>           name: query-state
>         livenessProbe:
>           tcpSocket:
>             port: 6122
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf/
>         securityContext:
>           runAsUser: 9999  # refers to user _flink_ from official flink
> image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>         - name: regcred
>
> And Delete the TaskManager pod and restart it , but the logs print this
>
> Could not resolve ResourceManager address akka.tcp://
> flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms:
> Could not connect to rpc endpoint under address akka.tcp://
> flink@172.18.0.5:6123/user/rpc/resourcemanager_*
>
> It change flink-jobmanager to 172.18.0.5
> superainbower
> superainbower@163.com
>
> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>
> On 09/3/2020 11:09,Yang Wang<da...@gmail.com>
> <da...@gmail.com> wrote:
>
> I guess something is wrong with your kube proxy, which causes TaskManager
> could not connect to JobManager.
> You could verify this by directly using JobManager Pod ip instead of
> service name.
>
> Please do as follows.
> * Edit the TaskManager deployment(via kubectl edit flink-taskmanager) and
> update the args field to the following.
>    args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]    Given
> that "172.18.0.5" is the JobManager pod ip.
> * Delete the current TaskManager pod and let restart again
> * Now check the TaskManager logs to check whether it could register
> successfully
>
>
>
> Best,
> Yang
>
> superainbower <su...@163.com> 于2020年9月3日周四 上午9:35写道:
>
>> Hi Till,
>> I find something may be helpful.
>> The kubernetes Dashboard show job-manager ip 172.18.0.5, task-manager ip
>> 172.18.0.6
>> When I run command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn
>> -- /bin/bash’ && ‘ping 172.18.0.5’
>> I can get response
>> But when I ping flink-jobmanager ,there is no response
>>
>> superainbower
>> superainbower@163.com
>>
>> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
>> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>>
>> On 09/3/2020 09:03,superainbower<su...@163.com>
>> <su...@163.com> wrote:
>>
>> Hi Till,
>> This is the taskManager log
>> As you see, the logs print  ‘line 92 -- Could not connect to
>> flink-jobmanager:6123’
>> then print ‘line 128 --Could not resolve ResourceManager address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.’
>> And repeat print this
>>
>> A few minutes later, the taskmanger shut down and restart
>>
>> This is my yaml files, could u help me to confirm did I
>> omitted something? Thanks a lot!
>> ---------------------------------------------------
>> flink-configuration-configmap.yaml
>> apiVersion: v1
>> kind: ConfigMap
>> metadata:
>>   name: flink-config
>>   labels:
>>     app: flink
>> data:
>>   flink-conf.yaml: |+
>>     jobmanager.rpc.address: flink-jobmanager
>>     taskmanager.numberOfTaskSlots: 1
>>     blob.server.port: 6124
>>     jobmanager.rpc.port: 6123
>>     taskmanager.rpc.port: 6122
>>     queryable-state.proxy.ports: 6125
>>     jobmanager.memory.process.size: 1024m
>>     taskmanager.memory.process.size: 1024m
>>     parallelism.default: 1
>>   log4j-console.properties: |+
>>     rootLogger.level = INFO
>>     rootLogger.appenderRef.console.ref = ConsoleAppender
>>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>>     logger.akka.name = akka
>>     logger.akka.level = INFO
>>     logger.kafka.name= org.apache.kafka
>>     logger.kafka.level = INFO
>>     logger.hadoop.name = org.apache.hadoop
>>     logger.hadoop.level = INFO
>>     logger.zookeeper.name = org.apache.zookeeper
>>     logger.zookeeper.level = INFO
>>     appender.console.name = ConsoleAppender
>>     appender.console.type = CONSOLE
>>     appender.console.layout.type = PatternLayout
>>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>> %-60c %x - %m%n
>>     appender.rolling.name = RollingFileAppender
>>     appender.rolling.type = RollingFile
>>     appender.rolling.append = false
>>     appender.rolling.fileName = ${sys:log.file}
>>     appender.rolling.filePattern = ${sys:log.file}.%i
>>     appender.rolling.layout.type = PatternLayout
>>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
>> %-60c %x - %m%n
>>     appender.rolling.policies.type = Policies
>>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>>     appender.rolling.policies.size.size=100MB
>>     appender.rolling.strategy.type = DefaultRolloverStrategy
>>     appender.rolling.strategy.max = 10
>>     logger.netty.name =
>> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>>     logger.netty.level = OFF
>> ---------------------------------------------------
>> jobmanager-service.yaml
>> apiVersion: v1
>> kind: Service
>> metadata:
>>   name: flink-jobmanager
>> spec:
>>   type: ClusterIP
>>   ports:
>>   - name: rpc
>>     port: 6123
>>   - name: blob-server
>>     port: 6124
>>   - name: webui
>>     port: 8081
>>   selector:
>>     app: flink
>>     component: jobmanager
>> --------------------------------------------------
>> jobmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-jobmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: jobmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: jobmanager
>>     spec:
>>       containers:
>>       - name: jobmanager
>>         image:
>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["jobmanager"]
>>         ports:
>>         - containerPort: 6123
>>           name: rpc
>>         - containerPort: 6124
>>           name: blob-server
>>         - containerPort: 8081
>>           name: webui
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6123
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf
>>         securityContext:
>>           runAsUser: 9999  # refers to user _flink_ from official flink
>> image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>         - name: regcred
>> ---------------------------------------------------
>> taskmanager-session-deployment.yaml
>> apiVersion: apps/v1
>> kind: Deployment
>> metadata:
>>   name: flink-taskmanager
>> spec:
>>   replicas: 1
>>   selector:
>>     matchLabels:
>>       app: flink
>>       component: taskmanager
>>   template:
>>     metadata:
>>       labels:
>>         app: flink
>>         component: taskmanager
>>     spec:
>>       containers:
>>       - name: taskmanager
>>         image:
>> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>>         args: ["taskmanager"]
>>         ports:
>>         - containerPort: 6122
>>           name: rpc
>>         - containerPort: 6125
>>           name: query-state
>>         livenessProbe:
>>           tcpSocket:
>>             port: 6122
>>           initialDelaySeconds: 30
>>           periodSeconds: 60
>>         volumeMounts:
>>         - name: flink-config-volume
>>           mountPath: /opt/flink/conf/
>>         securityContext:
>>           runAsUser: 9999  # refers to user _flink_ from official flink
>> image, change if necessary
>>       volumes:
>>       - name: flink-config-volume
>>         configMap:
>>           name: flink-config
>>           items:
>>           - key: flink-conf.yaml
>>             path: flink-conf.yaml
>>           - key: log4j-console.properties
>>             path: log4j-console.properties
>>       imagePullSecrets:
>>         - name: regcred
>>
>>
>> superainbower
>> superainbower@163.com
>>
>> <https://maas.mail.163.com/dashi-web-extend/html/proSignature.html?ftlId=1&name=superainbower&uid=superainbower%40163.com&iconUrl=https%3A%2F%2Fmail-online.nosdn.127.net%2Fqiyelogo%2FdefaultAvatar.png&items=%5B%22superainbower%40163.com%22%5D>
>> 签名由 网易邮箱大师 <https://mail.163.com/dashi/dlpro.html?from=mail81> 定制
>>
>> On 09/2/2020 20:38,Till Rohrmann<tr...@apache.org>
>> <tr...@apache.org> wrote:
>>
>> Hmm, this is indeed strange. Could you share the logs of the TaskManager
>> with us? Ideally you set the log level to debug. Thanks a lot.
>>
>> Cheers,
>> Till
>>
>> On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:
>>
>>> Hi Till,
>>>
>>> The full information when I run command ' kubectl get all’  like this:
>>>
>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>>  2m34s
>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>>  2m34s
>>>
>>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP
>>> PORT(S)                      AGE
>>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>
>>>  6123/TCP,6124/TCP,8081/TCP   2m34s
>>> service/kubernetes         ClusterIP   10.96.0.1       <none>
>>>  443/TCP                      5d2h
>>>
>>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>>> deployment.apps/flink-jobmanager    1/1     1            1
>>> 2m34s
>>> deployment.apps/flink-taskmanager   1/1     1            1
>>> 2m34s
>>>
>>> NAME                                           DESIRED   CURRENT   READY
>>>   AGE
>>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1
>>>   2m34s
>>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1
>>>   2m34s
>>>
>>> And I can open the Flink UI, but the task manager count is 0, so the job
>>> manager itself works well.
>>> I think the problem is that the taskmanager cannot register itself with the
>>> jobmanager; did I miss some configuration?
>>>
>>>
>>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>
>>> Hi art,
>>>
>>> could you check what `kubectl get services` returns? Usually if you run
>>> `kubectl get all` you should also see the services. But in your case there
>>> are no services listed. You should see something like
>>> service/flink-jobmanager; otherwise the flink-jobmanager service (K8s
>>> service) is not running.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:
>>>
>>>> Hi Till,
>>>>
>>>> I’m sure the job manager-service is started, I can find it in
>>>> Kubernetes DashBoard
>>>>
>>>> When I run the command 'kubectl get deployment' I get this:
>>>> flink-jobmanager    1/1     1            1           33s
>>>> flink-taskmanager   1/1     1            1           33s
>>>>
>>>> When I run the command 'kubectl get all' I get this:
>>>> NAME                                     READY   STATUS    RESTARTS
>>>> AGE
>>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>>>  2m34s
>>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>>>  2m34s
>>>>
>>>> So, I think flink-jobmanager works well, but the taskmanager is restarted
>>>> every few minutes
>>>>
>>>> My minikube version: v1.12.3
>>>> Flink version:v1.11.1
>>>>
>>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>>
>>>> Hi art,
>>>>
>>>> could you verify that the jobmanager-service has been started? It looks
>>>> as if the name flink-jobmanager is not resolvable. It could also help to
>>>> know the Minikube and K8s version you are using.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:
>>>>
>>>>> Hi,I’m going to deploy flink on minikube referring to
>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>>> ;
>>>>> kubectl create -f flink-configuration-configmap.yaml
>>>>> kubectl create -f jobmanager-service.yaml
>>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>>
>>>>> But I got this
>>>>>
>>>>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>>>>>                     [] - Association with remote system [
>>>>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now
>>>>> gated for [50] ms. Reason: [Association failed with [
>>>>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>>>>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
>>>>> resolution]
>>>>> 2020-09-02 06:45:42,691 INFO
>>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>>> not resolve ResourceManager address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:02,731 INFO
>>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>>> not resolve ResourceManager address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>>> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor
>>>>>                     [] - No response from remote for outbound association.
>>>>> Associate timed out after [20000 ms].
>>>>>
>>>>> And when I run the command 'kubectl exec -ti
>>>>> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash’ && ‘ping flink-jobmanager’
>>>>> , I find I cannot ping flink-jobmanager from taskmanager
>>>>>
>>>>> I am new to k8s, can anyone give me some tutorial? Thanks a lot !
>>>>>
>>>>
>>>>
>>>

Re: Fail to deploy Flink on minikube

Posted by superainbower <su...@163.com>.
Hi Yang,
I updated taskmanager-session-deployment.yaml like this:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager","-Djobmanager.rpc.address=172.18.0.5"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred


Then I deleted the TaskManager pod and restarted it, but the logs print this:


Could not resolve ResourceManager address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@172.18.0.5:6123/user/rpc/resourcemanager_*


So it changed flink-jobmanager to 172.18.0.5, but the TaskManager still cannot connect.
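Since the address now resolves (the log shows the pod IP) but the RPC connection still fails, one way to narrow this down is to test raw TCP reachability to the JobManager's RPC port from inside the TaskManager pod. This is only a diagnostic sketch; the pod name and IP are the ones mentioned in this thread and will differ in your cluster:

```shell
# Names/IP below are the ones from this thread; substitute your own.
TM_POD=flink-taskmanager-74c68c6f48-9tkvd
JM_IP=172.18.0.5

# bash's /dev/tcp pseudo-device opens a TCP connection; the exit status
# tells us whether port 6123 (the JobManager RPC port) is reachable.
kubectl exec -ti "$TM_POD" -- \
  bash -c "timeout 5 bash -c 'cat < /dev/null > /dev/tcp/$JM_IP/6123' \
           && echo 'TCP 6123 reachable' || echo 'TCP 6123 NOT reachable'"
```

If the port is reachable by IP but the name still fails, the problem is DNS rather than the network path.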
superainbower
superainbower@163.com


On 09/3/2020 11:09,Yang Wang<da...@gmail.com> wrote:
I guess something is wrong with your kube-proxy, which prevents the TaskManager from connecting to the JobManager.
You could verify this by using the JobManager pod IP directly instead of the service name.


Please do as follows.
* Edit the TaskManager deployment (via kubectl edit deployment flink-taskmanager) and update the args field to the following:
   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]    Given that "172.18.0.5" is the JobManager pod IP.
* Delete the current TaskManager pod and let it restart.
* Then check the TaskManager logs to see whether it registers successfully.






Best,
Yang
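The steps above can also be run non-interactively; here is a sketch (the label selectors come from the manifests posted in this thread, and `kubectl patch` is used here only as a scriptable equivalent of the suggested `kubectl edit`):

```shell
# 1. Find the current JobManager pod IP (it changes whenever the pod restarts).
JM_IP=$(kubectl get pod -l app=flink,component=jobmanager \
  -o jsonpath='{.items[0].status.podIP}')

# 2. Point the TaskManager at the pod IP instead of the service name by
#    replacing the container args (same effect as editing the deployment).
kubectl patch deployment flink-taskmanager --type=json -p "[
  {\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/args\",
   \"value\": [\"taskmanager\", \"-Djobmanager.rpc.address=$JM_IP\"]}
]"

# 3. Delete the TaskManager pod; the Deployment recreates it with the new args.
kubectl delete pod -l app=flink,component=taskmanager

# 4. Inspect the new pod's logs for a successful registration.
kubectl logs -l app=flink,component=taskmanager --tail=50
```

Note that a pod IP is ephemeral, so this is a diagnostic step, not a permanent configuration.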


superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:

Hi Till,
I found something that may be helpful.
The Kubernetes Dashboard shows the job-manager IP as 172.18.0.5 and the task-manager IP as 172.18.0.6.
When I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5',
I get a response.
But when I ping flink-jobmanager, there is no response.


superainbower
superainbower@163.com


On 09/3/2020 09:03,superainbower<su...@163.com> wrote:
Hi Till,
This is the taskManager log
As you can see, the logs print 'line 92 -- Could not connect to flink-jobmanager:6123',
then 'line 128 -- Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.', and they keep repeating.


A few minutes later, the taskmanager shuts down and restarts.


These are my yaml files; could you help me confirm whether I omitted something? Thanks a lot!
---------------------------------------------------
flink-configuration-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size=100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF
---------------------------------------------------
jobmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager
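One thing worth verifying with a Service like this (a diagnostic sketch, not part of the original message): if its selector matches no pods, the name may still exist in DNS but route nowhere, and an empty ENDPOINTS list gives that away.

```shell
# If the selector (app=flink, component=jobmanager) matches no running pod,
# the ENDPOINTS column is empty and TaskManager registrations go nowhere.
kubectl get endpoints flink-jobmanager

# Confirm the JobManager pod carries exactly the labels the Service selects.
kubectl get pods -l app=flink,component=jobmanager --show-labels

# Test name resolution from inside the TaskManager pod; getent works in most
# glibc-based images even when nslookup is not installed.
kubectl exec -ti deploy/flink-taskmanager -- getent hosts flink-jobmanager
```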
--------------------------------------------------
jobmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
---------------------------------------------------
taskmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
       


superainbower
superainbower@163.com


On 09/2/2020 20:38,Till Rohrmann<tr...@apache.org> wrote:
Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot.


Cheers,
Till


On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:

Hi Till,
  
The full output when I run the command 'kubectl get all' looks like this:


NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h


NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s


NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s


And I can open the Flink UI, but the task manager count is 0, so the job manager itself works well.
I think the problem is that the taskmanager cannot register itself with the jobmanager; did I miss some configuration?




On Sep 2, 2020, at 5:24 PM, Till Rohrmann <tr...@apache.org> wrote:


Hi art,


could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services. But in your case there are no services listed. You should see something like service/flink-jobmanager; otherwise the flink-jobmanager service (K8s service) is not running.


Cheers,
Till


On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:

Hi Till,


I’m sure the job manager-service is started, I can find it in Kubernetes DashBoard


When I run the command 'kubectl get deployment' I get this:
flink-jobmanager    1/1     1            1           33s
flink-taskmanager   1/1     1            1           33s


When I run the command 'kubectl get all' I get this:
NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


So, I think flink-jobmanager works well, but the taskmanager is restarted every few minutes.


My minikube version: v1.12.3
Flink version:v1.11.1



On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:


Hi art,


could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using.


Cheers,
Till


On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:

Hi,I’m going to deploy flink on minikube referring to https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html;
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml


But I got this


2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms]. 


And when I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find that I cannot ping flink-jobmanager from the taskmanager.


I am new to k8s, can anyone give me some tutorial? Thanks a lot !




Re: Fail to deploy Flink on minikube

Posted by Yang Wang <da...@gmail.com>.
I guess something is wrong with your kube-proxy, which prevents the
TaskManager from connecting to the JobManager.
You could verify this by using the JobManager pod IP directly instead of
the service name.

Please do as follows.
* Edit the TaskManager deployment (via kubectl edit deployment
flink-taskmanager) and update the args field to the following:
   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]    Given
that "172.18.0.5" is the JobManager pod IP.
* Delete the current TaskManager pod and let it restart.
* Then check the TaskManager logs to see whether it registers
successfully.



Best,
Yang

superainbower <su...@163.com> wrote on Thu, Sep 3, 2020 at 9:35 AM:

> Hi Till,
> I found something that may be helpful.
> The Kubernetes Dashboard shows the job-manager IP as 172.18.0.5 and the
> task-manager IP as 172.18.0.6.
> When I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn --
> /bin/bash' and then 'ping 172.18.0.5',
> I get a response.
> But when I ping flink-jobmanager, there is no response.
>
> superainbower
> superainbower@163.com
>
>
> On 09/3/2020 09:03, superainbower <su...@163.com> wrote:
>
> Hi Till,
> This is the taskManager log
> As you see, the logs print  ‘line 92 -- Could not connect to
> flink-jobmanager:6123’
> then print ‘line 128 --Could not resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.’   And
> repeat print this
>
> A few minutes later, the taskmanager shuts down and restarts.
>
> These are my yaml files; could you help me confirm whether I omitted
> something? Thanks a lot!
> ---------------------------------------------------
> flink-configuration-configmap.yaml
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: flink-config
>   labels:
>     app: flink
> data:
>   flink-conf.yaml: |+
>     jobmanager.rpc.address: flink-jobmanager
>     taskmanager.numberOfTaskSlots: 1
>     blob.server.port: 6124
>     jobmanager.rpc.port: 6123
>     taskmanager.rpc.port: 6122
>     queryable-state.proxy.ports: 6125
>     jobmanager.memory.process.size: 1024m
>     taskmanager.memory.process.size: 1024m
>     parallelism.default: 1
>   log4j-console.properties: |+
>     rootLogger.level = INFO
>     rootLogger.appenderRef.console.ref = ConsoleAppender
>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>     logger.akka.name = akka
>     logger.akka.level = INFO
>     logger.kafka.name= org.apache.kafka
>     logger.kafka.level = INFO
>     logger.hadoop.name = org.apache.hadoop
>     logger.hadoop.level = INFO
>     logger.zookeeper.name = org.apache.zookeeper
>     logger.zookeeper.level = INFO
>     appender.console.name = ConsoleAppender
>     appender.console.type = CONSOLE
>     appender.console.layout.type = PatternLayout
>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
> %-60c %x - %m%n
>     appender.rolling.name = RollingFileAppender
>     appender.rolling.type = RollingFile
>     appender.rolling.append = false
>     appender.rolling.fileName = ${sys:log.file}
>     appender.rolling.filePattern = ${sys:log.file}.%i
>     appender.rolling.layout.type = PatternLayout
>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p
> %-60c %x - %m%n
>     appender.rolling.policies.type = Policies
>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>     appender.rolling.policies.size.size=100MB
>     appender.rolling.strategy.type = DefaultRolloverStrategy
>     appender.rolling.strategy.max = 10
>     logger.netty.name =
> org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>     logger.netty.level = OFF
> ---------------------------------------------------
> jobmanager-service.yaml
> apiVersion: v1
> kind: Service
> metadata:
>   name: flink-jobmanager
> spec:
>   type: ClusterIP
>   ports:
>   - name: rpc
>     port: 6123
>   - name: blob-server
>     port: 6124
>   - name: webui
>     port: 8081
>   selector:
>     app: flink
>     component: jobmanager
> --------------------------------------------------
> jobmanager-session-deployment.yaml
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-jobmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: jobmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: jobmanager
>     spec:
>       containers:
>       - name: jobmanager
>         image:
> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["jobmanager"]
>         ports:
>         - containerPort: 6123
>           name: rpc
>         - containerPort: 6124
>           name: blob-server
>         - containerPort: 8081
>           name: webui
>         livenessProbe:
>           tcpSocket:
>             port: 6123
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf
>         securityContext:
>           runAsUser: 9999  # refers to user _flink_ from official flink
> image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>         - name: regcred
> ---------------------------------------------------
> taskmanager-session-deployment.yaml
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-taskmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: taskmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: taskmanager
>     spec:
>       containers:
>       - name: taskmanager
>         image:
> registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["taskmanager"]
>         ports:
>         - containerPort: 6122
>           name: rpc
>         - containerPort: 6125
>           name: query-state
>         livenessProbe:
>           tcpSocket:
>             port: 6122
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf/
>         securityContext:
>           runAsUser: 9999  # refers to user _flink_ from official flink
> image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>         - name: regcred
>
>
> superainbower
> superainbower@163.com
>
>
> On 09/2/2020 20:38, Till Rohrmann <tr...@apache.org> wrote:
>
> Hmm, this is indeed strange. Could you share the logs of the TaskManager
> with us? Ideally you set the log level to debug. Thanks a lot.
>
> Cheers,
> Till
>
> On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:
>
>> Hi Till,
>>
>> The full output when I run the command 'kubectl get all' looks like this:
>>
>> NAME                                     READY   STATUS    RESTARTS   AGE
>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>  2m34s
>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>  2m34s
>>
>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP
>> PORT(S)                      AGE
>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>
>>  6123/TCP,6124/TCP,8081/TCP   2m34s
>> service/kubernetes         ClusterIP   10.96.0.1       <none>
>>  443/TCP                      5d2h
>>
>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>> deployment.apps/flink-jobmanager    1/1     1            1           2m34s
>> deployment.apps/flink-taskmanager   1/1     1            1           2m34s
>>
>> NAME                                           DESIRED   CURRENT   READY
>>   AGE
>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1
>>   2m34s
>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1
>>   2m34s
>>
>> And I can open the Flink UI, but the task manager count is 0, so the job
>> manager itself works well.
>> I think the problem is that the taskmanager cannot register itself with the
>> jobmanager; did I miss some configuration?
>>
>>
>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <tr...@apache.org> wrote:
>>
>> Hi art,
>>
>> could you check what `kubectl get services` returns? Usually if you run
>> `kubectl get all` you should also see the services. But in your case there
>> are no services listed. You should see something like
>> service/flink-jobmanager; otherwise the flink-jobmanager service (K8s
>> service) is not running.
>>
>> Cheers,
>> Till
>>
>> On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:
>>
>>> Hi Till,
>>>
>>> I’m sure the job manager-service is started, I can find it in Kubernetes
>>> DashBoard
>>>
>>> When I run the command 'kubectl get deployment' I get this:
>>> flink-jobmanager    1/1     1            1           33s
>>> flink-taskmanager   1/1     1            1           33s
>>>
>>> When I run the command 'kubectl get all' I get this:
>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>>  2m34s
>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>>  2m34s
>>>
>>> So, I think flink-jobmanager works well, but the taskmanager is restarted
>>> every few minutes
>>>
>>> My minikube version: v1.12.3
>>> Flink version:v1.11.1
>>>
>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>
>>> Hi art,
>>>
>>> could you verify that the jobmanager-service has been started? It looks
>>> as if the name flink-jobmanager is not resolvable. It could also help to
>>> know the Minikube and K8s version you are using.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:
>>>
>>>> Hi,I’m going to deploy flink on minikube referring to
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>> ;
>>>> kubectl create -f flink-configuration-configmap.yaml
>>>> kubectl create -f jobmanager-service.yaml
>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>
>>>> But I got this
>>>>
>>>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>>>>                     [] - Association with remote system [
>>>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now
>>>> gated for [50] ms. Reason: [Association failed with [
>>>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>>>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
>>>> resolution]
>>>> 2020-09-02 06:45:42,691 INFO
>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>> not resolve ResourceManager address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>> 2020-09-02 06:46:02,731 INFO
>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>> not resolve ResourceManager address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor
>>>>                     [] - No response from remote for outbound association.
>>>> Associate timed out after [20000 ms].
>>>>
>>>> And when I run the command 'kubectl exec -ti
>>>> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager',
>>>> I find that I cannot ping flink-jobmanager from the taskmanager
>>>>
>>>> I am new to k8s, can anyone give me some tutorial? Thanks a lot !
>>>>
>>>
>>>
>>

Re: Fail to deploy Flink on minikube

Posted by superainbower <su...@163.com>.
Hi Till,
I found something that may be helpful.
The Kubernetes Dashboard shows the job-manager IP as 172.18.0.5 and the task-manager IP as 172.18.0.6.
When I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5',
I get a response.
But when I ping flink-jobmanager, there is no response.
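A pod IP that answers while the service name does not usually points at cluster DNS. (Note also that even with working DNS, pinging a ClusterIP Service often fails anyway, since ClusterIPs are virtual and generally do not answer ICMP; a TCP check is more meaningful than ping.) A sketch of DNS checks, assuming CoreDNS/kube-dns in the kube-system namespace, which is the Minikube default; the busybox pod is a throwaway helper, not something from this thread:

```shell
# Is the cluster DNS deployment healthy?
kubectl -n kube-system get pods -l k8s-app=kube-dns

# Does the DNS Service have endpoints?
kubectl -n kube-system get endpoints kube-dns

# Resolve the Flink service from a throwaway pod that ships DNS tools.
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- \
  nslookup flink-jobmanager

# Inside the TaskManager pod, check which resolver it was given.
kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- cat /etc/resolv.conf
```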


superainbower
superainbower@163.com


On 09/3/2020 09:03,superainbower<su...@163.com> wrote:
Hi Till,
This is the taskManager log
As you can see, the logs print 'line 92 -- Could not connect to flink-jobmanager:6123',
then 'line 128 -- Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.', and they keep repeating.


A few minutes later, the taskmanager shuts down and restarts.


These are my yaml files; could you help me confirm whether I omitted something? Thanks a lot!
---------------------------------------------------
flink-configuration-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size=100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF
---------------------------------------------------
jobmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager
--------------------------------------------------
jobmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
---------------------------------------------------
taskmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
       


| |
superainbower
|
|
superainbower@163.com
|
签名由网易邮箱大师定制


On 09/2/2020 20:38,Till Rohrmann<tr...@apache.org> wrote:
Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally you set the log level to debug. Thanks a lot.


Cheers,
Till


On Wed, Sep 2, 2020 at 12:45 PM art <Su...@163.com> wrote:

Hi Till,
  
The full information when I run command ' kubectl get all’  like this:


NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h


NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s


NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s


And I can open flink ui but the task manger is 0 ,so the job manger is work well
I think the problem is taksmanger can not register itself to jobmanger,  did I miss some configure?




在 2020年9月2日,下午5:24,Till Rohrmann <tr...@apache.org> 写道:


Hi art,


could you check what `kubectl get services` returns? Usually if you run `kubectl get all` you should also see the services. But in your case there are no services listed. You have see something like service/flink-jobmanager otherwise the flink-jobmanager service (K8s service) is not running.


Cheers,
Till


On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:

Hi Till,


I’m sure the job manager-service is started, I can find it in Kubernetes DashBoard


When I run command ' kubectl get deployment’ I can got this:
flink-jobmanager    1/1     1            1           33s
flink-taskmanager   1/1     1            1           33s


When I run command ' kubectl get all’ I can got this:
NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s


So, I think flink-jobmanager works well, but taskmannger is restarted every few minutes 


My minikube version: v1.12.3
Flink version:v1.11.1



在 2020年9月2日,下午4:27,Till Rohrmann <tr...@apache.org> 写道:


Hi art,


could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using.


Cheers,
Till


On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:

Hi,I’m going to deploy flink on minikube referring to https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html;
kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml


But I got this


2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor                       [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor                     [] - No response from remote for outbound association. Associate timed out after [20000 ms]. 


And when I run the command 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash’ && ‘ping flink-jobmanager’ , I find I cannot ping flink-jobmanager from taskmanager


I am new to k8s, can anyone give me some tutorial? Thanks a lot !




Re: Fail to deploy Flink on minikube

Posted by superainbower <su...@163.com>.
Hi Till,
This is the TaskManager log.
As you can see, it first prints (line 92) 'Could not connect to flink-jobmanager:6123'
and then repeatedly prints (line 128) 'Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.'


A few minutes later, the TaskManager shuts down and restarts.


These are my YAML files. Could you help me confirm whether I omitted something? Thanks a lot!
---------------------------------------------------
flink-configuration-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size=100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF
---------------------------------------------------
jobmanager-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager
--------------------------------------------------
jobmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
---------------------------------------------------
taskmanager-session-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
        - name: regcred
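
After applying these manifests, the wiring can be sanity-checked by confirming that the Service actually has endpoints and that its name resolves from the TaskManager pod; a sketch (the pod name is taken from earlier in this thread, adapt to yours):

```shell
# An empty ENDPOINTS column means the Service selector matches no running pod
kubectl get endpoints flink-jobmanager

# DNS check from inside the TaskManager pod
kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- nslookup flink-jobmanager
```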
       








Re: Fail to deploy Flink on minikube

Posted by Till Rohrmann <tr...@apache.org>.
Hmm, this is indeed strange. Could you share the TaskManager logs
with us? Ideally, set the log level to DEBUG. Thanks a lot.

Cheers,
Till


Re: Fail to deploy Flink on minikube

Posted by art <Su...@163.com>.
Hi Till,

The full output when I run the command `kubectl get all` looks like this:

NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s

And I can open the Flink UI, but it shows 0 task managers, so the JobManager itself is working.
I think the problem is that the TaskManager cannot register itself with the JobManager. Did I miss some configuration?
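
Registered TaskManagers can also be checked through the JobManager's REST API, which is served on the same port as the web UI; a sketch using the service name and port from the manifests in this thread:

```shell
# Forward the web UI / REST port locally
kubectl port-forward service/flink-jobmanager 8081:8081 &

# An empty "taskmanagers" array confirms that nothing has registered
curl -s http://localhost:8081/taskmanagers
```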




Re: Fail to deploy Flink on minikube

Posted by Till Rohrmann <tr...@apache.org>.
Hi art,

could you check what `kubectl get services` returns? Usually if you run
`kubectl get all` you should also see the services, but in your case no
services are listed. You should see something like
service/flink-jobmanager; otherwise the flink-jobmanager service (the K8s
service) is not running.
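
Sketched as commands (the manifest filename is the one from earlier in this thread):

```shell
# service/flink-jobmanager must appear here for the name to resolve in-cluster
kubectl get services

# If it is missing, (re)create it from the manifest
kubectl create -f jobmanager-service.yaml
```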

Cheers,
Till

On Wed, Sep 2, 2020 at 11:15 AM art <Su...@163.com> wrote:

> Hi Till,
>
> I’m sure the job manager-service is started, I can find it in Kubernetes
> DashBoard
>
> When I run command ' kubectl get deployment’ I can got this:
> flink-jobmanager    1/1     1            1           33s
> flink-taskmanager   1/1     1            1           33s
>
> When I run command ' kubectl get all’ I can got this:
> NAME                                     READY   STATUS    RESTARTS   AGE
> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s
>
> So, I think flink-jobmanager works well, but taskmannger is restarted
> every few minutes
>
> My minikube version: v1.12.3
> Flink version:v1.11.1
>
> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <tr...@apache.org> wrote:
>
> Hi art,
>
> could you verify that the jobmanager-service has been started? It looks as
> if the name flink-jobmanager is not resolvable. It could also help to know
> the Minikube and K8s version you are using.
>
> Cheers,
> Till
>
> On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:
>
>> Hi,I’m going to deploy flink on minikube referring to
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>> ;
>> kubectl create -f flink-configuration-configmap.yaml
>> kubectl create -f jobmanager-service.yaml
>> kubectl create -f jobmanager-session-deployment.yaml
>> kubectl create -f taskmanager-session-deployment.yaml
>>
>> But I got this
>>
>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>>                   [] - Association with remote system [
>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated
>> for [50] ms. Reason: [Association failed with [
>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
>> resolution]
>> 2020-09-02 06:45:42,691 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>> not resolve ResourceManager address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>> 2020-09-02 06:46:02,731 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>> not resolve ResourceManager address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor
>>                   [] - No response from remote for outbound association.
>> Associate timed out after [20000 ms].
>>
>> And when I run 'kubectl exec -ti
>> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping
>> flink-jobmanager', I find I cannot ping flink-jobmanager from the taskmanager.
>>
>> I am new to k8s; can anyone point me to a tutorial? Thanks a lot!
>>
>
>

Re: Fail to deploy Flink on minikube

Posted by Till Rohrmann <tr...@apache.org>.
Hi art,

could you verify that the jobmanager-service has been started? It looks as
if the name flink-jobmanager is not resolvable. It could also help to know
the Minikube and K8s version you are using.
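
As background, the TaskManager tries to resolve the hostname flink-jobmanager
because that is what `jobmanager.rpc.address` is set to in the ConfigMap
created by flink-configuration-configmap.yaml, roughly like this (a sketch;
your actual ConfigMap may carry more options):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |
    # Must match the name of the jobmanager Service, since
    # TaskManagers resolve it through cluster DNS.
    jobmanager.rpc.address: flink-jobmanager
    jobmanager.rpc.port: 6123
    taskmanager.numberOfTaskSlots: 2
```

That name only resolves inside the cluster when a Service called
flink-jobmanager exists in the same namespace, which is why the
UnknownHostException points at a missing or misnamed Service.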

Cheers,
Till

On Wed, Sep 2, 2020 at 9:50 AM art <Su...@163.com> wrote:

> Hi, I’m trying to deploy Flink on Minikube following
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
> kubectl create -f flink-configuration-configmap.yaml
> kubectl create -f jobmanager-service.yaml
> kubectl create -f jobmanager-session-deployment.yaml
> kubectl create -f taskmanager-session-deployment.yaml
>
> But I got this
>
> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>                 [] - Association with remote system [
> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated
> for [50] ms. Reason: [Association failed with [
> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
> resolution]
> 2020-09-02 06:45:42,691 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
> not resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
> 2020-09-02 06:46:02,731 INFO
>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
> not resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor
>                 [] - No response from remote for outbound association.
> Associate timed out after [20000 ms].
>
> And when I run 'kubectl exec -ti
> flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping
> flink-jobmanager', I find I cannot ping flink-jobmanager from the taskmanager.
>
> I am new to k8s; can anyone point me to a tutorial? Thanks a lot!
>