Posted to issues@flink.apache.org by "Xintong Song (Jira)" <ji...@apache.org> on 2023/04/13 08:42:00 UTC
[jira] [Closed] (FLINK-27328) Could not resolve ResourceManager address
[ https://issues.apache.org/jira/browse/FLINK-27328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xintong Song closed FLINK-27328.
--------------------------------
Resolution: Not A Bug
Closing the ticket due to:
1. There's no further feedback from the user on [~wangyang0918]'s explanation.
2. Flink 1.14 is no longer supported.
> Could not resolve ResourceManager address
> -----------------------------------------
>
> Key: FLINK-27328
> URL: https://issues.apache.org/jira/browse/FLINK-27328
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.14.4
> Environment: h3. JobManager
> {{apiVersion: v1
> kind: Service
> metadata:
>   name: jobmanager-cs
> spec:
>   type: NodePort
>   ports:
>   - name: ui
>     port: 8081
>   selector:
>     app: flink
>     component: jobmanager
> ---
> apiVersion: v1
> kind: Service
> metadata:
>   name: jobmanager-hs
> spec:
>   type: ClusterIP
>   ports:
>   - port: 6123
>     name: rpc
>   - port: 6124
>     name: blob-server
>   - port: 6125
>     name: query
>   selector:
>     app: flink
>     component: jobmanager
> ---
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-jobmanager
> spec:
>   selector:
>     matchLabels:
>       app: flink
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: jobmanager
>     spec:
>       restartPolicy: Always
>       containers:
>       - name: jobmanager
>         image: flink:1.13.1-scala_2.12
>         command: [bash,"-ec",bin/jobmanager.sh start-foreground cluster]
>         resources:
>           limits:
>             memory: "2024Mi"
>             cpu: "500m"
>         env:
>         - name: JOB_MANAGER_ID
>           valueFrom:
>             fieldRef:
>               apiVersion: v1
>               fieldPath: status.podIP
>         - name: POD_IP
>           valueFrom:
>             fieldRef:
>               apiVersion: v1
>               fieldPath: status.podIP
>         # The following args overwrite the value of jobmanager.rpc.address configured in the configuration config map to POD_IP.
>         args: ["standalone-job", "--host", "$POD_IP", "--job-classname", "org.apache.flink.application.Main"] #, <optional arguments>, <job arguments>] optional arguments: ["--job-id", "<job id>", "--fromSavepoint", "/path/to/savepoint", "--allowNonRestoredState"]
>         ports:
>         - containerPort: 6123
>           name: rpc
>         - containerPort: 6124
>           name: blob-server
>         - containerPort: 6125
>           name: query
>         - containerPort: 8081
>           name: webui
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf
>         - name: job-artifacts-volume
>           mountPath: /opt/flink/usrlib
>       securityContext:
>         runAsUser: 9999
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       - name: job-artifacts-volume
>         hostPath:
>           path: /config/flink}}
> h3. Task Manager
> {{apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-taskmanager
> spec:
>   replicas: 2
>   selector:
>     matchLabels:
>       app: flink
>       component: taskmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: taskmanager
>     spec:
>       containers:
>       - name: taskmanager
>         image: flink:1.13.1-scala_2.12
>         env:
>         - name: K8S_POD_IP
>           valueFrom:
>             fieldRef:
>               fieldPath: status.podIP
>         command: ["/bin/sh", "-ec", "sleep 1000"]
>         resources:
>           limits:
>             memory: "800Mi"
>             cpu: "2000m"
>         args: ["taskmanager","start-foreground","-Dtaskmanager.host=$K8S_POD_IP"]
>         ports:
>         - containerPort: 6122
>           name: rpc
>         - containerPort: 6125
>           name: query-state
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf/
>         - name: job-artifacts-volume
>           mountPath: /opt/flink/usrlib
>       securityContext:
>         runAsUser: 9999
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       - name: job-artifacts-volume
>         hostPath:
>           path: /config/flink}}
> h3. ConfigMap
> {{apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: flink-config
>   labels:
>     app: flink
> data:
>   flink-conf.yaml: |+
>     jobmanager.rpc.address: jobmanager-hs
>     taskmanager.numberOfTaskSlots: 1
>     blob.server.port: 6124
>     jobmanager.rpc.port: 6123
>     taskmanager.rpc.port: 6122
>     taskmanager.heap.size: 1024m
>     jobmanager.heap.size: 1024m
>     state.backend: filesystem
>     s3.access-key: k8sdemo
>     s3.secret-key: k8sdemo123
>     state.checkpoints.dir: /opt/flink/usrlib/checkpoints
>     state.savepoints.dir: /opt/flink/usrlib/savepoints
>     metrics.reporters: prom
>     metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
>     metrics.reporter.promport: 9249
>     queryable-state.proxy.ports: 6125
>     jobmanager.memory.process.size: 1600m
>     taskmanager.memory.process.size: 1728m
>     parallelism.default: 1
>     rest.address: 0.0.0.0
>     rest.bind-address: 0.0.0.0
>     jobmanager.execution.failover-strategy: region
>   log4j-console.properties: |+
>     # This affects logging for both user code and Flink
>     rootLogger.level = DEBUG
>     rootLogger.appenderRef.console.ref = ConsoleAppender
>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>     # Uncomment this if you want to _only_ change Flink's logging
>     #logger.flink.name = org.apache.flink
>     #logger.flink.level = DEBUG
>     # The following lines keep the log level of common libraries/connectors on
>     # log level INFO. The root logger does not override this. You have to manually
>     # change the log levels here.
>     logger.akka.name = akka
>     logger.akka.level = DEBUG
>     logger.kafka.name= org.apache.kafka
>     logger.kafka.level = DEBUG
>     logger.hadoop.name = org.apache.hadoop
>     logger.hadoop.level = DEBUG
>     logger.zookeeper.name = org.apache.zookeeper
>     logger.zookeeper.level = DEBUG
>     # Log all infos to the console
>     appender.console.name = ConsoleAppender
>     appender.console.type = CONSOLE
>     appender.console.layout.type = PatternLayout
>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>     # Log all infos in the given rolling file
>     appender.rolling.name = RollingFileAppender
>     appender.rolling.type = RollingFile
>     appender.rolling.append = false
>     appender.rolling.fileName = ${sys:log.file}
>     appender.rolling.filePattern = ${sys:log.file}.%i
>     appender.rolling.layout.type = PatternLayout
>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>     appender.rolling.policies.type = Policies
>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>     appender.rolling.policies.size.size=100MB
>     appender.rolling.strategy.type = DefaultRolloverStrategy
>     appender.rolling.strategy.max = 10
>     # Suppress the irrelevant (wrong) warnings from the Netty channel handler
>     logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>     logger.netty.level = OFF }}
> Reporter: Viswanath Shanmugam
> Priority: Not a Priority
>
> When I try to run the TaskManager with the following command:
> {quote}bin/taskmanager start-foreground -Dtaskmanager.host=$K8S_POD_IP
> {quote}
> I get the following exceptions:
> JobManager :
> {quote}2021-08-27 09:16:57,917 ERROR akka.remote.EndpointWriter [] - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@jobmanager-hs:6123/]] arriving at [akka.tcp://flink@jobmanager-hs:6123] inbound addresses are [akka.tcp://flink@cluster:6123]
> 2021-08-27 09:17:01,255 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2021-08-27 09:17:01,284 DEBUG org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger heartbeat request.
> 2021-08-27 09:17:10,008 DEBUG akka.remote.transport.netty.NettyTransport [] - Remote connection to [/172.17.0.1:34827] was disconnected because of [id: 0x13ae1d03, /172.17.0.1:34827 :> /172.17.0.23:6123] DISCONNECTED
> 2021-08-27 09:17:10,008 DEBUG akka.remote.transport.ProtocolStateActor [] - Association between local [tcp://flink@cluster:6123] and remote [tcp://flink@172.17.0.1:34827] was disassociated because the ProtocolStateActor failed: Unknown
> 2021-08-27 09:17:10,009 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@172.17.0.24:6122] has failed, address is now gated for [50] ms. Reason: [Disassociated]
> {quote}
> TaskManager:
> {quote}INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_{_}, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_{_}.
> INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_{_}, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_{_}.
> {quote}
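
The JobManager log quoted above reports a message addressed to akka.tcp://flink@jobmanager-hs:6123 arriving at an actor system whose only inbound address is akka.tcp://flink@cluster:6123. That kind of mismatch may indicate that the hostname the JobManager registers under differs from the one clients connect to. The following is a minimal sketch for spotting such lines in a log, assuming the Akka message format shown in the quote; `DROP_PATTERN` and `find_address_mismatch` are hypothetical names for illustration, not Flink or Akka APIs.

```python
import re

# Hypothetical helper, not part of Flink or Akka. Assumes the log line has the
# "arriving at [...] inbound addresses are [...]" shape seen in the report.
DROP_PATTERN = re.compile(
    r"arriving at \[(?P<arrived>[^\]]+)\] "
    r"inbound addresses are \[(?P<inbound>[^\]]+)\]"
)


def find_address_mismatch(log_line):
    """Return (arrived_address, inbound_addresses) if they disagree, else None."""
    match = DROP_PATTERN.search(log_line)
    if match is None:
        return None  # not a "dropping message" line in the assumed format
    arrived = match.group("arrived")
    inbound = [addr.strip() for addr in match.group("inbound").split(",")]
    if arrived in inbound:
        return None  # the message was addressed to a known inbound address
    return arrived, inbound


# Log line copied from the JobManager output in the issue description.
line = (
    "2021-08-27 09:16:57,917 ERROR akka.remote.EndpointWriter [] - dropping "
    "message [class akka.actor.ActorSelectionMessage] for non-local recipient "
    "[Actor[akka.tcp://flink@jobmanager-hs:6123/]] arriving at "
    "[akka.tcp://flink@jobmanager-hs:6123] inbound addresses are "
    "[akka.tcp://flink@cluster:6123]"
)

print(find_address_mismatch(line))
```

For the quoted line, the helper returns the address the message was sent to (jobmanager-hs) together with the list of inbound addresses (cluster), which makes the disagreement between the two hostnames explicit.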
--
This message was sent by Atlassian Jira
(v8.20.10#820010)