Posted to user@flink.apache.org by vipul singh <ne...@gmail.com> on 2018/07/27 07:46:52 UTC

Flink on kubernetes: taskmanager error

Hello,

I am trying to run Flink on a Kubernetes cluster using minikube and
kubectl. I am following this example
<https://github.com/sedgewickmm18/flink-kubernetes>, which runs a Flink 1.2
cluster without problems.

I would like to run Flink 1.5.1, but when I change the Flink version, I
start to see connection errors in the taskmanager-controller logs.
The log output is below:

2018-07-27 07:34:22,429 INFO  org.apache.flink.core.fs.FileSystem
>                 - Hadoop is not in the classpath/dependencies. The
> extended set of supported File Systems via Hadoop is not available.
>
> 2018-07-27 07:34:22,442 INFO
> org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot
> create Hadoop Security Module because Hadoop cannot be found in the
> Classpath.
>
> 2018-07-27 07:34:22,460 INFO  org.apache.flink.runtime.security.SecurityUtils
>               - Cannot install HadoopSecurityContext because Hadoop
> cannot be found in the Classpath.
>
> 2018-07-27 07:34:22,622 WARN  org.apache.flink.configuration.Configuration
>                 - Config uses deprecated configuration key
> 'jobmanager.rpc.address' instead of proper key 'rest.address'
>
> 2018-07-27 07:34:22,626 INFO
> org.apache.flink.runtime.util.LeaderRetrievalUtils            - Trying to
> select the network interface and address to use by connecting to the
> leading JobManager.
>
> 2018-07-27 07:34:22,626 INFO
> org.apache.flink.runtime.util.LeaderRetrievalUtils            -
> TaskManager will try to connect for 10000 milliseconds before falling back
> to heuristics
>
> 2018-07-27 07:34:22,629 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Retrieved new target address
> taskmanager-controller-vncdz/172.17.0.7:6123.
>
> 2018-07-27 07:34:23,390 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Trying to connect to address
> taskmanager-controller-vncdz/172.17.0.7:6123
>
> 2018-07-27 07:34:23,391 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address
> 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection
> refused)
>
> 2018-07-27 07:34:23,391 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:23,392 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:23,392 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:23,393 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:23,393 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:24,195 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Trying to connect to address
> taskmanager-controller-vncdz/172.17.0.7:6123
>
> 2018-07-27 07:34:24,196 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address
> 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection
> refused)
>
> 2018-07-27 07:34:24,197 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:24,198 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:24,198 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:24,199 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:24,199 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:25,803 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Trying to connect to address
> taskmanager-controller-vncdz/172.17.0.7:6123
>
> 2018-07-27 07:34:25,811 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address
> 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection
> refused)
>
> 2018-07-27 07:34:25,811 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:25,812 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:25,812 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:25,813 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:25,813 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:29,018 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Trying to connect to address
> taskmanager-controller-vncdz/172.17.0.7:6123
>
> 2018-07-27 07:34:29,098 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address
> 'taskmanager-controller-vncdz/172.17.0.7': Connection refused (Connection
> refused)
>
> 2018-07-27 07:34:29,098 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:29,099 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:29,099 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:29,100 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/172.17.0.7':
> Connection refused (Connection refused)
>
> 2018-07-27 07:34:29,102 INFO  org.apache.flink.runtime.net.ConnectionUtils
>                 - Failed to connect from address '/127.0.0.1': Connection
> refused (Connection refused)
>
> 2018-07-27 07:34:32,628 WARN  org.apache.flink.runtime.net.ConnectionUtils
>                 - Could not connect to taskmanager-controller-vncdz/
> 172.17.0.7:6123. Selecting a local address using heuristics.
>
> 2018-07-27 07:34:32,630 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>       - TaskManager will use hostname/address
> 'taskmanager-controller-vncdz' (172.17.0.7) for communication.
>
> 2018-07-27 07:34:32,663 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils
>         - Starting AkkaRpcService at taskmanager-controller-vncdz:0.
>
> 2018-07-27 07:34:33,574 INFO  akka.event.slf4j.Slf4jLogger
>                   - Slf4jLogger started
>
> 2018-07-27 07:34:34,335 INFO  akka.remote.Remoting
>                   - Starting remoting
>
> 2018-07-27 07:34:34,661 INFO  akka.remote.Remoting
>                   - Remoting started; listening on addresses
> :[akka.tcp://flink@taskmanager-controller-vncdz:39769]
>
> 2018-07-27 07:34:34,698 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl
>           - No metrics reporter configured, no metrics will be
> exposed/reported.
>
> 2018-07-27 07:34:34,710 INFO
> org.apache.flink.runtime.blob.PermanentBlobCache              - Created
> BLOB cache storage directory
> /tmp/blobStore-376e1f5a-810b-4999-91eb-ca5292b50d12
>
> 2018-07-27 07:34:34,714 INFO
> org.apache.flink.runtime.blob.TransientBlobCache              - Created
> BLOB cache storage directory
> /tmp/blobStore-fb08f586-2992-4d4a-9e75-ed501bbdc4e3
>
> 2018-07-27 07:34:34,718 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig
>         - NettyConfig [server address: taskmanager-controller-vncdz/
> 172.17.0.7, server port: 0, ssl enabled: false, memory segment size
> (bytes): 32768, transport type: NIO, number of server threads: 2 (manual),
> number of client threads: 2 (manual), server connect backlog: 0 (use
> Netty's default), client connect timeout (sec): 120, send/receive buffer
> size (bytes): 0 (use Netty's default)]
>
> 2018-07-27 07:34:34,916 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices
>     - Temporary file directory '/tmp': total 16 GB, usable 12 GB (75.00%
> usable)
>
> 2018-07-27 07:34:35,605 INFO
> org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated
> 102 MB for network buffer pool (number of memory segments: 3278, bytes per
> segment: 32768).
>
> 2018-07-27 07:34:35,899 INFO
> org.apache.flink.runtime.query.QueryableStateUtils            - Could not
> load Queryable State Client Proxy. Probable reason:
> flink-queryable-state-runtime is not in the classpath. To enable Queryable
> State, please move the flink-queryable-state-runtime jar from the opt to
> the lib folder.
>
> 2018-07-27 07:34:35,900 INFO
> org.apache.flink.runtime.query.QueryableStateUtils            - Could not
> load Queryable State Server. Probable reason: flink-queryable-state-runtime
> is not in the classpath. To enable Queryable State, please move the
> flink-queryable-state-runtime jar from the opt to the lib folder.
>
> 2018-07-27 07:34:35,901 INFO
> org.apache.flink.runtime.io.network.NetworkEnvironment        - Starting
> the network environment and its components.
>
> 2018-07-27 07:34:35,946 INFO  org.apache.flink.runtime.io.network.netty.NettyClient
>         - Successful initialization (took 37 ms).
>
> 2018-07-27 07:34:35,988 INFO  org.apache.flink.runtime.io.network.netty.NettyServer
>         - Successful initialization (took 42 ms). Listening on
> SocketAddress /172.17.0.7:41451.
>
> 2018-07-27 07:34:35,990 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices
>     - Limiting managed memory to 0.7 of the currently free heap space
> (641 MB), memory will be allocated lazily.
>
> 2018-07-27 07:34:36,000 INFO
> org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O
> manager uses directory /tmp/flink-io-48184970-5e3d-4ae7-9ba4-40850532367a
> for spill files.
>
> 2018-07-27 07:34:36,008 INFO  org.apache.flink.runtime.filecache.FileCache
>                 - User file cache uses directory
> /tmp/flink-dist-cache-adbfd785-de17-48ae-8677-cf360db1fac2
>
> 2018-07-27 07:34:36,199 INFO
> org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  -
> Messages have a max timeout of 10000 ms
>
> 2018-07-27 07:34:36,211 INFO
> org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting
> RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at
> akka://flink/user/taskmanager_0 .
>
> 2018-07-27 07:34:36,226 INFO
> org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job
> leader service.
>
> 2018-07-27 07:34:36,231 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            -
> Connecting to ResourceManager akka.tcp://flink@taskmanager-controller-vncdz
> :6123/user/resourcemanager(00000000000000000000000000000000).
>
> 2018-07-27 07:34:36,513 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:34:36,513 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:34:36,520 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:34:47,228 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:34:47,233 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:34:47,234 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:34:57,255 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:34:57,255 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:34:57,256 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:07,274 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:07,276 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:07,276 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:17,294 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:17,300 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:17,300 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:27,315 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:27,316 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:27,318 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:37,340 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:37,341 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:37,343 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:47,364 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:47,365 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:47,365 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
> 2018-07-27 07:35:57,385 WARN  akka.remote.transport.netty.NettyTransport
>                   - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: taskmanager-controller-vncdz/
> 172.17.0.7:6123
>
> 2018-07-27 07:35:57,387 WARN  akka.remote.ReliableDeliverySupervisor
>                   - Association with remote system
> [akka.tcp://flink@taskmanager-controller-vncdz:6123] has failed, address
> is now gated for [50] ms. Reason: [Association failed with
> [akka.tcp://flink@taskmanager-controller-vncdz:6123]] Caused by:
> [Connection refused: taskmanager-controller-vncdz/172.17.0.7:6123]
>
> 2018-07-27 07:35:57,387 INFO
> org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
> resolve ResourceManager address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@taskmanager-controller-vncdz:6123/user/resourcemanager..
>
>

Could anyone point out what is wrong? This is my taskmanager controller
<https://github.com/sedgewickmm18/flink-kubernetes/blob/master/taskmanager-controller.yaml>
file.
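
One thing I notice in the log: the address the TaskManager tries to reach on
port 6123 (taskmanager-controller-vncdz/172.17.0.7) is the same hostname and
IP it later reports as its own. That makes me suspect the JobManager address
seen by the TaskManager resolves to the TaskManager pod itself, though I have
not confirmed this. Below is a minimal sketch of how I checked it. The
function name and regular expressions are my own, and it assumes unwrapped
log lines as printed by the pod rather than the mail-wrapped text above.

```python
import re

# Patterns for two log messages that appear in the TaskManager log above.
TARGET_RE = re.compile(
    r"Retrieved new target address (\S+)/(\d+\.\d+\.\d+\.\d+):(\d+)"
)
OWN_RE = re.compile(
    r"TaskManager will use hostname/address '([^']+)' \((\d+\.\d+\.\d+\.\d+)\)"
)


def find_self_connection(log_text):
    """Compare the target address with the TaskManager's own address.

    Returns (target_host, own_host, is_same), or None if either of the
    two log messages is missing from log_text.
    """
    target = TARGET_RE.search(log_text)
    own = OWN_RE.search(log_text)
    if target is None or own is None:
        return None
    target_host, target_ip = target.group(1), target.group(2)
    own_host, own_ip = own.group(1), own.group(2)
    is_same = target_host == own_host or target_ip == own_ip
    return target_host, own_host, is_same


# Two lines taken from the log above, with the mail wrapping removed.
SAMPLE_LOG = (
    "2018-07-27 07:34:22,629 INFO  org.apache.flink.runtime.net.ConnectionUtils"
    " - Retrieved new target address taskmanager-controller-vncdz/172.17.0.7:6123.\n"
    "2018-07-27 07:34:32,630 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner"
    " - TaskManager will use hostname/address 'taskmanager-controller-vncdz'"
    " (172.17.0.7) for communication.\n"
)

if __name__ == "__main__":
    print(find_self_connection(SAMPLE_LOG))
```

If the last element of the result is True, the TaskManager is trying to reach
the JobManager at its own address, which would explain the refused
connections, since nothing in that pod listens on port 6123.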

Also, could someone please point me to other docs, if they exist, about
running Flink 1.5 end to end on Kubernetes?

Thanks,
Vipul