Posted to user@flink.apache.org by "Kumar Bolar, Harshith" <hk...@arity.com> on 2018/10/22 10:52:30 UTC

Flink JobManager is not starting when running on a standalone cluster

Hi all,

We run Flink on a five-node cluster – three task managers, two job managers. One of the job managers, running on the flink2-0 node, is down and refuses to come back up, so the cluster is currently running with a single job manager. When I restart the service, I see this in the logs. Any idea what the issue might be?


2018-10-22 06:43:50,458 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager actor

2018-10-22 06:43:50,462 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-73e8dbe2-8fdb-4310-84d4-c9f3445723f3

2018-10-22 06:43:50,466 INFO  org.apache.flink.runtime.blob.BlobServer                      - Enabling ssl for the blob server

2018-10-22 06:43:50,482 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:36880 - max concurrent requests: 50 - max backlog: 1000

2018-10-22 06:43:50,501 INFO  org.apache.flink.runtime.jobmanager.MemoryArchivist           - Started memory archivist akka://flink/user/archive

2018-10-22 06:43:50,525 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.

2018-10-22 06:43:50,525 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager at akka.ssl.tcp://flink@flink2-0.flink2.us-east-1.prod.xxxxxxx.io:22902/user/jobmanager.

2018-10-22 06:43:50,526 INFO  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService  - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@2805f48f.

2018-10-22 06:43:50,532 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.

2018-10-22 06:43:50,557 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - Received leader address but not running in leader ActorSystem. Cancelling registration.

Thanks,
Harshith

Re: Flink JobManager is not starting when running on a standalone cluster

Posted by miki haiat <mi...@gmail.com>.
I think it's related to this issue:
https://issues.apache.org/jira/browse/FLINK-10011
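
For anyone triaging a similar startup log, here is a minimal Python sketch that splits each line into timestamp, level, logger and message, and reports the last entry written before the JobManager stopped making progress. It assumes the log layout shown in the original post; the names parse_line and last_entry are hypothetical helpers, not part of Flink, and the script does not diagnose the cause by itself.

```python
import re

# Assumed layout, taken from the log excerpt in this thread:
# "<date> <time>,<millis> <LEVEL>  <logger>  - <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)\s+-\s+"
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict with ts, level, logger and msg, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def last_entry(lines):
    """Return the last parseable entry, with the logger shortened to its class name."""
    entries = [e for e in (parse_line(l) for l in lines) if e is not None]
    if not entries:
        return None
    entry = dict(entries[-1])
    entry["logger"] = entry["logger"].rsplit(".", 1)[-1]
    return entry


if __name__ == "__main__":
    sample = [
        "2018-10-22 06:43:50,532 INFO  org.apache.flink.runtime.leaderretrieval."
        "ZooKeeperLeaderRetrievalService  - Starting ZooKeeperLeaderRetrievalService.",
        "",
        "2018-10-22 06:43:50,557 INFO  org.apache.flink.runtime.clusterframework."
        "standalone.StandaloneResourceManager  - Received leader address but not "
        "running in leader ActorSystem. Cancelling registration.",
    ]
    entry = last_entry(sample)
    print(entry["ts"], entry["level"], entry["logger"], "-", entry["msg"])
```

Running it against the full jobmanager log (read with open(path).readlines()) shows which component logged last, which is usually the place to start looking.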



