Posted to dev@flink.apache.org by amir bahmanyari <am...@yahoo.com.INVALID> on 2017/04/13 00:54:10 UTC

Jobmanager drops upon submitting a jar



Hi Colleagues,

I have a simple test job. When I submit it to the Flink cluster, the JM seems to disconnect. It's a one-node cluster running in a VirtualBox CentOS 7 VM. Flink starts fine and everything else looks fine. The stack trace follows. I'd appreciate any feedback.

Cheers

17/04/12 15:53:04 INFO node.Node: Connected to Node 192.168.56.101
17/04/12 15:53:04 INFO config.ConfigurationProvider: Opened bucket default
17/04/12 15:53:04 INFO config.ConfigurationProvider: Closed bucket default
17/04/12 15:53:04 INFO node.Node: Disconnected from Node 192.168.56.101
17/04/12 15:53:07 INFO kafka.FlinkKafkaConsumerBase: Trying to get topic metadata from broker localhost:9092 in try 0/3
17/04/12 15:53:07 INFO kafka.FlinkKafkaConsumerBase: Consumer is going to read the following topics (with number of partitions): abc_pharma_qa (2),
17/04/12 15:53:07 INFO environment.RemoteStreamEnvironment: Running remotely at 192.168.56.101:6123
17/04/12 15:53:07 INFO program.StandaloneClusterClient: Submitting job with JobID: c9c717d6a6d0d5ce9a8758b0fb7dae7c. Waiting for job completion.
Submitting job with JobID: c9c717d6a6d0d5ce9a8758b0fb7dae7c. Waiting for job completion.
17/04/12 15:53:07 INFO program.StandaloneClusterClient: Starting client actor system.
17/04/12 15:53:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/04/12 15:53:08 INFO Remoting: Starting remoting
17/04/12 15:53:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:32776]
17/04/12 15:53:08 INFO client.JobClientActor: Received job test (c9c717d6a6d0d5ce9a8758b0fb7dae7c).
17/04/12 15:53:08 INFO client.JobClientActor: Could not submit job test (c9c717d6a6d0d5ce9a8758b0fb7dae7c), because there is no connection to a JobManager.
17/04/12 15:53:08 INFO client.JobClientActor: Disconnect from JobManager null.

17/04/12 15:53:08 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://flink@192.168.56.101:6123] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
17/04/12 15:54:08 INFO client.JobClientActor: Terminate JobClientActor.
17/04/12 15:54:08 INFO client.JobClientActor: Disconnect from JobManager null.
17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Exception in thread "main" org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.

        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:413)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:92)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:389)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:381)
        at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:209)
        at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:173)
        at com.rfxcel.rts.operations.EventProcessorDriver.start(EventProcessorDriver.java:103)
        at com.rfxcel.rts.operations.EventProcessorDriver.main(EventProcessorDriver.java:109)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Lost connection to the JobManager.
        at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:137)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:409)
        ... 7 more
Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.
        at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:245)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:90)
        at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:70)
        at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: Jobmanager drops upon submitting a jar

Posted by amir bahmanyari <am...@yahoo.com.INVALID>.

Thanks so much for your help. Below is what's in the JM logs. I appreciate your feedback.

2017-04-12 15:51:01,723 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                - --------------------------------------------------------------------------------
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Starting JobManager (Version: 1.2.0, Rev:1c659cf, Date:29.01.2017 @ 21:19:15 UTC)
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Current user: root
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.121-b13
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Maximum heap size: 245 MiBytes
2017-04-12 15:51:01,836 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JAVA_HOME: /opt/software/jdk1.8.0_121
2017-04-12 15:51:01,840 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Hadoop version: 2.7.2
2017-04-12 15:51:01,840 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JVM Options:
2017-04-12 15:51:01,840 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Xms256m
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Xmx256m
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Dlog.file=/opt/software/flink-1.2.0/log/flink-root-jobmanager-0-localhost.localdomain.log
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Dlog4j.configuration=file:/opt/software/flink-1.2.0/conf/log4j.properties
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     -Dlogback.configurationFile=file:/opt/software/flink-1.2.0/conf/logback.xml
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Program Arguments:
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     --configDir
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     /opt/software/flink-1.2.0/conf
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     --executionMode
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -     cluster
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Classpath: /opt/software/flink-1.2.0/lib/log4j-1.2.17.jar:/opt/software/flink-1.2.0/lib/flink-python_2.11-1.2.0.jar:/opt/software/flink-1.2.0/lib/flink-dist_2.11-1.2.0.jar:/opt/software/flink-1.2.0/lib/slf4j-log4j12-1.7.7.jar:::
2017-04-12 15:51:01,841 INFO  org.apache.flink.runtime.jobmanager.JobManager                - --------------------------------------------------------------------------------
2017-04-12 15:51:01,842 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Registered UNIX signal handlers for [TERM, HUP, INT]
2017-04-12 15:51:02,025 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Loading configuration from /opt/software/flink-1.2.0/conf
2017-04-12 15:51:02,031 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2017-04-12 15:51:02,031 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2017-04-12 15:51:02,031 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.mb, 256
2017-04-12 15:51:02,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.mb, 512
2017-04-12 15:51:02,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-04-12 15:51:02,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.preallocate, false
2017-04-12 15:51:02,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2017-04-12 15:51:02,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.web.port, 8081
2017-04-12 15:51:02,043 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager without high-availability
2017-04-12 15:51:02,070 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager on localhost:6123 with execution mode CLUSTER
2017-04-12 15:51:02,082 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, localhost
2017-04-12 15:51:02,082 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2017-04-12 15:51:02,105 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.mb, 256
2017-04-12 15:51:02,105 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.mb, 512
2017-04-12 15:51:02,106 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-04-12 15:51:02,106 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.memory.preallocate, false
2017-04-12 15:51:02,106 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2017-04-12 15:51:02,106 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.web.port, 8081
2017-04-12 15:51:02,166 INFO  org.apache.flink.runtime.security.modules.HadoopModule        - Hadoop user set to root (auth:SIMPLE)
2017-04-12 15:51:02,292 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager actor system reachable at localhost:6123
2017-04-12 15:51:02,792 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2017-04-12 15:51:02,898 INFO  Remoting                                                      - Starting remoting
2017-04-12 15:51:03,375 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager web frontend
2017-04-12 15:51:03,382 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@localhost:6123]
2017-04-12 15:51:03,427 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of JobManager log file: /opt/software/flink-1.2.0/log/flink-root-jobmanager-0-localhost.localdomain.log
2017-04-12 15:51:03,427 INFO  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Determined location of JobManager stdout file: /opt/software/flink-1.2.0/log/flink-root-jobmanager-0-localhost.localdomain.out
2017-04-12 15:51:03,427 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using directory /tmp/flink-web-d4f8595b-ea51-4bcc-8945-80208cdf461c for the web interface files
2017-04-12 15:51:03,428 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Using directory /tmp/flink-web-ca6454d0-d777-4c64-ac13-f198681808a5 for web frontend JAR file uploads
2017-04-12 15:51:03,848 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Web frontend listening at 0:0:0:0:0:0:0:0:8081
2017-04-12 15:51:03,849 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager actor
2017-04-12 15:51:03,853 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-84d3fae1-9bb7-461a-91a5-e2b19f4f393b
2017-04-12 15:51:03,855 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:36449 - max concurrent requests: 50 - max backlog: 1000
2017-04-12 15:51:03,866 INFO  org.apache.flink.runtime.metrics.MetricRegistry               - No metrics reporter configured, no metrics will be exposed/reported.
2017-04-12 15:51:03,870 INFO  org.apache.flink.runtime.webmonitor.WebRuntimeMonitor         - Starting with JobManager akka.tcp://flink@localhost:6123/user/jobmanager on port 8081
2017-04-12 15:51:03,870 INFO  org.apache.flink.runtime.webmonitor.JobManagerRetriever       - New leader reachable under akka.tcp://flink@localhost:6123/user/jobmanager:null.
2017-04-12 15:51:03,876 INFO  org.apache.flink.runtime.jobmanager.MemoryArchivist           - Started memory archivist akka://flink/user/archive
2017-04-12 15:51:03,880 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager at akka.tcp://flink@localhost:6123/user/jobmanager.
2017-04-12 15:51:03,942 INFO  org.apache.flink.runtime.jobmanager.JobManager                - JobManager akka.tcp://flink@localhost:6123/user/jobmanager was granted leadership with leader session ID None.
2017-04-12 15:51:03,981 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - Trying to associate with JobManager leader akka.tcp://flink@localhost:6123/user/jobmanager
2017-04-12 15:51:03,991 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager#-601510079] - leader session null
2017-04-12 15:51:08,899 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at localhost (akka.tcp://flink@localhost.localdomain:40810/user/taskmanager) as a56afe70c7c3a99e72e7deb539b6ded1. Current number of registered hosts is 1. Current number of alive task slots is 1.
2017-04-12 15:51:08,901 INFO  org.apache.flink.runtime.clusterframework.standalone.StandaloneResourceManager  - TaskManager ResourceID{resourceId='70603e7d3e95cfd0ab1e6bdace98358c'} has started.
2017-04-12 15:53:08,728 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@127.0.0.1:32776] has failed, address is now gated for [5000] ms. Reason: [scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388]
2017-04-12 15:53:08,729 ERROR Remoting                                                      - scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388
java.io.InvalidClassException: scala.Option; local class incompatible: stream classdesc serialVersionUID = -2062608324514658839, local class serialVersionUID = -114498752079829388
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1829)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1829)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1986)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
        at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
        at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
        at scala.util.Try$.apply(Try.scala:192)
        at akka.serialization.Serialization.deserialize(Serialization.scala:98)
        at akka.remote.serialization.MessageContainerSerializer.fromBinary(MessageContainerSerializer.scala:63)
        at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
        at scala.util.Try$.apply(Try.scala:192)
        at akka.serialization.Serialization.deserialize(Serialization.scala:98)
        at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
        at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:58)
        at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:58)
        at akka.remote.DefaultMessageDispatcher.payloadClass$1(Endpoint.scala:59)
        at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:99)
        at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:967)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
        at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:437)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

      From: wenlong.lwl <we...@gmail.com>
 To: dev@flink.apache.org; amir bahmanyari <am...@yahoo.com> 
 Sent: Wednesday, April 12, 2017 8:01 PM
 Subject: Re: Jobmanager drops upon submitting a jar

Hi Amir, I think you should first check the JobManager log to make sure
that the JobManager [192.168.56.101:6123 <http://flink@192.168.56.101:6123/>]
is running well. The log may show what is wrong.
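
As a quick complement to reading the log, the sketch below checks whether anything accepts TCP connections on the JobManager RPC address. It is a minimal illustration, not part of Flink: the class name PortCheck and the method isReachable are hypothetical, and the host and port are taken from the client log in this thread. A successful connect only shows that a listener is present; it says nothing about whether the client and the JobManager can actually exchange messages.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not a Flink API): tests plain TCP reachability of host:port. */
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the address and port shown in the client log of this thread.
        String host = args.length > 0 ? args[0] : "192.168.56.101";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If this prints "reachable: true" and submission still fails, the JobManager log is the next place to look, as suggested above.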


Re: Jobmanager drops upon submitting a jar

Posted by amir bahmanyari <am...@yahoo.com.INVALID>.

Hi Gordon,

I am moving forward using FlinkKafkaConsumer08. Are there any recommended best practices for integrating Kafka and Flink?

Thanks and regards

      From: Tzu-Li (Gordon) Tai <tz...@apache.org>
 To: dev@flink.apache.org 
 Sent: Sunday, April 16, 2017 10:25 PM
 Subject: Re: Jobmanager drops upon submitting a jar
   
Hi Amir,

What do you mean by “modernizing” the FlinkKafkaConsumer implementation? Could you explain a bit more?
Thanks :) Chiming in just to check if there’s any issue we need to be aware of ..

Cheers,
Gordon

On 14 April 2017 at 5:35:43 AM, amir bahmanyari (amirtousa@yahoo.com.invalid) wrote:

It turned out to be a release gap between the build environment libraries and the runtime, nothing else. I am updating everything to the latest and greatest. With the latest Flink and the current (old) code, Maven reports:
[ERROR]   symbol:   class FlinkKafkaConsumer08
Meaning it needs to be replaced with the latest consumer object.
Any suggestions on modernizing the FlinkKafkaConsumer implementation?
Thanks and regards
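
One way to confirm a build/runtime gap like the one described above is to compare the serialVersionUID that each side computes for the class named in the JobManager's InvalidClassException (scala.Option in this thread). The sketch below is a hypothetical helper, not a Flink or Scala API; the name SerialUidCheck is made up for illustration. Run it once with the client's classpath and once with the cluster's lib directory on the classpath. Differing values would suggest the two sides carry different Scala library versions, though that reading is an assumption rather than something the thread states explicitly.

```java
import java.io.ObjectStreamClass;
import java.util.OptionalLong;

/** Hypothetical helper: reports the serialVersionUID this JVM's classpath yields for a class. */
final class SerialUidCheck {

    /** Returns the UID, or empty if the class is missing or not Serializable. */
    static OptionalLong serialVersionUidOf(String className) {
        try {
            // Load without running static initializers; only the descriptor is needed.
            Class<?> cls = Class.forName(className, false, SerialUidCheck.class.getClassLoader());
            // lookup() returns null for classes that do not implement Serializable.
            ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
            return desc == null ? OptionalLong.empty() : OptionalLong.of(desc.getSerialVersionUID());
        } catch (ClassNotFoundException e) {
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the JobManager's InvalidClassException.
        String name = args.length > 0 ? args[0] : "scala.Option";
        OptionalLong uid = serialVersionUidOf(name);
        System.out.println(name + " -> "
                + (uid.isPresent() ? String.valueOf(uid.getAsLong()) : "not found or not serializable"));
    }
}
```

The two numbers to compare against are the "stream classdesc" and "local class" values printed in the JobManager log earlier in the thread.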


Re: Jobmanager drops upon submitting a jar

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.

Hi Amir,

What do you mean by “modernizing” the FlinkKafkaConsumer implementation? Could you explain a bit more?
Thanks :) Just chiming in to check whether there’s any issue we need to be aware of.

Cheers,
Gordon

On 14 April 2017 at 5:35:43 AM, amir bahmanyari (amirtousa@yahoo.com.invalid) wrote:

It turned out to be a release gap between the build environment libs and the runtime, nothing else. I am updating everything to the latest and greatest. With the latest Flink and the current (old) code, Maven reports:
[ERROR]   symbol:   class FlinkKafkaConsumer08
Meaning it needs to be replaced with the latest consumer object.
Any suggestions on modernizing the FlinkKafkaConsumer implementation? Thanks and regards

From: wenlong.lwl <we...@gmail.com>  
To: dev@flink.apache.org; amir bahmanyari <am...@yahoo.com>  
Sent: Wednesday, April 12, 2017 8:01 PM  
Subject: Re: Jobmanager drops upon submitting a jar  

Hi Amir, I think you should first check the JobManager log to make sure that
the JobManager [192.168.56.101:6123 <http://flink@192.168.56.101:6123/>] is
running properly; the log may show what is wrong.


Re: Jobmanager drops upon submitting a jar

Posted by amir bahmanyari <am...@yahoo.com.INVALID>.

It turned out to be a release gap between the build environment libs and the runtime, nothing else. I am updating everything to the latest and greatest. With the latest Flink and the current (old) code, Maven reports:
[ERROR]   symbol:   class FlinkKafkaConsumer08
Meaning it needs to be replaced with the latest consumer object.
Any suggestions on modernizing the FlinkKafkaConsumer implementation? Thanks and regards
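
A release gap like the one described above can be caught before a job is submitted by comparing the Flink version the job was built against with the version the cluster runs. What follows is only a minimal sketch, assuming that client and cluster should at least share the same major.minor release line (identical versions are the safer choice). The class `VersionGapCheck`, its methods, and the default version strings are hypothetical illustrations; they are not part of Flink and the versions do not come from this thread.

```java
/** Hypothetical helper (not part of Flink): compares build-time and runtime version strings. */
final class VersionGapCheck {

    /** Returns the "major.minor" prefix of a version such as "1.2.0" or "1.3-SNAPSHOT". */
    static String majorMinor(String version) {
        String[] parts = version.trim().split("[.\\-]");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least major.minor: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both versions belong to the same major.minor release line. */
    static boolean sameReleaseLine(String buildVersion, String runtimeVersion) {
        return majorMinor(buildVersion).equals(majorMinor(runtimeVersion));
    }

    public static void main(String[] args) {
        // Illustrative defaults only; pass the real versions as arguments.
        String buildVersion = args.length > 0 ? args[0] : "1.1.4";
        String runtimeVersion = args.length > 1 ? args[1] : "1.2.0";
        if (sameReleaseLine(buildVersion, runtimeVersion)) {
            System.out.println("Build and runtime are on the same release line.");
        } else {
            System.out.println("Release gap: built against " + buildVersion
                    + ", cluster runs " + runtimeVersion);
        }
    }
}
```

The build-time version would typically come from the Flink dependency version in the project's pom.xml, and the runtime version from the Flink distribution installed on the cluster.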

From: wenlong.lwl <we...@gmail.com>
To: dev@flink.apache.org; amir bahmanyari <am...@yahoo.com>
Sent: Wednesday, April 12, 2017 8:01 PM
Subject: Re: Jobmanager drops upon submitting a jar

Hi Amir, I think you should first check the JobManager log to make sure that
the JobManager [192.168.56.101:6123 <http://flink@192.168.56.101:6123/>] is
running properly; the log may show what is wrong.


Re: Jobmanager drops upon submitting a jar

Posted by "wenlong.lwl" <we...@gmail.com>.

Hi Amir, I think you should first check the JobManager log to make sure that
the JobManager [192.168.56.101:6123 <http://flink@192.168.56.101:6123/>] is
running properly; the log may show what is wrong.
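
Alongside reading the JobManager log, it may help to confirm from the client machine that something is accepting TCP connections on the JobManager address at all. The following is a minimal sketch using only the Java standard library; `JobManagerProbe` and `isReachable` are hypothetical names, not Flink APIs, and the default host and port are simply the ones shown in the client log. A successful connect only shows that the port is open; it does not prove that the JobManager is healthy or that client and cluster versions are compatible.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Flink): checks whether a TCP port accepts connections. */
final class JobManagerProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: nothing reachable at this address.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the JobManager address from the client log in this thread.
        String host = args.length > 0 ? args[0] : "192.168.56.101";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

If the probe reports the port as reachable and the job still fails with "Disassociated", the JobManager log is the next place to look.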

On 13 April 2017 at 08:54, amir bahmanyari <am...@yahoo.com.invalid>
wrote:

>
>
>
> Hi Colleagues, I have a simple test job; when I submit it to the Flink
> cluster, the JM seems to disconnect. It's a one-node cluster implemented in a
> VirtualBox CentOS 7 VM. Flink starts fine and everything else looks fine.
> The stack trace follows. I'd appreciate any feedback. Cheers
>
> 17/04/12 15:53:04 INFO node.Node: Connected to Node 192.168.56.101
> 17/04/12 15:53:04 INFO config.ConfigurationProvider: Opened bucket default
> 17/04/12 15:53:04 INFO config.ConfigurationProvider: Closed bucket default
> 17/04/12 15:53:04 INFO node.Node: Disconnected from Node 192.168.56.101
> 17/04/12 15:53:07 INFO kafka.FlinkKafkaConsumerBase: Trying to get topic metadata from broker localhost:9092 in try 0/3
> 17/04/12 15:53:07 INFO kafka.FlinkKafkaConsumerBase: Consumer is going to read the following topics (with number of partitions): abc_pharma_qa (2),
> 17/04/12 15:53:07 INFO environment.RemoteStreamEnvironment: Running remotely at 192.168.56.101:6123
> 17/04/12 15:53:07 INFO program.StandaloneClusterClient: Submitting job with JobID: c9c717d6a6d0d5ce9a8758b0fb7dae7c. Waiting for job completion.
> Submitting job with JobID: c9c717d6a6d0d5ce9a8758b0fb7dae7c. Waiting for job completion.
> 17/04/12 15:53:07 INFO program.StandaloneClusterClient: Starting client actor system.
> 17/04/12 15:53:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 17/04/12 15:53:08 INFO Remoting: Starting remoting
> 17/04/12 15:53:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:32776]
> 17/04/12 15:53:08 INFO client.JobClientActor: Received job test (c9c717d6a6d0d5ce9a8758b0fb7dae7c).
> 17/04/12 15:53:08 INFO client.JobClientActor: Could not submit job test (c9c717d6a6d0d5ce9a8758b0fb7dae7c), because there is no connection to a JobManager.
> 17/04/12 15:53:08 INFO client.JobClientActor: Disconnect from JobManager null.
>
> 17/04/12 15:53:08 WARN remote.ReliableDeliverySupervisor: Association with remote system [akka.tcp://flink@192.168.56.101:6123] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
> 17/04/12 15:54:08 INFO client.JobClientActor: Terminate JobClientActor.
> 17/04/12 15:54:08 INFO client.JobClientActor: Disconnect from JobManager null.
> 17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 17/04/12 15:54:08 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> Exception in thread "main" org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
>
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:413)
>         at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:92)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:389)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:381)
>         at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.executeRemotely(RemoteStreamEnvironment.java:209)
>         at org.apache.flink.streaming.api.environment.RemoteStreamEnvironment.execute(RemoteStreamEnvironment.java:173)
>         at com.rfxcel.rts.operations.EventProcessorDriver.start(EventProcessorDriver.java:103)
>         at com.rfxcel.rts.operations.EventProcessorDriver.main(EventProcessorDriver.java:109)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Lost connection to the JobManager.
>         at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:137)
>         at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:409)
>         ... 7 more
> Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.
>         at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:245)
>         at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:90)
>         at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:70)
>         at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>