Posted to user@flink.apache.org by Felipe Gutierrez <fe...@gmail.com> on 2018/08/06 09:01:01 UTC

connection failed when running flink in a cluster

Hello everyone,

I am trying to run Flink on Raspberry Pis. My first test, a word count on
a single node, worked. I only had to decrease the heap size settings
jobmanager.heap.mb and taskmanager.heap.mb to 512.
In my second test I added 2 slave nodes and got the error "Java HotSpot(TM)
Client VM warning: G1 GC is disabled in this release." in the file
log/flink-root-taskexecutor-0-*.out.

This link
(https://blog.sflow.com/2016/06/raspberry-pi-real-time-network-analytics.html)
says that for the JVM to work on the Raspberry Pi ARM architecture it is
necessary to configure the JVM as:
-Xms600M
-Xmx600M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode

Then I set these options in the file flink-conf.yaml:
env.java.opts: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
env.java.opts.jobmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
env.java.opts.taskmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

and the error "Java HotSpot(TM) Client VM warning: G1 GC is disabled in
this release." is not showing anymore. However, the connection from the
master node to the slave node is still not possible. Does anybody know how
I must configure flink to deal with that?
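
To check whether the options from flink-conf.yaml actually reach the JVM, a small diagnostic can be run on each Pi with the same flags. The following is a minimal sketch and not part of Flink: the class name GcReport is made up for illustration, and it only uses the standard java.lang.management API to list the collectors the JVM selected and the arguments it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which garbage collectors and JVM arguments are in effect. */
class GcReport {

    /** Names of the collectors the running JVM actually selected. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Arguments the JVM was started with; -Xmx and -XX flags show up here. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Collectors:  " + collectorNames());
        System.out.println("JVM args:    " + jvmArguments());
        // Maximum heap the JVM will try to use, in megabytes.
        System.out.println("Max heap MB: " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

The collector names printed depend on the JVM build, so no particular output is assumed here. Starting it with the same flags as in env.java.opts shows whether that JVM accepts them at all.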

This is the error stack trace:

2017-05-25 12:40:26,421 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from DEPLOYING to RUNNING.
2017-05-25 12:40:26,891 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from RUNNING to FAILED.
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:745)
2017-05-25 12:40:26,898 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Socket Window WordCount (71c6d7796eccf6587d9d1deda0490e09) switched from state RUNNING to FAILING.
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:745)
2017-05-25 12:40:26,921 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from RUNNING to CANCELING.
2017-05-25 12:40:26,975 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from CANCELING to CANCELED.



Thanks, Felipe
*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*

Re: connection failed when running flink in a cluster

Posted by Felipe Gutierrez <fe...@gmail.com>.
It worked! This was exactly the problem. I have to set the IP, otherwise it
does not accept the jobs that I submit.

Even if I set the IP for localhost in the /etc/hosts file, and the command
"ping localhost" returns my IP, it does not work. It is mandatory to use
--hostname <IP>.

Thanks Gary. Best Regards,
Felipe
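
One way to see why the /etc/hosts entry is not enough is to check what a name resolves to on each node, because "localhost" is resolved on the machine where the source task runs, not on the machine where the job was submitted. A minimal sketch under that assumption; the class HostCheck is hypothetical and uses only java.net.InetAddress:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper: shows what a host name resolves to on the local machine. */
class HostCheck {

    /** Returns a one-line description such as "name -> address (loopback)". */
    static String describe(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            String kind = addr.isLoopbackAddress() ? "loopback" : "non-loopback";
            return host + " -> " + addr.getHostAddress() + " (" + kind + ")";
        } catch (UnknownHostException e) {
            return host + " -> unresolved";
        }
    }

    public static void main(String[] args) {
        // With no arguments, check the default that the job falls back to.
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String host : hosts) {
            System.out.println(describe(host));
        }
    }
}
```

Running it on the master and on each slave node shows whether the same name points at the same machine everywhere. Passing the IP explicitly with --hostname avoids depending on that.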





On Mon, Aug 6, 2018 at 9:57 PM Gary Yao <ga...@data-artisans.com> wrote:

> Hi,
>
> Can you try submitting with:
>
>     ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname <IP> --port 9000
>
> where IP is the IP of the node where you started nc? If not specified, the
> default hostname is localhost. This is problematic if the source operator
> is scheduled on a different physical machine.
>
> Best,
> Gary
>
> On Mon, Aug 6, 2018 at 6:01 PM, Felipe Gutierrez <
> felipe.o.gutierrez@gmail.com> wrote:
>
>> Hi Vino,
>>
>> The UI shows the job as completed.
>> I ran "./bin/flink run examples/streaming/WordCount.jar" and got no
>> error.
>>
>> When I start netcat with "nc -l 9000" and, in another terminal, run
>> "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000",
>> I get this exception:
>>
>> Starting execution of program
>>
>> ------------------------------------------------------------
>>  The program finished with the following exception:
>>
>> org.apache.flink.client.program.ProgramInvocationException: java.net.ConnectException: Connection refused
>>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264)
>>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
>>     at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>>     at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:497)
>>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
>>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
>>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
>>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
>>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
>>     at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
>> Caused by: java.net.ConnectException: Connection refused
>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>     at java.net.Socket.connect(Socket.java:589)
>>     at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>>
>>
>> On Mon, Aug 6, 2018 at 5:54 PM vino yang <ya...@gmail.com> wrote:
>>
>>> Hi Felipe,
>>>
>>> Did you get the result? And does the web UI show the job as completed?
>>>
>>> If it throws the exception you provided, the job's status should be
>>> failed.
>>>
>>> Thanks, vino.
>>>
>>> 2018-08-06 23:42 GMT+08:00 Felipe Gutierrez <
>>> felipe.o.gutierrez@gmail.com>:
>>>
>>>> Yes, with this example (examples/streaming/WordCount.jar) my cluster
>>>> worked.
>>>>
>>>> The file log/*out from the master is still empty and the file log/*out
>>>> from the slave node has my result. The dashboard also shows that the job is
>>>> completed.
>>>>
>>>> So, as you said, there are some external dependencies that I didn't
>>>> include in my deployment. Do you have any clue?
>>>>
>>>> I am following the original quickstart
>>>> (https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html)
>>>>
>>>> Kind Regards,
>>>> Felipe
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Aug 6, 2018 at 5:01 PM Gary Yao <ga...@data-artisans.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> nc exits after the first connection is closed. Are you re-running the nc
>>>>> command every time the job finishes?
>>>>>
>>>>> The stacktrace you copied does not indicate that a TaskManager cannot
>>>>> connect to the JobManager. I can only see that the SocketTextStreamFunction
>>>>> (from the SocketWindowWordCount job?) cannot open the connection to the
>>>>> address that you specified.
>>>>>
>>>>> Can you try to run examples/streaming/WordCount.jar? It is a simpler job
>>>>> which does not rely on external dependencies.
>>>>>
>>>>> If all the above fails, can you tell us how you submit the job? Can you
>>>>> post the full command? Can you also post the full JobManager & TaskManager
>>>>> logs?
>>>>>
>>>>> Best,
>>>>> Gary
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 6, 2018 at 4:10 PM, Felipe Gutierrez <
>>>>> felipe.o.gutierrez@gmail.com> wrote:
>>>>>
>>>>>> Do you mean "nc -l 9000"? If so, I did start it beforehand.
>>>>>> The task manager running on the master can connect to the job
>>>>>> manager, but the task manager on the slave node cannot. The second time
>>>>>> that I start the WordCount task it recognizes only one task manager (from
>>>>>> the master) and runs my task. The task manager on the slave is started
>>>>>> but does not process anything.
>>>>>>
>>>>>> Here is the error stack trace from the slave node:
>>>>>>
>>>>>> 2017-05-30 05:10:39,853 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing heap keyed state backend with stream factory.
>>>>>> 2017-05-30 05:10:39,977 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED.
>>>>>> java.net.ConnectException: Connection refused
>>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>>>     at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>>>>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>>>>>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>>>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
>>>>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>> 2017-05-30 05:10:40,016 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 6, 2018 at 2:17 PM vino yang <ya...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Felipe,
>>>>>>>
>>>>>>> From the exception information, it seems that you did not start the
>>>>>>> socket server. The socket source needs to connect to the socket server.
>>>>>>>
>>>>>>> Please make sure the socket server has started and is available.
>>>>>>>
>>>>>>> Thanks, vino.
>>>>>>>
>>>>>>> 2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <
>>>>>>> felipe.o.gutierrez@gmail.com>:
>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> When I execute the jps command on the master node I see
>>>>>>>> TaskManagerRunner and StandaloneSessionClusterEntrypoint (which I
>>>>>>>> believe is the JobManager). On the slave nodes I see
>>>>>>>> TaskManagerRunner when I run the jps command.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Aug 6, 2018 at 12:13 PM miki haiat <mi...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Did you start the job manager and the task manager on the same Raspberry Pi?

Re: connection failed when running flink in a cluster

Posted by Gary Yao <ga...@data-artisans.com>.
Hi,

Can you try submitting with:

    ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname <IP> --port 9000

where IP is the IP of the node where you started nc? If not specified, the
default hostname is localhost. This is problematic if the source operator is
scheduled on a different physical machine.

Best,
Gary
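
The failure can be reproduced outside Flink. The sketch below does what the stack trace shows the socket source doing (a plain java.net.Socket connect) and reports whether a listener is reachable at a given host and port. PortProbe, its defaults, and the 2-second timeout are illustrative assumptions, not Flink code; it is meant to be run on the node where the source task is scheduled.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something is listening at host:port. */
class PortProbe {

    /** Returns true if a TCP connection can be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unresolved hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the thread: no hostname given means localhost, nc listens on 9000.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " refused or unreachable"));
    }
}
```

If the probe fails for localhost on a slave node but succeeds for the IP of the machine running nc, that matches the behaviour described above.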

On Mon, Aug 6, 2018 at 6:01 PM, Felipe Gutierrez <
felipe.o.gutierrez@gmail.com> wrote:

> Hi Vino,
>
> The UI shows the job as completed.
> I ran "./bin/flink run examples/streaming/WordCount.jar" and got no
> error.
>
> When I start netcat with "nc -l 9000" and, in another terminal, run
> "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000",
> I get this exception:
>
> Starting execution of program
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: java.net.ConnectException: Connection refused
>     at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
>     at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>     at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
>     at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
>     at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
>     at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
>     at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
>     at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
> Caused by: java.net.ConnectException: Connection refused
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>     at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> On Mon, Aug 6, 2018 at 5:54 PM vino yang <ya...@gmail.com> wrote:
>
>> Hi Felipe,
>>
>> Did you get the result? And does the web UI show the job as completed?
>>
>> If it throws the exception you provided, the job's status should be
>> failed.
>>
>> Thanks, vino.
>>
>> 2018-08-06 23:42 GMT+08:00 Felipe Gutierrez <felipe.o.gutierrez@gmail.com
>> >:
>>
>>> Yes, with this example (examples/streaming/WordCount.jar) my cluster
>>> worked.
>>>
>>> The file log/*out from the master is still empty and the file log/*out
>>> from the slave node has my result. The dashboard also shows that the job is
>>> completed.
>>>
>>> So, as you said, there are some external dependencies that I didn't
>>> include in my deployment. Do you have any clue?
>>>
>>> I am following the original quickstart
>>> (https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html)
>>>
>>> Kind Regards,
>>> Felipe
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Aug 6, 2018 at 5:01 PM Gary Yao <ga...@data-artisans.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> nc exits after the first connection is closed. Are you re-running the nc
>>>> command every time the job finishes?
>>>>
>>>> The stacktrace you copied does not indicate that a TaskManager cannot
>>>> connect to the JobManager. I can only see that the SocketTextStreamFunction
>>>> (from the SocketWindowWordCount job?) cannot open the connection to the
>>>> address that you specified.
>>>>
>>>> Can you try to run examples/streaming/WordCount.jar? It is a simpler job
>>>> which does not rely on external dependencies.
>>>>
>>>> If all the above fails, can you tell us how you submit the job? Can you
>>>> post the full command? Can you also post the full JobManager & TaskManager
>>>> logs?
>>>>
>>>> Best,
>>>> Gary
>>>>
>>>>
>>>>
>>>> On Mon, Aug 6, 2018 at 4:10 PM, Felipe Gutierrez <
>>>> felipe.o.gutierrez@gmail.com> wrote:
>>>>
>>>>> Do you mean "nc -l 9000"? If so, I did start it beforehand.
>>>>> The task manager running on the master can connect to the job manager,
>>>>> but the task manager on the slave node cannot. The second time that I start
>>>>> the WordCount task it recognizes only one task manager (from the master)
>>>>> and runs my task. The task manager on the slave is started but does not
>>>>> process anything.
>>>>>
>>>>> Here is the error stack trace from the slave node:
>>>>>
>>>>> 2017-05-30 05:10:39,853 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing heap keyed state backend with stream factory.
>>>>> 2017-05-30 05:10:39,977 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED.
>>>>> java.net.ConnectException: Connection refused
>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>>>>>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>>>>>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>     at java.net.Socket.connect(Socket.java:589)
>>>>>     at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
>>>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
>>>>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
>>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
>>>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>> 2017-05-30 05:10:40,016 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Aug 6, 2018 at 2:17 PM vino yang <ya...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Felipe,
>>>>>>
>>>>>> From the exception information, it seems that you did not start the
>>>>>> socket server. The socket source needs to connect to the socket server.
>>>>>>
>>>>>> Please make sure the socket server has started and is available.
>>>>>>
>>>>>> Thanks, vino.
>>>>>>
>>>>>> 2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <
>>>>>> felipe.o.gutierrez@gmail.com>:
>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>> When I execute the jps command on the master node I see
>>>>>>> TaskManagerRunner and StandaloneSessionClusterEntrypoint (which I
>>>>>>> believe is the JobManager). On the slave nodes I see
>>>>>>> TaskManagerRunner when I run the jps command.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 6, 2018 at 12:13 PM miki haiat <mi...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Did you start the job manager and the task manager on the same Raspberry Pi?
>>>>>>>>
>>>>>>>> On Mon, 6 Aug 2018, 12:01 Felipe Gutierrez, <
>>>>>>>> felipe.o.gutierrez@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello everyone,
>>>>>>>>>
>>>>>>>>> I am trying to run Flink on Raspberry Pis. My first test, a word
>>>>>>>>> count on a single node, worked. I only had to decrease the heap size
>>>>>>>>> settings jobmanager.heap.mb and taskmanager.heap.mb to 512.
>>>>>>>>> In my second test I added 2 slave nodes and got the error "Java
>>>>>>>>> HotSpot(TM) Client VM warning: G1 GC is disabled in this release." in the
>>>>>>>>> file log/flink-root-taskexecutor-0-*.out.
>>>>>>>>>
>>>>>>>>> This link (https://blog.sflow.com/2016/06/raspberry-pi-real-time-
>>>>>>>>> network-analytics.html) says that in order to Raspberry Pi ARM
>>>>>>>>> architecture works with JVM it is necessary to configure the JVM as:
>>>>>>>>> -Xms600M
>>>>>>>>> -Xmx600M
>>>>>>>>> -XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC
>>>>>>>>> -XX:+CMSIncrementalMode
>>>>>>>>>
>>>>>>>>> then I set this variables on the path inside the file
>>>>>>>>> flink-conf.yaml
>>>>>>>>> env.java.opts: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>>>>>>>> -XX:+CMSIncrementalMode"
>>>>>>>>> env.java.opts.jobmanager: "-XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>>>>>>>> env.java.opts.taskmanager: "-XX:+UseParNewGC
>>>>>>>>> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>>>>>>>>
>>>>>>>>> and the error "Java HotSpot(TM) Client VM warning: G1 GC is
>>>>>>>>> disabled in this release." is not showing anymore. However, the connection
>>>>>>>>> from the master node to the slave node is still not possible. Does anybody
>>>>>>>>> know how I must configure flink to deal with that?
>>>>>>>>>
>>>>>>>>> This is the error stack trace:
>>>>>>>>>
>>>>>>>>> 2017-05-25 12:40:26,421 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Source: Socket Stream -> Flat Map (1/1) (
>>>>>>>>> b81b6492fc0860367be422d0b0bf4358) switched from DEPLOYING to
>>>>>>>>> RUNNING.
>>>>>>>>> 2017-05-25 12:40:26,891 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Source: Socket Stream -> Flat Map (1/1) (
>>>>>>>>> b81b6492fc0860367be422d0b0bf4358) switched from RUNNING to FAILED.
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.doConnect(
>>>>>>>>> AbstractPlainSocketImpl.java:350)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connectToAddress(
>>>>>>>>> AbstractPlainSocketImpl.java:206)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connect(
>>>>>>>>> AbstractPlainSocketImpl.java:188)
>>>>>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>>>> at java.net.Socket.connect(Socket.java:589)
>>>>>>>>> at org.apache.flink.streaming.api.functions.source.
>>>>>>>>> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:87)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:56)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
>>>>>>>>> SourceStreamTask.java:99)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.
>>>>>>>>> invoke(StreamTask.java:306)
>>>>>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> 2017-05-25 12:40:26,898 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Job Socket Window WordCount (
>>>>>>>>> 71c6d7796eccf6587d9d1deda0490e09) switched from state RUNNING to
>>>>>>>>> FAILING.
>>>>>>>>> java.net.ConnectException: Connection refused
>>>>>>>>> at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.doConnect(
>>>>>>>>> AbstractPlainSocketImpl.java:350)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connectToAddress(
>>>>>>>>> AbstractPlainSocketImpl.java:206)
>>>>>>>>> at java.net.AbstractPlainSocketImpl.connect(
>>>>>>>>> AbstractPlainSocketImpl.java:188)
>>>>>>>>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>>>>>>>>> at java.net.Socket.connect(Socket.java:589)
>>>>>>>>> at org.apache.flink.streaming.api.functions.source.
>>>>>>>>> SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:87)
>>>>>>>>> at org.apache.flink.streaming.api.operators.StreamSource.
>>>>>>>>> run(StreamSource.java:56)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(
>>>>>>>>> SourceStreamTask.java:99)
>>>>>>>>> at org.apache.flink.streaming.runtime.tasks.StreamTask.
>>>>>>>>> invoke(StreamTask.java:306)
>>>>>>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
>>>>>>>>> at java.lang.Thread.run(Thread.java:745)
>>>>>>>>> 2017-05-25 12:40:26,921 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Window(TumblingProcessingTimeWindows(5000),
>>>>>>>>> ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) ->
>>>>>>>>> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b)
>>>>>>>>> switched from RUNNING to CANCELING.
>>>>>>>>> 2017-05-25 12:40:26,975 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>>>>>>>       - Window(TumblingProcessingTimeWindows(5000),
>>>>>>>>> ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) ->
>>>>>>>>> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b)
>>>>>>>>> switched from CANCELING to CANCELED.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks, Felipe
>>>>>>>>> *--*
>>>>>>>>> *-- Felipe Gutierrez*
>>>>>>>>>
>>>>>>>>> *-- skype: felipe.o.gutierrez*
>>>>>>>>> *--* *https://felipeogutierrez.blogspot.com
>>>>>>>>> <https://felipeogutierrez.blogspot.com>*
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>

Re: connection failed when running flink in a cluster

Posted by Felipe Gutierrez <fe...@gmail.com>.
Hi Vino,

The UI shows the job as completed.
I ran "./bin/flink run examples/streaming/WordCount.jar" and got no
error.

When I start netcat with "nc -l 9000" and, in another terminal, run
"./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000",
I get this exception:

Starting execution of program

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: java.net.ConnectException: Connection refused
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)
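
The trace fails inside SocketTextStreamFunction at Socket.connect, so the
refused connection is the job's socket source trying to reach the text
server from whichever machine runs that source task. The
SocketWindowWordCount example falls back to localhost when no --hostname
argument is given, so a source scheduled on a slave node would look for nc
on that node rather than on the master. That is a likely explanation, not
something these logs confirm. One way to check is to probe the host and
port from each TaskManager machine. Below is a minimal Java sketch under
those assumptions; the class name PortProbe and its canConnect helper are
made up for illustration and are not part of Flink.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Flink: checks whether a TCP listener
// (for example "nc -l 9000") is reachable from the machine running this code.
public class PortProbe {

    /** Returns true if a TCP connection to host:port opens within timeoutMillis. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // "Connection refused" (java.net.ConnectException) and timeouts
            // both end up here: nothing usable is listening at host:port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions that mirror the commands quoted in this thread.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 2000));
    }
}
```

Running "java PortProbe <master-host> 9000" on a slave node while nc is
listening on the master shows whether the slave can reach it. If the port
is reachable only through the master's hostname, passing
"--hostname <master-host>" to the job may be what is missing.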



*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*



Re: connection failed when running flink in a cluster

Posted by vino yang <ya...@gmail.com>.
Hi Felipe,

Did you get the result? And does the web UI show the job as completed?

If it throws the exception you provided, the job's status should be failed.

Thanks, vino.


Re: connection failed when running flink in a cluster

Posted by Felipe Gutierrez <fe...@gmail.com>.
Yes. With this example (examples/streaming/WordCount.jar) my cluster worked.

The log/*out file on the master is still empty, and the log/*out file on
the slave node has my result. The dashboard also shows that the job is
completed.

So, like you said, there are some external dependencies that I didn't
include in my deployment. Do you have any clue what they might be?

I am following the original quickstart
(https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html).

Kind Regards,
Felipe




*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*



Re: connection failed when running flink in a cluster

Posted by Gary Yao <ga...@data-artisans.com>.
Hi,

nc exits after the first connection is closed. Are you re-running the nc
command every time the job finishes?
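
Since a plain "nc -l" listener goes away once its first connection closes, one way to avoid restarting it by hand is a small listener that keeps accepting connections. The following is a minimal Python sketch, not part of Flink. The helper names open_listener and serve_lines are hypothetical, and sending a fixed list of lines (instead of typing them interactively as with nc) is an assumption made to keep the example short.

```python
import socket


def open_listener(port, host="0.0.0.0"):
    """Open a TCP listening socket on host:port and return it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts of the script on the same port.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen()
    return server


def serve_lines(server, lines, max_clients=None):
    """Send `lines` to each client that connects, one client at a time.

    Unlike a plain `nc -l`, the listener stays open after a client
    disconnects. With max_clients=None it loops until interrupted.
    """
    served = 0
    while max_clients is None or served < max_clients:
        conn, _addr = server.accept()
        with conn:
            for line in lines:
                conn.sendall((line + "\n").encode("utf-8"))
        served += 1
```

For example, serve_lines(open_listener(9000), ["to be or not to be"]) would answer every new connection on port 9000 until it is stopped with Ctrl-C.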

The stacktrace you copied does not indicate that a TaskManager cannot connect
to the JobManager. I can only see that the SocketTextStreamFunction (from the
SocketWindowWordCount job?) cannot open the connection to the address that you
specified.

Can you try to run examples/streaming/WordCount.jar? It is a simpler job which
does not rely on external dependencies.

If all the above fails, can you tell us how you submit the job? Can you post
the full command? Can you also post the full JobManager & TaskManager logs?

Best,
Gary




Re: connection failed when running flink in a cluster

Posted by Felipe Gutierrez <fe...@gmail.com>.
Do you mean "nc -l 9000"? If so, I did start it beforehand.
The task manager running on the master can connect to the job manager, but
the task manager on the slave node cannot. The second time I start the
WordCount job, it recognizes only one task manager (the one on the master)
and runs my job there. The task manager on the slave is started but does not
process anything.

here is the error stack trace from the slave node:

2017-05-30 05:10:39,853 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend - Initializing heap keyed state backend with stream factory.
2017-05-30 05:10:39,977 INFO  org.apache.flink.runtime.taskmanager.Task - Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED.
java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
    at java.lang.Thread.run(Thread.java:745)
2017-05-30 05:10:40,016 INFO  org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23).


*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com
<https://felipeogutierrez.blogspot.com>*



Re: connection failed when running flink in a cluster

Posted by vino yang <ya...@gmail.com>.
Hi Felipe,

From the exception information, it seems that you did not start the socket
server. The socket source needs to connect to that server.

Please make sure the socket server has started and is available.
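
One way to check whether the socket server is reachable from a given node is to attempt a plain TCP connection from that node. The following is a minimal Python sketch, not part of Flink. The function name can_connect is hypothetical, and the host and port passed to it are assumed to be the same values given to the job.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection (nothing is listening), timeouts,
        # and host names that cannot be resolved.
        return False
```

Running can_connect("master-host", 9000) on the master and on each slave node (where "master-host" is a placeholder for the real host name) shows which machines can reach the listener. Note that "localhost" refers to a different machine on each node, so a listener started on the master is not reachable as "localhost" from a slave.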

Thanks, vino.

2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <fe...@gmail.com>:

> yes.
>
> When I execute the jps command on the master node I see TaskManagerRunner
> and StandaloneSessionClusterEntrypoint (which I believe is the
> JobManager). On the slave nodes I see TaskManagerRunner when I run the jps
> command.
>
>
> *--*
> *-- Felipe Gutierrez*
>
> *-- skype: felipe.o.gutierrez*
> *--* *https://felipeogutierrez.blogspot.com
> <https://felipeogutierrez.blogspot.com>*
>
>
> On Mon, Aug 6, 2018 at 12:13 PM miki haiat <mi...@gmail.com> wrote:
>
>> Did you start the job manager and task manager on the same Raspberry Pi?
>>
>>>
>>

Re: connection failed when running flink in a cluster

Posted by Felipe Gutierrez <fe...@gmail.com>.
Yes.

When I run the jps command on the master node I see TaskManagerRunner
and StandaloneSessionClusterEntrypoint (which I believe is the
JobManager). On the slave nodes, jps shows TaskManagerRunner.
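
The jps output only shows that the Flink processes are up. The "Connection
refused" in the stack trace of the original post is thrown from
java.net.Socket.connect inside SocketTextStreamFunction, so one thing that
may be worth checking is whether the host and port given to the socket
source accept TCP connections from the node that runs the source task. A
minimal sketch of such a check follows; the class name PortProbe, the
default host "localhost" and the default port 9000 are placeholders chosen
for illustration and are not taken from the job.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {

    /**
     * Returns true if a TCP connection to host:port can be opened within
     * timeoutMillis, false if it is refused, times out, or the host cannot
     * be resolved.
     */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // ConnectException, SocketTimeoutException and
            // UnknownHostException are all IOExceptions.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port
                + (ok ? " accepts connections" : " refused or timed out"));
    }
}
```

Running it on each node with the host and port passed to the job would show
whether the address is reachable from that node.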


*--*
*-- Felipe Gutierrez*

*-- skype: felipe.o.gutierrez*
*--* *https://felipeogutierrez.blogspot.com*


On Mon, Aug 6, 2018 at 12:13 PM miki haiat <mi...@gmail.com> wrote:

> Did you start the job manager and task manager on the same Raspberry Pi?
>

Re: connection failed when running flink in a cluster

Posted by miki haiat <mi...@gmail.com>.
Did you start the job manager and task manager on the same Raspberry Pi?
