Posted to dev@flink.apache.org by Dulaj Viduranga <vi...@icloud.com> on 2015/03/02 09:30:03 UTC

Re: Could not build up connection to JobManager

Hi,
I found the fix for this issue and I'll create a pull request within the next day.

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

Could you please upload the logs? They would be really helpful.

On Mon, Mar 16, 2015 at 6:11 PM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Hi,
> I tested the update, but the result is still the same. I don't think it is a
> problem with my system: I have an XAMPP server working totally fine (I also
> tried with it shut down), and I double-checked the hosts files.
> I had Little Snitch installed, but I also tried uninstalling it.
> Isn't there a way around this that does not use DNS to resolve localhost?

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi,
I tested the update, but the result is still the same. I don't think it is a problem with my system: I have an XAMPP server working totally fine (I also tried with it shut down), and I double-checked the hosts files. I had Little Snitch installed, but I also tried uninstalling it.
Isn't there a way around this that does not use DNS to resolve localhost?
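
On the question of getting a loopback address without a name lookup, here is a minimal Java sketch. It is not Flink's code and it does not change how Flink or Akka pick their bind address; it only shows that the JDK can return a loopback address without consulting DNS or the hosts file. `InetAddress.getLoopbackAddress()` and `InetAddress.getByAddress(byte[])` are standard JDK calls; the class name `LoopbackWithoutDns` and its method names are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class LoopbackWithoutDns {

    // Returns the JDK's loopback address; no name service is consulted.
    static InetAddress loopback() {
        return InetAddress.getLoopbackAddress();
    }

    // Builds 127.0.0.1 from raw bytes. getByAddress performs no lookup;
    // it only checks that the byte array has a valid length.
    static InetAddress ipv4Loopback() {
        try {
            return InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        } catch (UnknownHostException e) {
            // Cannot happen for a 4-byte array; rethrown to keep callers simple.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println("getLoopbackAddress(): " + loopback());
        System.out.println("raw 127.0.0.1:        " + ipv4Loopback());
        // For comparison only: this call may go through the hosts file or a resolver.
        System.out.println("getByName(localhost): " + InetAddress.getByName("localhost"));
    }
}
```

If the last line prints something other than a loopback address, that would suggest the name "localhost" itself resolves unexpectedly on the machine.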

> On Mar 16, 2015, at 10:04 PM, Till Rohrmann <tr...@apache.org> wrote:
> 
> It is really strange. It's right that the CliFrontend now resolves
> localhost to the correct local address 10.218.100.122. Moreover, according
> to the logs, the JobManager is also started and binds to
> akka.tcp://flink@10.218.100.122:6123. According to the logs, this is also the
> address the CliFrontend uses to connect to the JobManager. If the timestamps
> are correct, then the JobManager was still alive when the job was sent. I
> don't really understand why this happens. Can it be that the CliFrontend which
> binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that
> you have some settings which prevent this? For the failing 127.0.0.1 case,
> it would be helpful to have access to the JobManager log.
> 
> I've updated the branch
> https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for
> the "localhost" scenario. Could you try it out again? Thanks a lot for your
> help.
> 
> Best regards,
> 
> Till

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

It is really strange. It's right that the CliFrontend now resolves
localhost to the correct local address 10.218.100.122. Moreover, according
to the logs, the JobManager is also started and binds to
akka.tcp://flink@10.218.100.122:6123. According to the logs, this is also the
address the CliFrontend uses to connect to the JobManager. If the timestamps
are correct, then the JobManager was still alive when the job was sent. I
don't really understand why this happens. Can it be that the CliFrontend which
binds to 127.0.0.1 cannot communicate with 10.218.100.122? Can it be that
you have some settings which prevent this? For the failing 127.0.0.1 case,
it would be helpful to have access to the JobManager log.

I've updated the branch
https://github.com/tillrohrmann/flink/tree/fixJobClient with a new fix for
the "localhost" scenario. Could you try it out again? Thanks a lot for your
help.

Best regards,

Till
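
The question above, whether something bound to 127.0.0.1 can reach 10.218.100.122 on this machine, can be checked at the plain TCP level with a small Java sketch. This is an assumption-laden diagnostic, not part of Flink: it says nothing about how Akka actually binds its outgoing connections, and the class name `BoundConnectCheck` and the method `canConnect` are made up for illustration. The default host and port are the ones mentioned in this thread.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

final class BoundConnectCheck {

    // Tries to open a TCP connection to 'remote' from a socket bound to 'localAddr'.
    // A null 'localAddr' binds to the wildcard address.
    // Returns true on success, false if binding or connecting fails or times out.
    static boolean canConnect(InetAddress localAddr, InetSocketAddress remote, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.bind(new InetSocketAddress(localAddr, 0)); // port 0 = any free port
            socket.connect(remote, timeoutMillis);
            return true;
        } catch (IOException e) {
            System.err.println("connect from " + localAddr + " to " + remote + " failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the thread; override with: <host> <port>
        String host = args.length > 0 ? args[0] : "10.218.100.122";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        InetSocketAddress remote = new InetSocketAddress(host, port);

        System.out.println("bound to loopback: "
                + canConnect(InetAddress.getLoopbackAddress(), remote, 3000));
        System.out.println("bound to wildcard: "
                + canConnect(null, remote, 3000));
    }
}
```

Running it while the JobManager is up and comparing the two results would show whether the operating system (or a firewall tool) blocks the loopback-to-LAN-address path.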

On Mon, Mar 16, 2015 at 10:30 AM, Ufuk Celebi <uc...@apache.org> wrote:

> There was an issue for this:
> https://issues.apache.org/jira/browse/FLINK-1634
>
> Can we close it then?
>
> On Sat, Mar 14, 2015 at 9:16 PM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
>
> > Hey Stephan,
> > Great to know you could fix the issue. Thank you for the update.
> > Best regards.
> >
> > > On Mar 14, 2015, at 9:19 PM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > Hey Dulaj!
> > >
> > > Forget what I said in the previous email. The issue with the wrong address
> > > binding seems to be solved now. There is another issue that the embedded
> > > taskmanager does not start properly, for whatever reason. My gut feeling is
> > > that there is something wrong
> > >
> > > There is a patch pending that changes the startup behavior to make debugging
> > > these situations much easier. I'll ping you as soon as that is in...
> > >
> > >
> > > Stephan
> > >
> > > On Sat, Mar 14, 2015 at 4:42 PM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > >> Hey Dulaj!
> > >>
> > >> One thing you can try is to add the option "-Djava.net.preferIPv4Stack=true"
> > >> to the JVM startup options (in the scripts in the "bin" folder) and see if
> > >> that helps.
> > >>
> > >> Stephan
> > >>
> > >>
> > >> On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
> > >>
> > >>> Hi,
> > >>> Still no luck. I’ll upload the logs with configuration
> > >>> “localhost” as well as “127.0.0.1” so you can take a look.
> > >>>
> > >>> 127.0.0.1
> > >>> flink-Vidura-flink-client-localhost.log
> > >>> <https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log>
> > >>>
> > >>> localhost
> > >>> flink-Vidura-flink-client-localhost.log
> > >>> <https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log>
> > >>> flink-Vidura-jobmanager-localhost.log
> > >>> <https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log>
> > >>>
> > >>>> On Mar 11, 2015, at 11:32 PM, Till Rohrmann <tr...@apache.org> wrote:
> > >>>>
> > >>>> Hi Dulaj,
> > >>>>
> > >>>> sorry for my late response. It looks as if the JobClient tries to connect
> > >>>> to the JobManager using its IPv6 address instead of its IPv4 address. Akka
> > >>>> is really picky when it comes to remote addresses. If Akka binds to the
> > >>>> FQDN, then other ActorSystems which try to connect to it using its IP
> > >>>> address won't be successful. I assume that this might be the problem. I
> > >>>> tried to fix it. You can find the fix here [1]. Could you please try it out
> > >>>> by starting a local cluster with the start-local.sh script? If it fails,
> > >>>> could you please send me all log files (client, jobmanager and
> > >>>> taskmanager)? Once we have figured out why the JobClient does not connect,
> > >>>> we can try to tackle the BlobServer issue.
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>> Till
> > >>>>
> > >>>> [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
> > >>>>
> > >>>> On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
> > >>>>
> > >>>>> Hi,
> > >>>>> The error message is,
> > >>>>>
> > >>>>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:327)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:306)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:300)
> > >>>>>       at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> > >>>>>       at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> > >>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>>>       at java.lang.reflect.Method.invoke(Method.java:483)
> > >>>>>       at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> > >>>>>       at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:250)
> > >>>>>       at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> > >>>>>       at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> > >>>>>       at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> > >>>>>       at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> > >>>>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
> > >>>>>       at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
> > >>>>>       at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
> > >>>>>       at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
> > >>>>>       at org.apache.flink.client.program.Client.run(Client.java:322)
> > >>>>>       ... 15 more
> > >>>>> Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
> > >>>>>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
> > >>>>>       at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
> > >>>>>       at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> > >>>>>       at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> > >>>>>       at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> > >>>>>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
> > >>>>>       at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
> > >>>>>       at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
> > >>>>>       at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> > >>>>>       at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
> > >>>>>       at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
> > >>>>>       at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
> > >>>>>       at org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
> > >>>>>       at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
> > >>>>>       ... 20 more
> > >>>>>
> > >>>>> The exception above occurred while trying to run your command.
> > >>>>>
> > >>>>> Client log doesn’t seem to show any info,
> > >>>>>
> > >>>>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > >>>>> 21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment - The job has 0 registered types and 0 default Kryo serializers
> > >>>>> 21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> > >>>>> 21:06:02,909 INFO  Remoting - Starting remoting
> > >>>>> 21:06:03,158 INFO  Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:49463]
>

Re: Could not build up connection to JobManager

Posted by Ufuk Celebi <uc...@apache.org>.

There was an issue for this:
https://issues.apache.org/jira/browse/FLINK-1634

Can we close it then?


Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hey Stephan,
Great to know you could fix the issue. Thank you for the update.
Best regards.


Re: Could not build up connection to JobManager

Posted by Stephan Ewen <se...@apache.org>.

Hey Dulaj!

Forget what I said in the previous email. The issue with the wrong address
binding seems to be solved now. There is another issue that the embedded
taskmanager does not start properly, for whatever reason. My gut feeling is
that there is something wrong

There is a patch pending that changes the startup behavior to make these
situations much easier to debug. I'll ping you as soon as that is in...


Stephan


Re: Could not build up connection to JobManager

Posted by Stephan Ewen <se...@apache.org>.

Hey Dulaj!

One thing you can try is adding the option "-Djava.net.preferIPv4Stack=true"
to the JVM startup options (in the scripts in the "bin" folder) to see if
that helps.

Stephan
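
For context on why forcing IPv4 might help: the error quoted below reports the
JobManager at fe80:0:0:0:742b:7f78:fab5:68e2%11:6123, while the client log shows
remoting listening on 127.0.0.1. The following is a minimal Python sketch, not
Flink or Akka code, that classifies the host part of such an Akka URL. The
helper name describe_akka_host and the regular expression are assumptions
introduced here for illustration only; the URL shape is inferred from the log
lines in this thread.

```python
import ipaddress
import re

# Assumed URL shape: akka.tcp://<system>@<host>:<port>[/<path>].
# The greedy host group lets IPv6 literals containing ':' through;
# the port is taken to be the last ':<digits>' before the path.
AKKA_URL = re.compile(
    r"^akka\.tcp://(?P<system>[^@]+)@(?P<host>.+):(?P<port>\d+)(?P<path>/.*)?$"
)


def describe_akka_host(url):
    """Classify the host part of an Akka URL as IPv4, IPv6, or a hostname."""
    match = AKKA_URL.match(url)
    if match is None:
        raise ValueError(f"not an akka.tcp URL: {url!r}")
    host = match.group("host")
    port = int(match.group("port"))
    # Drop an IPv6 zone index such as '%11' before parsing the address.
    bare = host.split("%", 1)[0]
    try:
        ip = ipaddress.ip_address(bare)
    except ValueError:
        # Not an IP literal, so treat it as a name (e.g. "localhost").
        return {"host": host, "port": port, "kind": "hostname", "link_local": False}
    return {
        "host": host,
        "port": port,
        "kind": f"IPv{ip.version}",
        "link_local": ip.is_link_local,
    }


if __name__ == "__main__":
    for url in (
        "akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager",
        "akka.tcp://flink@127.0.0.1:49463",
    ):
        print(describe_akka_host(url))
```

Run against the two addresses from the logs, this reports the JobManager
address as a link-local IPv6 address and the client address as IPv4, which is
the mismatch the IPv4 option is meant to rule out.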


On Sat, Mar 14, 2015 at 4:29 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Hi,
> Still no luck. I’ll upload the logs with configuration “localhost”
> as well as “127.0.0.1” so you can take a look.
>
> 127.0.0.1
> flink-Vidura-flink-client-localhost.log <https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log>
>
> localhost
> flink-Vidura-flink-client-localhost.log <https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log>
> flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log>
>
> > On Mar 11, 2015, at 11:32 PM, Till Rohrmann <tr...@apache.org>
> wrote:
> >
> > Hi Dulaj,
> >
> > Sorry for my late response. It looks as if the JobClient tries to connect
> > to the JobManager using its IPv6 address instead of its IPv4 address. Akka
> > is really picky when it comes to remote addresses. If Akka binds to the
> > FQDN, then other ActorSystems that try to connect to it using its IP
> > address won't be successful. I assume that this might be a problem. I
> > tried to fix it; you can find the fix here [1]. Could you please try it out
> > by starting a local cluster with the start-local.sh script? If it fails,
> > could you please send me all log files (client, jobmanager and
> > taskmanager)? Once we have figured out why the JobClient does not connect,
> > we can try to tackle the BlobServer issue.
> >
> > Cheers,
> >
> > Till
> >
> > [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
> >
> > On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga <vi...@icloud.com>
> > wrote:
> >
> >> Hi,
> >> The error message is,
> >>
> >> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
> >>        at org.apache.flink.client.program.Client.run(Client.java:327)
> >>        at org.apache.flink.client.program.Client.run(Client.java:306)
> >>        at org.apache.flink.client.program.Client.run(Client.java:300)
> >>        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> >>        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>        at java.lang.reflect.Method.invoke(Method.java:483)
> >>        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> >>        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> >>        at org.apache.flink.client.program.Client.run(Client.java:250)
> >>        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> >>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> >>        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> >>        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> >> Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
> >>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
> >>        at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
> >>        at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
> >>        at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
> >>        at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
> >>        at org.apache.flink.client.program.Client.run(Client.java:322)
> >>        ... 15 more
> >> Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
> >>        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
> >>        at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
> >>        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> >>        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> >>        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> >>        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> >>        at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
> >>        at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
> >>        at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
> >>        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
> >>        at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
> >>        at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
> >>        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
> >>        at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
> >>        at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
> >>        at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
> >>        at org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
> >>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
> >>        ... 20 more
> >>
> >> The exception above occurred while trying to run your command.
> >>
> >>
> >>
> >>
> >>
> >> Client log doesn’t seem to show any info,
> >>
> >>
> >> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
> >>       - Unable to load native-hadoop library for your platform... using
> >> builtin-java classes where applicable
> >> 21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment
> >>      - The job has 0 registered types and 0 default Kryo serializers
> >> 21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger
> >>      - Slf4jLogger started
> >> 21:06:02,909 INFO  Remoting
> >>      - Starting remoting
> >> 21:06:03,158 INFO  Remoting
> >>      - Remoting started; listening on addresses :[akka.tcp://
> >> flink@127.0.0.1:49463]
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi,
Still no luck. I’ve uploaded the logs for both the “localhost” and the “127.0.0.1” configuration so you can take a look.

127.0.0.1
flink-Vidura-flink-client-localhost.log <https://gist.github.com/viduranga/1d01149eee238158519e#file-flink-vidura-flink-client-localhost-log>

localhost
flink-Vidura-flink-client-localhost.log <https://gist.github.com/viduranga/d866c24c0ba566abab17#file-flink-vidura-flink-client-localhost-log>
flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/e7549ef818c6a2af73e9#file-flink-vidura-jobmanager-localhost-log>

> On Mar 11, 2015, at 11:32 PM, Till Rohrmann <tr...@apache.org> wrote:
> 
> Hi Dulaj,
> 
> Sorry for my late response. It looks as if the JobClient tries to connect
> to the JobManager using its IPv6 address instead of its IPv4 address. Akka
> is really picky when it comes to remote addresses. If Akka binds to the
> FQDN, then other ActorSystems that try to connect to it using its IP
> address won't be successful. I assume that this might be the problem. I
> tried to fix it; you can find the fix here [1]. Could you please try it out
> by starting a local cluster with the start-local.sh script? If it fails,
> could you please send me all log files (client, jobmanager and
> taskmanager)? Once we have figured out why the JobClient does not connect,
> we can try to tackle the BlobServer issue.
> 
> Cheers,
> 
> Till
> 
> [1] https://github.com/tillrohrmann/flink/tree/fixJobClient
> 
> On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> Hi,
>> The error message is,
>> 
>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
>>       - Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> org.apache.flink.client.program.ProgramInvocationException: Could not
>> build up connection to JobManager.
>>        at org.apache.flink.client.program.Client.run(Client.java:327)
>>        at org.apache.flink.client.program.Client.run(Client.java:306)
>>        at org.apache.flink.client.program.Client.run(Client.java:300)
>>        at
>> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>        at
>> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>        at java.lang.reflect.Method.invoke(Method.java:483)
>>        at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>        at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>        at org.apache.flink.client.program.Client.run(Client.java:250)
>>        at
>> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>        at
>> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
>> not reachable. Please make sure that the JobManager is running and its port
>> is reachable.
>>        at
>> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
>>        at
>> org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>        at
>> org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>        at
>> org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>        at
>> org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>        at org.apache.flink.client.program.Client.run(Client.java:322)
>>        ... 15 more
>> Caused by: akka.actor.ActorNotFound: Actor not found for:
>> ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
>>        at
>> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>>        at
>> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>>        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>>        at
>> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>>        at
>> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>>        at
>> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>>        at
>> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>>        at
>> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>>        at
>> akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>>        at
>> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>>        at
>> akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>>        at
>> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>>        at
>> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>>        at
>> scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
>>        at
>> scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
>>        at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
>>        at
>> org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
>>        at
>> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
>>        ... 20 more
>> 
>> The exception above occurred while trying to run your command.
>> 
>> 
>> 
>> 
>> 
>> Client log doesn’t seem to show any info,
>> 
>> 
>> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
>>       - Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>> 21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment
>>      - The job has 0 registered types and 0 default Kryo serializers
>> 21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger
>>      - Slf4jLogger started
>> 21:06:02,909 INFO  Remoting
>>      - Starting remoting
>> 21:06:03,158 INFO  Remoting
>>      - Remoting started; listening on addresses :[akka.tcp://
>> flink@127.0.0.1:49463]

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

Hi Dulaj,

Sorry for my late response. It looks as if the JobClient tries to connect
to the JobManager using its IPv6 address instead of its IPv4 address. Akka
is really picky when it comes to remote addresses. If Akka binds to the
FQDN, then other ActorSystems that try to connect to it using its IP
address won't be successful. I assume that this might be the problem. I
tried to fix it; you can find the fix here [1]. Could you please try it out
by starting a local cluster with the start-local.sh script? If it fails,
could you please send me all log files (client, jobmanager and
taskmanager)? Once we have figured out why the JobClient does not connect,
we can try to tackle the BlobServer issue.
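
To illustrate the point about strict address matching, here is a minimal Java sketch. It does not use Akka or Flink; the class RemoteAddressMatch and its methods are hypothetical names chosen for this example. It only shows that two host spellings can refer to the same IP while still differing as strings, which is the kind of mismatch described above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; this is not Akka or Flink code.
final class RemoteAddressMatch {

    // Models a strict match: the host parts must be the same string.
    static boolean sameHostString(String boundHost, String targetHost) {
        return boundHost.equals(targetHost);
    }

    // Builds an InetAddress from a host label and raw IP bytes.
    // getByAddress attaches the label without doing any DNS lookup.
    static InetAddress of(String host, byte[] ip) {
        try {
            return InetAddress.getByAddress(host, ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("illegal IP address length", e);
        }
    }

    public static void main(String[] args) {
        byte[] loopback = {127, 0, 0, 1};
        InetAddress viaName = of("localhost", loopback);
        InetAddress viaLiteral = of("127.0.0.1", loopback);

        // InetAddress.equals compares only the IP bytes.
        System.out.println(viaName.equals(viaLiteral)); // true
        // The host strings themselves are different.
        System.out.println(sameHostString(viaName.getHostName(), viaLiteral.getHostName())); // false
    }
}
```

If the receiving side only accepts the exact host string it was bound to, a sender using the other spelling would be rejected even though both point to the same machine.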

Cheers,

Till

[1] https://github.com/tillrohrmann/flink/tree/fixJobClient

On Thu, Mar 5, 2015 at 4:40 PM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Hi,
> The error message is,
>
> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
>        - Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> org.apache.flink.client.program.ProgramInvocationException: Could not
> build up connection to JobManager.
>         at org.apache.flink.client.program.Client.run(Client.java:327)
>         at org.apache.flink.client.program.Client.run(Client.java:306)
>         at org.apache.flink.client.program.Client.run(Client.java:300)
>         at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>         at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>         at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>         at org.apache.flink.client.program.Client.run(Client.java:250)
>         at
> org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>         at
> org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager
> not reachable. Please make sure that the JobManager is running and its port
> is reachable.
>         at
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
>         at
> org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>         at
> org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>         at
> org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>         at
> org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>         at org.apache.flink.client.program.Client.run(Client.java:322)
>         ... 15 more
> Caused by: akka.actor.ActorNotFound: Actor not found for:
> ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
>         at
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
>         at
> akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>         at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>         at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>         at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>         at
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>         at
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>         at
> akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>         at
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
>         at
> akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
>         at
> akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
>         at
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
>         at
> scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
>         at
> scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
>         at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
>         at
> org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
>         at
> org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
>         ... 20 more
>
> The exception above occurred while trying to run your command.
>
>
>
>
>
> Client log doesn’t seem to show any info,
>
>
> 21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader
>        - Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment
>       - The job has 0 registered types and 0 default Kryo serializers
> 21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger
>       - Slf4jLogger started
> 21:06:02,909 INFO  Remoting
>       - Starting remoting
> 21:06:03,158 INFO  Remoting
>       - Remoting started; listening on addresses :[akka.tcp://
> flink@127.0.0.1:49463]

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi,
The error message is:

21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.program.Client.run(Client.java:306)
	at org.apache.flink.client.program.Client.run(Client.java:300)
	at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
	at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:250)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at akka.tcp://flink@fe80:0:0:0:742b:7f78:fab5:68e2%11:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
	at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:957)
	at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
	at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
	at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
	at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
	at org.apache.flink.client.program.Client.run(Client.java:322)
	... 15 more
Caused by: akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka://flink/deadLetters), Path(/)]
	at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:65)
	at akka.actor.ActorSelection$$anonfun$resolveOne$1.apply(ActorSelection.scala:63)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
	at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
	at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
	at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
	at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
	at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
	at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.unbatchedExecute(Future.scala:74)
	at akka.dispatch.BatchingExecutor$class.execute(BatchingExecutor.scala:110)
	at akka.dispatch.ExecutionContexts$sameThreadExecutionContext$.execute(Future.scala:73)
	at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
	at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
	at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
	at akka.actor.ActorSelection.resolveOne(ActorSelection.scala:63)
	at org.apache.flink.runtime.akka.AkkaUtils$.getReference(AkkaUtils.scala:321)
	at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:952)
	... 20 more

The exception above occurred while trying to run your command.





The client log doesn’t seem to show any useful info:


21:06:01,521 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:06:01,935 INFO  org.apache.flink.api.java.ExecutionEnvironment                - The job has 0 registered types and 0 default Kryo serializers
21:06:02,857 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
21:06:02,909 INFO  Remoting                                                      - Starting remoting
21:06:03,158 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:49463]
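
A side note on the trace above: the JobManager address is printed as host and port joined by a colon, with the IPv6 literal (including its zone index %11) left unbracketed, so the boundary between address and port is ambiguous to a reader or parser. Whether this contributes to the connection failure is not established in this thread. The following is only a sketch of the usual convention of bracketing IPv6 literals in a host:port string; HostPortFormat and hostPort are hypothetical names, not Flink or Akka API.

```java
// Hypothetical helper for illustration; not part of Flink or Akka.
final class HostPortFormat {

    // Joins host and port, wrapping IPv6 literals in brackets so the
    // last colon unambiguously separates the port.
    static String hostPort(String host, int port) {
        boolean unbracketedIpv6 = host.indexOf(':') >= 0 && !host.startsWith("[");
        String safeHost = unbracketedIpv6 ? "[" + host + "]" : host;
        return safeHost + ":" + port;
    }

    public static void main(String[] args) {
        // prints 127.0.0.1:6123
        System.out.println(hostPort("127.0.0.1", 6123));
        // prints [fe80:0:0:0:742b:7f78:fab5:68e2%11]:6123
        System.out.println(hostPort("fe80:0:0:0:742b:7f78:fab5:68e2%11", 6123));
    }
}
```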

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

What was the error when you tried to submit the job with "localhost"? The
client log would be very helpful to understand the problem.

Actually, the "localhost" in the code is only the fallback strategy, used if
the job manager address cannot be retrieved any other way.
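
As a rough sketch of that fallback idea (not the actual Flink code): the configuration is modelled here as a plain Map, the key jobmanager.rpc.address is the one from the flink-conf.yaml quoted below, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real lookup in Flink may work differently.
final class JobManagerAddressLookup {

    static final String ADDRESS_KEY = "jobmanager.rpc.address";
    static final String FALLBACK_HOST = "localhost";

    // Returns the configured address, or the fallback if none is set.
    static String resolveHost(Map<String, String> config) {
        String configured = config.get(ADDRESS_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            return FALLBACK_HOST;
        }
        return configured.trim();
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        System.out.println(resolveHost(config)); // localhost

        config.put(ADDRESS_KEY, "127.0.0.1");
        System.out.println(resolveHost(config)); // 127.0.0.1
    }
}
```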

On Thu, Mar 5, 2015 at 3:25 PM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Glad I could help in any way. :)
> When the address is set to “localhost”, I cannot submit a job; it fails
> immediately. But when the address is “127.0.0.1”, it is stuck for a little
> while on DEPLOYING and then fails.
> Correct me if I’m wrong, but since using the address set in the config
> file won’t harm anything, I think it is safer to use that rather than
> defining the address in the code.
>
> > On Mar 5, 2015, at 6:57 PM, Till Rohrmann <tr...@apache.org> wrote:
> >
> > Could you submit a job when you set the job manager address to
> "localhost"?
> > I did not see any logging statements of received jobs. If you did, could
> > you also send the logs of the client?
> >
> > The 0.0.0.0 to which the BlobServer binds works for me on my machine. I
> > cannot remember that we had problems with that before. But I agree, we
> > should set it to the network interface which the JobManager uses.
> >
> > I cannot explain why your fix solves the problem. It does not touch any
> of
> > the JobClient/JobManager logic.
> >
> > I updated my local branch [1] with a fix for the BlobServer. Could you
> try
> > it out again and send us the logs? Thanks a lot for your help Dulaj.
> >
> > On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga <vi...@icloud.com>
> > wrote:
> >
> >> But can you explain why my fix solved it?
> >>
> >>> On Mar 5, 2015, at 5:50 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>
> >>> Hi Dulaj!
> >>>
> >>> Okay, the logs give us some insight. Both setups seem to look good in
> >> terms
> >>> of TaskManager and JobManager startup.
> >>>
> >>> In one of the logs (127.0.0.1) you submit a job. The job fails because
> >> the
> >>> TaskManager cannot grab the JAR file from the JobManager.
> >>> I think the problem is that the BLOB server binds to 0.0.0.0 - it
> should
> >>> bind to the same address as the JobManager actor system.
> >>>
> >>> That should definitely be changed...
> >>>
> >>> On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga <vidura.me@icloud.com
> >
> >>> wrote:
> >>>
> >>>> Hi,
> >>>> This is the log with setting “localhost”
> >>>> flink-Vidura-jobmanager-localhost.log <
> >>>>
> >>
> https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log
> >>>>>
> >>>>
> >>>> And this is the log with setting “127.0.0.1”
> >>>> flink-Vidura-jobmanager-localhost.log <
> >>>>
> >>
> https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log
> >>>>>
> >>>>
> >>>>> On Mar 5, 2015, at 2:23 PM, Till Rohrmann <tr...@apache.org>
> >> wrote:
> >>>>>
> >>>>> What does the jobmanager log say? I think Stephan added some more
> >>>>> logging output which helps us to debug this problem.
> >>>>>
> >>>>> On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <
> vidura.me@icloud.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Using start-local.sh.
> >>>>>> I’m using the original config yaml. I also tried changing the jobmanager
> >>>>>> address in the config to “127.0.0.1” but no luck. With my changes it
> >>>>>> works ok.
> >>>>>> The conf file follows.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> ################################################################################
> >>>>>> #  Licensed to the Apache Software Foundation (ASF) under one
> >>>>>> #  or more contributor license agreements.  See the NOTICE file
> >>>>>> #  distributed with this work for additional information
> >>>>>> #  regarding copyright ownership.  The ASF licenses this file
> >>>>>> #  to you under the Apache License, Version 2.0 (the
> >>>>>> #  "License"); you may not use this file except in compliance
> >>>>>> #  with the License.  You may obtain a copy of the License at
> >>>>>> #
> >>>>>> #      http://www.apache.org/licenses/LICENSE-2.0
> >>>>>> #
> >>>>>> #  Unless required by applicable law or agreed to in writing,
> software
> >>>>>> #  distributed under the License is distributed on an "AS IS" BASIS,
> >>>>>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> >>>> implied.
> >>>>>> #  See the License for the specific language governing permissions
> and
> >>>>>> # limitations under the License.
> >>>>>>
> >>>>>>
> >>>>
> >>
> ################################################################################
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>> # Common
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>>
> >>>>>> jobmanager.rpc.address: 127.0.0.1
> >>>>>>
> >>>>>> jobmanager.rpc.port: 6123
> >>>>>>
> >>>>>> jobmanager.heap.mb: 256
> >>>>>>
> >>>>>> taskmanager.heap.mb: 512
> >>>>>>
> >>>>>> taskmanager.numberOfTaskSlots: 1
> >>>>>>
> >>>>>> parallelization.degree.default: 1
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>> # Web Frontend
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>>
> >>>>>> # The port under which the web-based runtime monitor listens.
> >>>>>> # A value of -1 deactivates the web server.
> >>>>>>
> >>>>>> jobmanager.web.port: 8081
> >>>>>>
> >>>>>> # The port uder which the standalone web client
> >>>>>> # (for job upload and submit) listens.
> >>>>>>
> >>>>>> webclient.port: 8080
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>> # Advanced
> >>>>>>
> >>>>>>
> >>>>
> >>
> #==============================================================================
> >>>>>>
> >>>>>> # The number of buffers for the network stack.
> >>>>>> #
> >>>>>> # taskmanager.network.numberOfBuffers: 2048
> >>>>>>
> >>>>>> # Directories for temporary files.
> >>>>>> #
> >>>>>> # Add a delimited list for multiple directories, using the system
> >>>> directory
> >>>>>> # delimiter (colon ':' on unix) or a comma, e.g.:
> >>>>>> #     /data1/tmp:/data2/tmp:/data3/tmp
> >>>>>> #
> >>>>>> # Note: Each directory entry is read from and written to by a
> >> different
> >>>> I/O
> >>>>>> # thread. You can include the same directory multiple times in order
> >> to
> >>>>>> create
> >>>>>> # multiple I/O threads against that directory. This is for example
> >>>>>> relevant for
> >>>>>> # high-throughput RAIDs.
> >>>>>> #
> >>>>>> # If not specified, the system-specific Java temporary directory
> >>>>>> (java.io.tmpdir
> >>>>>> # property) is taken.
> >>>>>> #
> >>>>>> # taskmanager.tmp.dirs: /tmp
> >>>>>>
> >>>>>> # Path to the Hadoop configuration directory.
> >>>>>> #
> >>>>>> # This configuration is used when writing into HDFS. Unless
> specified
> >>>>>> otherwise,
> >>>>>> # HDFS file creation will use HDFS default settings with respect to
> >>>>>> block-size,
> >>>>>> # replication factor, etc.
> >>>>>> #
> >>>>>> # You can also directly specify the paths to hdfs-default.xml and
> >>>>>> hdfs-site.xml
> >>>>>> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> >>>>>> #
> >>>>>> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> >>>>>>
> >>>>>>
> >>>>>>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>> How did you start the flink cluster? Using the start-local.sh, the
> >>>>>>> start-cluster.sh or starting the job manager and task managers
> >>>>>> individually
> >>>>>>> using taskmanager.sh/jobmanager.sh. Could you maybe post the
> >>>>>>> flink-conf.yaml file, you're using?
> >>>>>>>
> >>>>>>> With your changes, everything works, right?
> >>>>>>>
> >>>>>>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <
> >> vidura.me@icloud.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Till,
> >>>>>>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager
> >> still
> >>>>>>>> tries a 10.0.0.0/8 IP.
> >>>>>>>>
> >>>>>>>> Best regards.
> >>>>>>>>
> >>>>>>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <
> till.rohrmann@gmail.com
> >>>
> >>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Dulaj,
> >>>>>>>>>
> >>>>>>>>> I looked through your commit and noticed that the JobClient might
> >> not
> >>>>>> be
> >>>>>>>>> listening on the right network interface. Your commit seems to
> fix
> >>>> it.
> >>>>>> I
> >>>>>>>>> just want to understand the problem properly and therefore I
> >> opened a
> >>>>>>>>> branch with a small change. Could you try out whether this change
> >>>> would
> >>>>>>>>> also fix your problem? You can find the code here [1]. Would be
> >>>> awesome
> >>>>>>>> if
> >>>>>>>>> you checked it out and let it run on your cluster setting.
> Thanks a
> >>>> lot
> >>>>>>>>> Dulaj!
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <
> >>>> vidura.me@icloud.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Not every change in the commit b7da22a is required, but I
> >>>>>>>>>> thought they were appropriate.
> >>>>>>>>>>
> >>>>>>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <
> >> vidura.me@icloud.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> I found many other places “localhost” is hard coded. I changed
> >> them
> >>>>>> in
> >>>>>>>> a
> >>>>>>>>>> better way I think. I made a pull request. Please review.
> b7da22a
> >> <
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org>
> >>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> If I recall correctly, we only hardcode "localhost" in the
> local
> >>>>>> mini
> >>>>>>>>>>>> cluster - do you think it is problematic there as well?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Have you found any other places?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
> >>>>>>>> vidura.me@icloud.com>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> In some places in the code, "localhost" is hard coded. When it is
> >>>>>>>>>>>>> resolved by DNS, it can end up pointing to an IP other than
> >>>>>>>>>>>>> 127.0.0.1 (for example, one in the private range 10.0.0.0/8). I
> >>>>>>>>>>>>> changed those places to 127.0.0.1 and it works like a charm.
> >>>>>>>>>>>>> But hard coding 127.0.0.1 is not a good option, because when the
> >>>>>>>>>>>>> jobmanager IP is changed, this becomes an issue again. I'm thinking
> >>>>>>>>>>>>> of using the jobmanager IP from the config.yaml in these places.
> >>>>>>>>>>>>> If you have a better idea for doing this, based on your experience,
> >>>>>>>>>>>>> please let me know.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best.
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Glad I could help in any way. :)
When the address is set to “localhost”, I cannot submit a job; it fails immediately. But when the address is “127.0.0.1”, it is stuck for a little while on DEPLOYING and then fails.
Correct me if I’m wrong, but since using the address set in the config file won’t harm anything, I think it is safer to use that rather than defining the address in the code.
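
For context on the DEPLOYING failure: the quoted discussion below attributes it to the BLOB server binding to 0.0.0.0 rather than to the JobManager's address. The following Java sketch uses only java.net.ServerSocket, not Flink's BlobServer, and the class name BindAddressDemo is made up for this example. It just shows the difference between a wildcard bind and a bind to one specific address (here the loopback address, standing in for whatever address is read from the config).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical illustration; this is not Flink's BlobServer.
final class BindAddressDemo {

    // Opens a server socket on an ephemeral port and reports whether it
    // ended up bound to the wildcard address (0.0.0.0 or ::).
    // A null bindAddress asks for the wildcard address.
    static boolean bindsToWildcard(InetAddress bindAddress) {
        try (ServerSocket socket = new ServerSocket(0, 50, bindAddress)) {
            return socket.getInetAddress().isAnyLocalAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Wildcard bind: listens on all interfaces.
        System.out.println("wildcard: " + bindsToWildcard(null));
        // Specific bind: listens only on the given address.
        System.out.println("wildcard: " + bindsToWildcard(InetAddress.getLoopbackAddress()));
    }
}
```

With a specific bind address, the server advertises and accepts connections on one known interface, which is what "bind to the same address as the JobManager" would amount to in this simplified picture.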

> On Mar 5, 2015, at 6:57 PM, Till Rohrmann <tr...@apache.org> wrote:
> 
> Could you submit a job when you set the job manager address to "localhost"?
> I did not see any logging statements of received jobs. If you did, could
> you also send the logs of the client?
> 
> The 0.0.0.0 to which the BlobServer binds works for me on my machine. I
> cannot remember that we had problems with that before. But I agree, we
> should set it to the network interface which the JobManager uses.
> 
> I cannot explain why your fix solves the problem. It does not touch any of
> the JobClient/JobManager logic.
> 
> I updated my local branch [1] with a fix for the BlobServer. Could you try
> it out again and send us the logs? Thanks a lot for your help Dulaj.
> 
> On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> But can you explain why my fix solved it?
>> 
>>> On Mar 5, 2015, at 5:50 PM, Stephan Ewen <se...@apache.org> wrote:
>>> 
>>> Hi Dulaj!
>>> 
>>> Okay, the logs give us some insight. Both setups seem to look good in terms
>>> of TaskManager and JobManager startup.
>>>
>>> In one of the logs (127.0.0.1) you submit a job. The job fails because the
>>> TaskManager cannot grab the JAR file from the JobManager.
>>> I think the problem is that the BLOB server binds to 0.0.0.0 - it should
>>> bind to the same address as the JobManager actor system.
>>> 
>>> That should definitely be changed...
>>> 
>>> On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga <vi...@icloud.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> This is the log with setting “localhost”
>>>> flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>
>>>>
>>>> And this is the log with setting “127.0.0.1”
>>>> flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>
>>>> 
>>>>> On Mar 5, 2015, at 2:23 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>>>
>>>>> What does the jobmanager log say? I think Stephan added some more logging
>>>>> output which helps us to debug this problem.
>>>>>
>>>>> On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vi...@icloud.com>
>>>>> wrote:
>>>>>
>>>>>> Using start-local.sh.
>>>>>> I’m using the original config yaml. I also tried changing jobmanager
>>>>>> address in config to “127.0.0.1” but no luck. With my changes it works ok.
>>>>>> The conf file follows.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ################################################################################
>>>>>> #  Licensed to the Apache Software Foundation (ASF) under one
>>>>>> #  or more contributor license agreements.  See the NOTICE file
>>>>>> #  distributed with this work for additional information
>>>>>> #  regarding copyright ownership.  The ASF licenses this file
>>>>>> #  to you under the Apache License, Version 2.0 (the
>>>>>> #  "License"); you may not use this file except in compliance
>>>>>> #  with the License.  You may obtain a copy of the License at
>>>>>> #
>>>>>> #      http://www.apache.org/licenses/LICENSE-2.0
>>>>>> #
>>>>>> #  Unless required by applicable law or agreed to in writing, software
>>>>>> #  distributed under the License is distributed on an "AS IS" BASIS,
>>>>>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>>>>>> #  See the License for the specific language governing permissions and
>>>>>> # limitations under the License.
>>>>>> ################################################################################
>>>>>>
>>>>>> #==============================================================================
>>>>>> # Common
>>>>>> #==============================================================================
>>>>>>
>>>>>> jobmanager.rpc.address: 127.0.0.1
>>>>>>
>>>>>> jobmanager.rpc.port: 6123
>>>>>>
>>>>>> jobmanager.heap.mb: 256
>>>>>>
>>>>>> taskmanager.heap.mb: 512
>>>>>>
>>>>>> taskmanager.numberOfTaskSlots: 1
>>>>>>
>>>>>> parallelization.degree.default: 1
>>>>>>
>>>>>> #==============================================================================
>>>>>> # Web Frontend
>>>>>> #==============================================================================
>>>>>>
>>>>>> # The port under which the web-based runtime monitor listens.
>>>>>> # A value of -1 deactivates the web server.
>>>>>>
>>>>>> jobmanager.web.port: 8081
>>>>>>
>>>>>> # The port under which the standalone web client
>>>>>> # (for job upload and submit) listens.
>>>>>>
>>>>>> webclient.port: 8080
>>>>>>
>>>>>> #==============================================================================
>>>>>> # Advanced
>>>>>> #==============================================================================
>>>>>>
>>>>>> # The number of buffers for the network stack.
>>>>>> #
>>>>>> # taskmanager.network.numberOfBuffers: 2048
>>>>>>
>>>>>> # Directories for temporary files.
>>>>>> #
>>>>>> # Add a delimited list for multiple directories, using the system directory
>>>>>> # delimiter (colon ':' on unix) or a comma, e.g.:
>>>>>> #     /data1/tmp:/data2/tmp:/data3/tmp
>>>>>> #
>>>>>> # Note: Each directory entry is read from and written to by a different I/O
>>>>>> # thread. You can include the same directory multiple times in order to create
>>>>>> # multiple I/O threads against that directory. This is for example relevant for
>>>>>> # high-throughput RAIDs.
>>>>>> #
>>>>>> # If not specified, the system-specific Java temporary directory (java.io.tmpdir
>>>>>> # property) is taken.
>>>>>> #
>>>>>> # taskmanager.tmp.dirs: /tmp
>>>>>>
>>>>>> # Path to the Hadoop configuration directory.
>>>>>> #
>>>>>> # This configuration is used when writing into HDFS. Unless specified otherwise,
>>>>>> # HDFS file creation will use HDFS default settings with respect to block-size,
>>>>>> # replication factor, etc.
>>>>>> #
>>>>>> # You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
>>>>>> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
>>>>>> #
>>>>>> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
>>>>>> 
>>>>>> 
>>>>>>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org> wrote:
>>>>>>>
>>>>>>> How did you start the Flink cluster? Using start-local.sh, start-cluster.sh,
>>>>>>> or by starting the job manager and task managers individually using
>>>>>>> taskmanager.sh/jobmanager.sh? Could you maybe post the flink-conf.yaml
>>>>>>> file you're using?
>>>>>>>
>>>>>>> With your changes, everything works, right?
>>>>>>>
>>>>>>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vidura.me@icloud.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Till,
>>>>>>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
>>>>>>>> tries a 10.0.0.0/8 IP.
>>>>>>>> 
>>>>>>>> Best regards.
>>>>>>>> 
>>>>>>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <till.rohrmann@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Dulaj,
>>>>>>>>>
>>>>>>>>> I looked through your commit and noticed that the JobClient might not be
>>>>>>>>> listening on the right network interface. Your commit seems to fix it. I
>>>>>>>>> just want to understand the problem properly and therefore I opened a
>>>>>>>>> branch with a small change. Could you try out whether this change would
>>>>>>>>> also fix your problem? You can find the code here [1]. Would be awesome if
>>>>>>>>> you checked it out and let it run on your cluster setting. Thanks a lot
>>>>>>>>> Dulaj!
>>>>>>>>>
>>>>>>>>> [1] https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
>>>>>>>>>
>>>>>>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vidura.me@icloud.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Not every change in the commit b7da22a is required, but I thought they
>>>>>>>>>> were appropriate.
>>>>>>>>>> 
>>>>>>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vidura.me@icloud.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I found many other places where “localhost” is hard coded. I changed them in a
>>>>>>>>>>> better way, I think. I made a pull request. Please review. b7da22a
>>>>>>>>>>> <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> If I recall correctly, we only hardcode "localhost" in the local mini
>>>>>>>>>>>> cluster - do you think it is problematic there as well?
>>>>>>>>>>>>
>>>>>>>>>>>> Have you found any other places?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vidura.me@icloud.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> In some places of the code, "localhost" is hard coded. When it is resolved
>>>>>>>>>>>>> by the DNS, it is possible to be directed to a different IP other than
>>>>>>>>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
>>>>>>>>>>>>> 127.0.0.1 and it works like a charm.
>>>>>>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the jobmanager
>>>>>>>>>>>>> ip is changed, this becomes an issue again. I'm thinking of setting
>>>>>>>>>>>>> jobmanager ip from the config.yaml to these places.
>>>>>>>>>>>>> If you have a better idea on doing this with your experience, please let
>>>>>>>>>>>>> me know.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best.
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

Could you submit a job when you set the job manager address to "localhost"?
I did not see any logging statements of received jobs. If you did, could
you also send the logs of the client?

The 0.0.0.0 to which the BlobServer binds works for me on my machine. I
cannot remember that we had problems with that before. But I agree, we
should set it to the network interface which the JobManager uses.

I cannot explain why your fix solves the problem. It does not touch any of
the JobClient/JobManager logic.

I updated my local branch [1] with a fix for the BlobServer. Could you try
it out again and send us the logs? Thanks a lot for your help Dulaj.
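
To check what "localhost" actually resolves to on the affected machine, a small diagnostic along these lines may help. This is a Python sketch that is independent of Flink; `resolve_ipv4` and `non_loopback` are hypothetical helper names, and it only looks at IPv4 addresses.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the system resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the name could not be resolved at all
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def non_loopback(addresses):
    """Keep only the addresses outside the loopback range 127.0.0.0/8."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_loopback]


addresses = resolve_ipv4("localhost")
print("localhost resolves to:", addresses)
print("non-loopback entries:", non_loopback(addresses))
```

If the second list is not empty, "localhost" points somewhere other than the loopback interface, which would match the behaviour described in this thread.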

On Thu, Mar 5, 2015 at 1:24 PM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> But can you explain why my fix solved it?

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

But can you explain why my fix solved it?

> On Mar 5, 2015, at 5:50 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> Hi Dulaj!
> 
> Okay, the logs give us some insight. Both setups seem to look good in terms
> of TaskManager and JobManager startup.
> 
> In one of the logs (127.0.0.1) you submit a job. The job fails because the
> TaskManager cannot grab the JAR file from the JobManager.
> I think the problem is that the BLOB server binds to 0.0.0.0 - it should
> bind to the same address as the JobManager actor system.
> 
> That should definitely be changed...

Re: Could not build up connection to JobManager

Posted by Stephan Ewen <se...@apache.org>.

Hi Dulaj!

Okay, the logs give us some insight. Both setups seem to look good in terms
of TaskManager and JobManager startup.

In one of the logs (127.0.0.1) you submit a job. The job fails because the
TaskManager cannot grab the JAR file from the JobManager.
I think the problem is that the BLOB server binds to 0.0.0.0 - it should
bind to the same address as the JobManager actor system.

That should definitely be changed...
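
The difference between a wildcard bind and a bind to one concrete address can be seen with plain sockets. The following is a minimal Python sketch, not the BlobServer code; `bound_address` and `is_wildcard` are hypothetical helper names, and port 0 simply asks the OS for any free port.

```python
import ipaddress
import socket


def bound_address(host):
    """Bind a TCP listener to host on an OS-chosen port and return (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0: let the OS pick a free port
        server.listen(1)
        return server.getsockname()


def is_wildcard(host):
    """True for the unspecified address (0.0.0.0), meaning 'all interfaces'."""
    return ipaddress.ip_address(host).is_unspecified


for requested in ("0.0.0.0", "127.0.0.1"):
    local_host, local_port = bound_address(requested)
    kind = "all interfaces" if is_wildcard(local_host) else "one interface"
    print(f"requested {requested}: listening on {local_host}:{local_port} ({kind})")
```

A listener on 0.0.0.0 accepts connections on every interface, but its own socket address does not tell a peer which address to dial. That address has to come from somewhere else, for example the one the JobManager actor system uses.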

On Thu, Mar 5, 2015 at 10:08 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Hi,
> This is the log with setting “localhost”
> flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>
>
> And this is the log with setting “127.0.0.1”
> flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>
>
> > On Mar 5, 2015, at 2:23 PM, Till Rohrmann <tr...@apache.org> wrote:
> >
> > What does the jobmanager log say? I think Stephan added some more logging
> > output which helps us to debug this problem.
> >
> > On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vi...@icloud.com>
> > wrote:
> >
> >> Using start-local.sh.
> >> I’m using the original config yaml. I also tried changing jobmanager
> >> address in config to “127.0.0.1” but no luck. With my changes it works ok.
> >> The conf file follows.
> >>
> >>
> >>
> >> ################################################################################
> >> #  Licensed to the Apache Software Foundation (ASF) under one
> >> #  or more contributor license agreements.  See the NOTICE file
> >> #  distributed with this work for additional information
> >> #  regarding copyright ownership.  The ASF licenses this file
> >> #  to you under the Apache License, Version 2.0 (the
> >> #  "License"); you may not use this file except in compliance
> >> #  with the License.  You may obtain a copy of the License at
> >> #
> >> #      http://www.apache.org/licenses/LICENSE-2.0
> >> #
> >> #  Unless required by applicable law or agreed to in writing, software
> >> #  distributed under the License is distributed on an "AS IS" BASIS,
> >> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> >> #  See the License for the specific language governing permissions and
> >> # limitations under the License.
> >> ################################################################################
> >>
> >> #==============================================================================
> >> # Common
> >> #==============================================================================
> >>
> >> jobmanager.rpc.address: 127.0.0.1
> >>
> >> jobmanager.rpc.port: 6123
> >>
> >> jobmanager.heap.mb: 256
> >>
> >> taskmanager.heap.mb: 512
> >>
> >> taskmanager.numberOfTaskSlots: 1
> >>
> >> parallelization.degree.default: 1
> >>
> >> #==============================================================================
> >> # Web Frontend
> >> #==============================================================================
> >>
> >> # The port under which the web-based runtime monitor listens.
> >> # A value of -1 deactivates the web server.
> >>
> >> jobmanager.web.port: 8081
> >>
> >> # The port under which the standalone web client
> >> # (for job upload and submit) listens.
> >>
> >> webclient.port: 8080
> >>
> >> #==============================================================================
> >> # Advanced
> >> #==============================================================================
> >>
> >> # The number of buffers for the network stack.
> >> #
> >> # taskmanager.network.numberOfBuffers: 2048
> >>
> >> # Directories for temporary files.
> >> #
> >> # Add a delimited list for multiple directories, using the system directory
> >> # delimiter (colon ':' on unix) or a comma, e.g.:
> >> #     /data1/tmp:/data2/tmp:/data3/tmp
> >> #
> >> # Note: Each directory entry is read from and written to by a different I/O
> >> # thread. You can include the same directory multiple times in order to create
> >> # multiple I/O threads against that directory. This is for example relevant for
> >> # high-throughput RAIDs.
> >> #
> >> # If not specified, the system-specific Java temporary directory (java.io.tmpdir
> >> # property) is taken.
> >> #
> >> # taskmanager.tmp.dirs: /tmp
> >>
> >> # Path to the Hadoop configuration directory.
> >> #
> >> # This configuration is used when writing into HDFS. Unless specified otherwise,
> >> # HDFS file creation will use HDFS default settings with respect to block-size,
> >> # replication factor, etc.
> >> #
> >> # You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
> >> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> >> #
> >> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
> >>
> >>
> >>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org>
> wrote:
> >>>
> >>> How did you start the flink cluster? Using the start-local.sh, the
> >>> start-cluster.sh or starting the job manager and task managers
> >> individually
> >>> using taskmanager.sh/jobmanager.sh. Could you maybe post the
> >>> flink-conf.yaml file, you're using?
> >>>
> >>> With your changes, everything works, right?
> >>>
> >>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vi...@icloud.com>
> >>> wrote:
> >>>
> >>>> Hi Till,
> >>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> >>>> tries a 10.0.0.0/8 IP.
> >>>>
> >>>> Best regards.
> >>>>
> >>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi Dulaj,
> >>>>>
> >>>>> I looked through your commit and noticed that the JobClient might not
> >> be
> >>>>> listening on the right network interface. Your commit seems to fix
> it.
> >> I
> >>>>> just want to understand the problem properly and therefore I opened a
> >>>>> branch with a small change. Could you try out whether this change
> would
> >>>>> also fix your problem? You can find the code here [1]. Would be
> awesome
> >>>> if
> >>>>> you checked it out and let it run on your cluster setting. Thanks a
> lot
> >>>>> Dulaj!
> >>>>>
> >>>>> [1]
> >>>>>
> >>>>
> >>
> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>>>
> >>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <
> vidura.me@icloud.com>
> >>>>> wrote:
> >>>>>
> >>>>>> The every change in the commit b7da22a is not required but I thought
> >>>> they
> >>>>>> are appropriate.
> >>>>>>
> >>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>> I found many other places “localhost” is hard coded. I changed them
> >> in
> >>>> a
> >>>>>> better way I think. I made a pull request. Please review. b7da22a <
> >>>>>>
> >>>>
> >>
> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org>
> wrote:
> >>>>>>>>
> >>>>>>>> If I recall correctly, we only hardcode "localhost" in the local
> >> mini
> >>>>>>>> cluster - do you think it is problematic there as well?
> >>>>>>>>
> >>>>>>>> Have you found any other places?
> >>>>>>>>
> >>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
> >>>> vidura.me@icloud.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> In some places of the code, "localhost" is hard coded. When it is
> >>>>>> resolved
> >>>>>>>>> by the DNS, it is possible to be directed to a different IP other
> >>>> than
> >>>>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those
> places
> >> to
> >>>>>>>>> 127.0.0.1 and it works like a charm.
> >>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the
> >>>>>> jobmanager
> >>>>>>>>> ip is changed, this becomes an issue again. I'm thinking of
> setting
> >>>>>>>>> jobmanager ip from the config.yaml to these places.
> >>>>>>>>> If you have a better idea on doing this with your experience,
> >> please
> >>>>>> let
> >>>>>>>>> me know.
> >>>>>>>>>
> >>>>>>>>> Best.
> >>>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi,
This is the log with the address set to “localhost”:
flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/e9d43521587697de3eb5#file-flink-vidura-jobmanager-localhost-log>

And this is the log with the address set to “127.0.0.1”:
flink-Vidura-jobmanager-localhost.log <https://gist.github.com/viduranga/5af6b05f204e1f4b344f#file-flink-vidura-jobmanager-localhost-log>

> On Mar 5, 2015, at 2:23 PM, Till Rohrmann <tr...@apache.org> wrote:
> 
> What does the jobmanager log say? I think Stephan added some more logging
> output, which should help us debug this problem.
> 
> On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> Using start-local.sh.
>> I’m using the original config yaml. I also tried changing jobmanager
>> address in config to “127.0.0.1” but no luck. With my changes it works ok.
>> The conf file follows.
>> 
>> 
>> ################################################################################
>> #  Licensed to the Apache Software Foundation (ASF) under one
>> #  or more contributor license agreements.  See the NOTICE file
>> #  distributed with this work for additional information
>> #  regarding copyright ownership.  The ASF licenses this file
>> #  to you under the Apache License, Version 2.0 (the
>> #  "License"); you may not use this file except in compliance
>> #  with the License.  You may obtain a copy of the License at
>> #
>> #      http://www.apache.org/licenses/LICENSE-2.0
>> #
>> #  Unless required by applicable law or agreed to in writing, software
>> #  distributed under the License is distributed on an "AS IS" BASIS,
>> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>> #  See the License for the specific language governing permissions and
>> # limitations under the License.
>> 
>> ################################################################################
>> 
>> 
>> 
>> #==============================================================================
>> # Common
>> 
>> #==============================================================================
>> 
>> jobmanager.rpc.address: 127.0.0.1
>> 
>> jobmanager.rpc.port: 6123
>> 
>> jobmanager.heap.mb: 256
>> 
>> taskmanager.heap.mb: 512
>> 
>> taskmanager.numberOfTaskSlots: 1
>> 
>> parallelization.degree.default: 1
>> 
>> 
>> #==============================================================================
>> # Web Frontend
>> 
>> #==============================================================================
>> 
>> # The port under which the web-based runtime monitor listens.
>> # A value of -1 deactivates the web server.
>> 
>> jobmanager.web.port: 8081
>> 
>> # The port under which the standalone web client
>> # (for job upload and submit) listens.
>> 
>> webclient.port: 8080
>> 
>> 
>> #==============================================================================
>> # Advanced
>> 
>> #==============================================================================
>> 
>> # The number of buffers for the network stack.
>> #
>> # taskmanager.network.numberOfBuffers: 2048
>> 
>> # Directories for temporary files.
>> #
>> # Add a delimited list for multiple directories, using the system directory
>> # delimiter (colon ':' on unix) or a comma, e.g.:
>> #     /data1/tmp:/data2/tmp:/data3/tmp
>> #
>> # Note: Each directory entry is read from and written to by a different I/O
>> # thread. You can include the same directory multiple times in order to
>> create
>> # multiple I/O threads against that directory. This is for example
>> relevant for
>> # high-throughput RAIDs.
>> #
>> # If not specified, the system-specific Java temporary directory
>> (java.io.tmpdir
>> # property) is taken.
>> #
>> # taskmanager.tmp.dirs: /tmp
>> 
>> # Path to the Hadoop configuration directory.
>> #
>> # This configuration is used when writing into HDFS. Unless specified
>> otherwise,
>> # HDFS file creation will use HDFS default settings with respect to
>> block-size,
>> # replication factor, etc.
>> #
>> # You can also directly specify the paths to hdfs-default.xml and
>> hdfs-site.xml
>> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
>> #
>> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
>> 
>> 
>>> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org> wrote:
>>> 
>>> How did you start the flink cluster? Using the start-local.sh, the
>>> start-cluster.sh or starting the job manager and task managers
>> individually
>>> using taskmanager.sh/jobmanager.sh. Could you maybe post the
>>> flink-conf.yaml file, you're using?
>>> 
>>> With your changes, everything works, right?
>>> 
>>> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vi...@icloud.com>
>>> wrote:
>>> 
>>>> Hi Till,
>>>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
>>>> tries a 10.0.0.0/8 IP.
>>>> 
>>>> Best regards.
>>>> 
>>>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi Dulaj,
>>>>> 
>>>>> I looked through your commit and noticed that the JobClient might not
>> be
>>>>> listening on the right network interface. Your commit seems to fix it.
>> I
>>>>> just want to understand the problem properly and therefore I opened a
>>>>> branch with a small change. Could you try out whether this change would
>>>>> also fix your problem? You can find the code here [1]. Would be awesome
>>>> if
>>>>> you checked it out and let it run on your cluster setting. Thanks a lot
>>>>> Dulaj!
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
>>>>> 
>>>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
>>>>> wrote:
>>>>> 
>>>>>> The every change in the commit b7da22a is not required but I thought
>>>> they
>>>>>> are appropriate.
>>>>>> 
>>>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> I found many other places “localhost” is hard coded. I changed them
>> in
>>>> a
>>>>>> better way I think. I made a pull request. Please review. b7da22a <
>>>>>> 
>>>> 
>> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> If I recall correctly, we only hardcode "localhost" in the local
>> mini
>>>>>>>> cluster - do you think it is problematic there as well?
>>>>>>>> 
>>>>>>>> Have you found any other places?
>>>>>>>> 
>>>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
>>>> vidura.me@icloud.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> In some places of the code, "localhost" is hard coded. When it is
>>>>>> resolved
>>>>>>>>> by the DNS, it is possible to be directed to a different IP other
>>>> than
>>>>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places
>> to
>>>>>>>>> 127.0.0.1 and it works like a charm.
>>>>>>>>> But hard coding 127.0.0.1 is not a good option because when the
>>>>>> jobmanager
>>>>>>>>> ip is changed, this becomes an issue again. I'm thinking of setting
>>>>>>>>> jobmanager ip from the config.yaml to these places.
>>>>>>>>> If you have a better idea on doing this with your experience,
>> please
>>>>>> let
>>>>>>>>> me know.
>>>>>>>>> 
>>>>>>>>> Best.
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

What does the jobmanager log say? I think Stephan added some more logging
output, which should help us debug this problem.

On Thu, Mar 5, 2015 at 9:36 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Using start-local.sh.
> I’m using the original config yaml. I also tried changing jobmanager
> address in config to “127.0.0.1” but no luck. With my changes it works ok.
> The conf file follows.
>
>
> ################################################################################
> #  Licensed to the Apache Software Foundation (ASF) under one
> #  or more contributor license agreements.  See the NOTICE file
> #  distributed with this work for additional information
> #  regarding copyright ownership.  The ASF licenses this file
> #  to you under the Apache License, Version 2.0 (the
> #  "License"); you may not use this file except in compliance
> #  with the License.  You may obtain a copy of the License at
> #
> #      http://www.apache.org/licenses/LICENSE-2.0
> #
> #  Unless required by applicable law or agreed to in writing, software
> #  distributed under the License is distributed on an "AS IS" BASIS,
> #  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> #  See the License for the specific language governing permissions and
> # limitations under the License.
>
> ################################################################################
>
>
>
> #==============================================================================
> # Common
>
> #==============================================================================
>
> jobmanager.rpc.address: 127.0.0.1
>
> jobmanager.rpc.port: 6123
>
> jobmanager.heap.mb: 256
>
> taskmanager.heap.mb: 512
>
> taskmanager.numberOfTaskSlots: 1
>
> parallelization.degree.default: 1
>
>
> #==============================================================================
> # Web Frontend
>
> #==============================================================================
>
> # The port under which the web-based runtime monitor listens.
> # A value of -1 deactivates the web server.
>
> jobmanager.web.port: 8081
>
> # The port under which the standalone web client
> # (for job upload and submit) listens.
>
> webclient.port: 8080
>
>
> #==============================================================================
> # Advanced
>
> #==============================================================================
>
> # The number of buffers for the network stack.
> #
> # taskmanager.network.numberOfBuffers: 2048
>
> # Directories for temporary files.
> #
> # Add a delimited list for multiple directories, using the system directory
> # delimiter (colon ':' on unix) or a comma, e.g.:
> #     /data1/tmp:/data2/tmp:/data3/tmp
> #
> # Note: Each directory entry is read from and written to by a different I/O
> # thread. You can include the same directory multiple times in order to
> create
> # multiple I/O threads against that directory. This is for example
> relevant for
> # high-throughput RAIDs.
> #
> # If not specified, the system-specific Java temporary directory
> (java.io.tmpdir
> # property) is taken.
> #
> # taskmanager.tmp.dirs: /tmp
>
> # Path to the Hadoop configuration directory.
> #
> # This configuration is used when writing into HDFS. Unless specified
> otherwise,
> # HDFS file creation will use HDFS default settings with respect to
> block-size,
> # replication factor, etc.
> #
> # You can also directly specify the paths to hdfs-default.xml and
> hdfs-site.xml
> # via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
> #
> # fs.hdfs.hadoopconf: /path/to/hadoop/conf/
>
>
> > On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org> wrote:
> >
> > How did you start the flink cluster? Using the start-local.sh, the
> > start-cluster.sh or starting the job manager and task managers
> individually
> > using taskmanager.sh/jobmanager.sh. Could you maybe post the
> > flink-conf.yaml file, you're using?
> >
> > With your changes, everything works, right?
> >
> > On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vi...@icloud.com>
> > wrote:
> >
> >> Hi Till,
> >> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> >> tries a 10.0.0.0/8 IP.
> >>
> >> Best regards.
> >>
> >>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com>
> >> wrote:
> >>>
> >>> Hi Dulaj,
> >>>
> >>> I looked through your commit and noticed that the JobClient might not
> be
> >>> listening on the right network interface. Your commit seems to fix it.
> I
> >>> just want to understand the problem properly and therefore I opened a
> >>> branch with a small change. Could you try out whether this change would
> >>> also fix your problem? You can find the code here [1]. Would be awesome
> >> if
> >>> you checked it out and let it run on your cluster setting. Thanks a lot
> >>> Dulaj!
> >>>
> >>> [1]
> >>>
> >>
> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >>>
> >>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
> >>> wrote:
> >>>
> >>>> The every change in the commit b7da22a is not required but I thought
> >> they
> >>>> are appropriate.
> >>>>
> >>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> I found many other places “localhost” is hard coded. I changed them
> in
> >> a
> >>>> better way I think. I made a pull request. Please review. b7da22a <
> >>>>
> >>
> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
> >>>>>
> >>>>>
> >>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>>>>
> >>>>>> If I recall correctly, we only hardcode "localhost" in the local
> mini
> >>>>>> cluster - do you think it is problematic there as well?
> >>>>>>
> >>>>>> Have you found any other places?
> >>>>>>
> >>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
> >> vidura.me@icloud.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> In some places of the code, "localhost" is hard coded. When it is
> >>>> resolved
> >>>>>>> by the DNS, it is possible to be directed to a different IP other
> >> than
> >>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places
> to
> >>>>>>> 127.0.0.1 and it works like a charm.
> >>>>>>> But hard coding 127.0.0.1 is not a good option because when the
> >>>> jobmanager
> >>>>>>> ip is changed, this becomes an issue again. I'm thinking of setting
> >>>>>>> jobmanager ip from the config.yaml to these places.
> >>>>>>> If you have a better idea on doing this with your experience,
> please
> >>>> let
> >>>>>>> me know.
> >>>>>>>
> >>>>>>> Best.
> >>>>>>>
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

I started it using start-local.sh.
I’m using the original config yaml. I also tried changing the jobmanager address in the config to “127.0.0.1”, but no luck. With my changes it works OK.
The conf file follows.

################################################################################
#  Licensed to the Apache Software Foundation (ASF) under one
#  or more contributor license agreements.  See the NOTICE file
#  distributed with this work for additional information
#  regarding copyright ownership.  The ASF licenses this file
#  to you under the Apache License, Version 2.0 (the
#  "License"); you may not use this file except in compliance
#  with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

#==============================================================================
# Common
#==============================================================================

jobmanager.rpc.address: 127.0.0.1

jobmanager.rpc.port: 6123

jobmanager.heap.mb: 256

taskmanager.heap.mb: 512

taskmanager.numberOfTaskSlots: 1

parallelization.degree.default: 1

#==============================================================================
# Web Frontend
#==============================================================================

# The port under which the web-based runtime monitor listens.
# A value of -1 deactivates the web server.

jobmanager.web.port: 8081

# The port under which the standalone web client
# (for job upload and submit) listens.

webclient.port: 8080

#==============================================================================
# Advanced
#==============================================================================

# The number of buffers for the network stack.
#
# taskmanager.network.numberOfBuffers: 2048

# Directories for temporary files.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
#     /data1/tmp:/data2/tmp:/data3/tmp
#
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
#
# If not specified, the system-specific Java temporary directory (java.io.tmpdir
# property) is taken.
#
# taskmanager.tmp.dirs: /tmp

# Path to the Hadoop configuration directory.
#
# This configuration is used when writing into HDFS. Unless specified otherwise,
# HDFS file creation will use HDFS default settings with respect to block-size,
# replication factor, etc.
#
# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
# fs.hdfs.hadoopconf: /path/to/hadoop/conf/
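
As a side note, the idea of taking the JobManager address from this file instead of hard-coding "localhost" could look roughly like the sketch below. It is only an illustration under assumptions: it treats the file as flat "key: value" lines, and the names ConfAddress and readValue are made up for this example. They are not Flink's configuration classes.

```java
public class ConfAddress {

    /**
     * Returns the value stored under key in flat "key: value" text.
     * Blank lines and lines starting with '#' are skipped.
     * Hypothetical helper for illustration, not part of Flink.
     */
    static String readValue(String confText, String key, String fallback) {
        for (String raw : confText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // comment or empty line
            }
            int sep = line.indexOf(':');
            if (sep > 0 && line.substring(0, sep).trim().equals(key)) {
                return line.substring(sep + 1).trim();
            }
        }
        return fallback; // key not set in the file
    }

    public static void main(String[] args) {
        String conf = "# Common\n"
                + "jobmanager.rpc.address: 127.0.0.1\n"
                + "jobmanager.rpc.port: 6123\n"
                + "# taskmanager.tmp.dirs: /tmp\n";
        String host = readValue(conf, "jobmanager.rpc.address", "localhost");
        int port = Integer.parseInt(readValue(conf, "jobmanager.rpc.port", "6123"));
        System.out.println(host + ":" + port); // prints 127.0.0.1:6123
    }
}
```

With something like this, every place that currently says "localhost" would use the configured value, and the literal would only remain as the fallback.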

> On Mar 5, 2015, at 2:03 PM, Till Rohrmann <tr...@apache.org> wrote:
> 
> How did you start the flink cluster? Using the start-local.sh, the
> start-cluster.sh or starting the job manager and task managers individually
> using taskmanager.sh/jobmanager.sh. Could you maybe post the
> flink-conf.yaml file, you're using?
> 
> With your changes, everything works, right?
> 
> On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> Hi Till,
>> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
>> tries a 10.0.0.0/8 IP.
>> 
>> Best regards.
>> 
>>> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com>
>> wrote:
>>> 
>>> Hi Dulaj,
>>> 
>>> I looked through your commit and noticed that the JobClient might not be
>>> listening on the right network interface. Your commit seems to fix it. I
>>> just want to understand the problem properly and therefore I opened a
>>> branch with a small change. Could you try out whether this change would
>>> also fix your problem? You can find the code here [1]. Would be awesome
>> if
>>> you checked it out and let it run on your cluster setting. Thanks a lot
>>> Dulaj!
>>> 
>>> [1]
>>> 
>> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
>>> 
>>> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
>>> wrote:
>>> 
>>>> The every change in the commit b7da22a is not required but I thought
>> they
>>>> are appropriate.
>>>> 
>>>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> I found many other places “localhost” is hard coded. I changed them in
>> a
>>>> better way I think. I made a pull request. Please review. b7da22a <
>>>> 
>> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
>>>>> 
>>>>> 
>>>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>>> 
>>>>>> If I recall correctly, we only hardcode "localhost" in the local mini
>>>>>> cluster - do you think it is problematic there as well?
>>>>>> 
>>>>>> Have you found any other places?
>>>>>> 
>>>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
>> vidura.me@icloud.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> In some places of the code, "localhost" is hard coded. When it is
>>>> resolved
>>>>>>> by the DNS, it is possible to be directed to a different IP other
>> than
>>>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
>>>>>>> 127.0.0.1 and it works like a charm.
>>>>>>> But hard coding 127.0.0.1 is not a good option because when the
>>>> jobmanager
>>>>>>> ip is changed, this becomes an issue again. I'm thinking of setting
>>>>>>> jobmanager ip from the config.yaml to these places.
>>>>>>> If you have a better idea on doing this with your experience, please
>>>> let
>>>>>>> me know.
>>>>>>> 
>>>>>>> Best.
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <tr...@apache.org>.

How did you start the Flink cluster? Did you use start-local.sh or
start-cluster.sh, or did you start the job manager and task managers
individually using taskmanager.sh/jobmanager.sh? Could you maybe post the
flink-conf.yaml file you're using?

With your changes, everything works, right?

On Thu, Mar 5, 2015 at 8:55 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> Hi Till,
> I’m sorry. It doesn’t seem to solve the problem. The taskmanager still
> tries a 10.0.0.0/8 IP.
>
> Best regards.
>
> > On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com>
> wrote:
> >
> > Hi Dulaj,
> >
> > I looked through your commit and noticed that the JobClient might not be
> > listening on the right network interface. Your commit seems to fix it. I
> > just want to understand the problem properly and therefore I opened a
> > branch with a small change. Could you try out whether this change would
> > also fix your problem? You can find the code here [1]. Would be awesome
> if
> > you checked it out and let it run on your cluster setting. Thanks a lot
> > Dulaj!
> >
> > [1]
> >
> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> >
> > On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
> > wrote:
> >
> >> The every change in the commit b7da22a is not required but I thought
> they
> >> are appropriate.
> >>
> >>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
> >> wrote:
> >>>
> >>> Hi,
> >>> I found many other places “localhost” is hard coded. I changed them in
> a
> >> better way I think. I made a pull request. Please review. b7da22a <
> >>
> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
> >>>
> >>>
> >>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
> >>>>
> >>>> If I recall correctly, we only hardcode "localhost" in the local mini
> >>>> cluster - do you think it is problematic there as well?
> >>>>
> >>>> Have you found any other places?
> >>>>
> >>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <
> vidura.me@icloud.com>
> >>>> wrote:
> >>>>
> >>>>> In some places of the code, "localhost" is hard coded. When it is
> >> resolved
> >>>>> by the DNS, it is possible to be directed to a different IP other
> than
> >>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
> >>>>> 127.0.0.1 and it works like a charm.
> >>>>> But hard coding 127.0.0.1 is not a good option because when the
> >> jobmanager
> >>>>> ip is changed, this becomes an issue again. I'm thinking of setting
> >>>>> jobmanager ip from the config.yaml to these places.
> >>>>> If you have a better idea on doing this with your experience, please
> >> let
> >>>>> me know.
> >>>>>
> >>>>> Best.
> >>>>>
> >>>
> >>
> >>
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi Till,
I’m sorry. It doesn’t seem to solve the problem. The taskmanager still tries to use an IP in the 10.0.0.0/8 range.
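
For anyone who wants to check the same thing on their own machine, a small stand-alone program along these lines shows what "localhost" resolves to. This is just a sketch: the class name LocalhostCheck and its helper methods are made up for this example, and it only uses the standard java.net.InetAddress API, nothing from Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalhostCheck {

    /** Returns every address the given host name resolves to on this machine. */
    static InetAddress[] resolveAll(String host) {
        try {
            return InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    /** True if the host (name or literal IP) resolves to a loopback address. */
    static boolean isLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot resolve " + host, e);
        }
    }

    public static void main(String[] args) {
        // Goes through the system resolver (hosts file, DNS, ...).
        for (InetAddress addr : resolveAll("localhost")) {
            System.out.println("localhost -> " + addr.getHostAddress()
                    + " (loopback: " + addr.isLoopbackAddress() + ")");
        }
        // Does not consult the resolver at all.
        System.out.println("loopback  -> "
                + InetAddress.getLoopbackAddress().getHostAddress());
    }
}
```

If the first lines print something outside the loopback range, the name lookup is what sends the connection to the wrong interface.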

Best regards.

> On Mar 5, 2015, at 1:00 PM, Till Rohrmann <ti...@gmail.com> wrote:
> 
> Hi Dulaj,
> 
> I looked through your commit and noticed that the JobClient might not be
> listening on the right network interface. Your commit seems to fix it. I
> just want to understand the problem properly and therefore I opened a
> branch with a small change. Could you try out whether this change would
> also fix your problem? You can find the code here [1]. Would be awesome if
> you checked it out and let it run on your cluster setting. Thanks a lot
> Dulaj!
> 
> [1]
> https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
> 
> On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> The every change in the commit b7da22a is not required but I thought they
>> are appropriate.
>> 
>>> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
>> wrote:
>>> 
>>> Hi,
>>> I found many other places “localhost” is hard coded. I changed them in a
>> better way I think. I made a pull request. Please review. b7da22a <
>> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
>>> 
>>> 
>>>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
>>>> 
>>>> If I recall correctly, we only hardcode "localhost" in the local mini
>>>> cluster - do you think it is problematic there as well?
>>>> 
>>>> Have you found any other places?
>>>> 
>>>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vi...@icloud.com>
>>>> wrote:
>>>> 
>>>>> In some places of the code, "localhost" is hard coded. When it is
>> resolved
>>>>> by the DNS, it is possible to be directed to a different IP other than
>>>>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
>>>>> 127.0.0.1 and it works like a charm.
>>>>> But hard coding 127.0.0.1 is not a good option because when the
>> jobmanager
>>>>> ip is changed, this becomes an issue again. I'm thinking of setting
>>>>> jobmanager ip from the config.yaml to these places.
>>>>> If you have a better idea on doing this with your experience, please
>> let
>>>>> me know.
>>>>> 
>>>>> Best.
>>>>> 
>>> 
>> 
>>

Re: Could not build up connection to JobManager

Posted by Till Rohrmann <ti...@gmail.com>.

Hi Dulaj,

I looked through your commit and noticed that the JobClient might not be
listening on the right network interface. Your commit seems to fix it. I
just want to understand the problem properly and therefore I opened a
branch with a small change. Could you try out whether this change would
also fix your problem? You can find the code here [1]. It would be awesome if
you could check it out and run it on your cluster setup. Thanks a lot,
Dulaj!

[1]
https://github.com/tillrohrmann/flink/tree/fixLocalFlinkMiniClusterJobClient
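
To see which interfaces and addresses are available on the machine in the first place, something like the following sketch can help. It is not taken from the branch above; the class name InterfaceList and the method describeInterfaces are invented for this example, and it relies only on the standard java.net.NetworkInterface API.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class InterfaceList {

    /** Collects one "interface -> address" line per address bound on this host. */
    static List<String> describeInterfaces() {
        List<String> lines = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
            if (nifs == null) {
                return lines; // older JDKs may return null when nothing is found
            }
            for (NetworkInterface nif : Collections.list(nifs)) {
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    lines.add(nif.getName() + " -> " + addr.getHostAddress()
                            + (addr.isLoopbackAddress() ? " (loopback)" : ""));
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException("could not enumerate network interfaces", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeInterfaces()) {
            System.out.println(line);
        }
    }
}
```

Comparing this list with the address in the JobManager log should show whether the client and the JobManager ended up on different interfaces.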

On Thu, Mar 5, 2015 at 4:21 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> The every change in the commit b7da22a is not required but I thought they
> are appropriate.
>
> > On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> >
> > Hi,
> > I found many other places “localhost” is hard coded. I changed them in a
> better way I think. I made a pull request. Please review. b7da22a <
> https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd
> >
> >
> >> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
> >>
> >> If I recall correctly, we only hardcode "localhost" in the local mini
> >> cluster - do you think it is problematic there as well?
> >>
> >> Have you found any other places?
> >>
> >> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vi...@icloud.com>
> >> wrote:
> >>
> >>> In some places of the code, "localhost" is hard coded. When it is
> resolved
> >>> by the DNS, it is possible to be directed to a different IP other than
> >>> 127.0.0.1 (like private range 10.0.0.0/8). I changed those places to
> >>> 127.0.0.1 and it works like a charm.
> >>> But hard coding 127.0.0.1 is not a good option because when the
> jobmanager
> >>> ip is changed, this becomes an issue again. I'm thinking of setting
> >>> jobmanager ip from the config.yaml to these places.
> >>> If you have a better idea on doing this with your experience, please
> let
> >>> me know.
> >>>
> >>> Best.
> >>>
> >
>
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Not every change in commit b7da22a is required, but I thought they were appropriate.

> On Mar 5, 2015, at 8:11 AM, Dulaj Viduranga <vi...@icloud.com> wrote:
> 
> Hi,
> I found many other places where “localhost” is hard-coded, and I changed them in what I think is a better way. I made a pull request. Please review. b7da22a <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>
> 
>> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
>> 
>> If I recall correctly, we only hardcode "localhost" in the local mini
>> cluster - do you think it is problematic there as well?
>> 
>> Have you found any other places?
>> 
>> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vi...@icloud.com>
>> wrote:
>> 
>>> In some places in the code, "localhost" is hard-coded. When it is resolved
>>> by DNS, it can be directed to an IP other than 127.0.0.1 (for example,
>>> one in the private range 10.0.0.0/8). I changed those places to
>>> 127.0.0.1 and it works like a charm.
>>> But hard-coding 127.0.0.1 is not a good option, because when the jobmanager
>>> IP changes this becomes an issue again. I'm thinking of setting the
>>> jobmanager IP from config.yaml in these places.
>>> If you have a better idea based on your experience, please let
>>> me know.
>>> 
>>> Best.
>>> 
>

Re: Could not build up connection to JobManager

Posted by Dulaj Viduranga <vi...@icloud.com>.

Hi,
I found many other places where “localhost” is hard-coded, and I changed them in what I think is a better way. I made a pull request. Please review. b7da22a <https://github.com/viduranga/flink/commit/b7da22a562d3da5a9be2657308c0f82e4e2f80cd>

> On Mar 4, 2015, at 8:17 PM, Stephan Ewen <se...@apache.org> wrote:
> 
> If I recall correctly, we only hardcode "localhost" in the local mini
> cluster - do you think it is problematic there as well?
> 
> Have you found any other places?
> 
> On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vi...@icloud.com>
> wrote:
> 
>> In some places in the code, "localhost" is hard-coded. When it is resolved
>> by DNS, it can be directed to an IP other than 127.0.0.1 (for example,
>> one in the private range 10.0.0.0/8). I changed those places to
>> 127.0.0.1 and it works like a charm.
>> But hard-coding 127.0.0.1 is not a good option, because when the jobmanager
>> IP changes this becomes an issue again. I'm thinking of setting the
>> jobmanager IP from config.yaml in these places.
>> If you have a better idea based on your experience, please let
>> me know.
>> 
>> Best.
>>

Re: Could not build up connection to JobManager

Posted by Stephan Ewen <se...@apache.org>.

If I recall correctly, we only hardcode "localhost" in the local mini
cluster - do you think it is problematic there as well?

Have you found any other places?

On Mon, Mar 2, 2015 at 10:26 AM, Dulaj Viduranga <vi...@icloud.com>
wrote:

> In some places in the code, "localhost" is hard-coded. When it is resolved
> by DNS, it can be directed to an IP other than 127.0.0.1 (for example,
> one in the private range 10.0.0.0/8). I changed those places to
> 127.0.0.1 and it works like a charm.
> But hard-coding 127.0.0.1 is not a good option, because when the jobmanager
> IP changes this becomes an issue again. I'm thinking of setting the
> jobmanager IP from config.yaml in these places.
> If you have a better idea based on your experience, please let
> me know.
>
> Best.
>
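
The idea quoted above, taking the jobmanager address from the configuration instead of hard-coding "localhost" or 127.0.0.1, could look roughly like the sketch below. This is only an illustration under stated assumptions, not the change in commit b7da22a: java.util.Properties stands in for Flink's own configuration class, the key name jobmanager.rpc.address is assumed, and the class JobManagerAddress is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;

// Hypothetical helper for illustration only; not part of Flink.
final class JobManagerAddress {

    // Assumed key name; check the actual configuration constants before relying on it.
    static final String HOST_KEY = "jobmanager.rpc.address";

    /**
     * Returns the configured jobmanager address, or the JDK's loopback
     * address when nothing is configured. No "localhost" lookup is done,
     * so the fallback does not depend on DNS or the hosts file.
     */
    static InetAddress resolve(Properties config) throws UnknownHostException {
        String host = config.getProperty(HOST_KEY);
        if (host == null || host.trim().isEmpty()) {
            return InetAddress.getLoopbackAddress();
        }
        return InetAddress.getByName(host.trim());
    }

    public static void main(String[] args) throws UnknownHostException {
        Properties config = new Properties();
        System.out.println("unset      -> " + resolve(config).getHostAddress());

        config.setProperty(HOST_KEY, "10.218.100.122");
        System.out.println("configured -> " + resolve(config).getHostAddress());
    }
}
```

With this shape, a changed jobmanager IP only needs a configuration change, and the local case still works when "localhost" resolves to something other than 127.0.0.1.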

Re: Could not build up connection to JobManager

Posted by Stephan Ewen <se...@apache.org>.

Wow, great. Can you tell us what the issue was?
On 02.03.2015 at 09:31, "Dulaj Viduranga" <vi...@icloud.com> wrote:

> Hi,
> I found the fix for this issue and I'll create a pull request in the
> following day.
>