Posted to dev@flink.apache.org by Aljoscha Krettek <al...@apache.org> on 2018/03/16 17:41:44 UTC

Re: Yarn deployment takes long on some networks

Hi Gyula,

Is there any news on this?

@Nico or @Gary, you also worked with YARN recently. Do you have an idea of what could be going on?

Best,
Aljoscha

> On 21. Nov 2017, at 06:42, Gyula Fóra <gy...@gmail.com> wrote:
> 
> Hi all!
> 
> Today we started noticing that deploying our jobs takes over 3 minutes when
> deployed from some machines, but the normal few seconds when deployed from
> the others.
> 
> Looking at the logs, it seems that the client can't find the job ID for a
> few minutes in this case:
> 
> ...
> 2017-11-21 15:23:00,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> 2017-11-21 15:23:04,528 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x25eb8e005b7971b after 0ms
> 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat sending #38
> 2017-11-21 15:23:04,636 DEBUG org.apache.hadoop.ipc.Client - IPC Client (937277082) connection to splat13.sto.midasplayer.com/172.26.87.155:8030 from splat got value #38
> 2017-11-21 15:23:04,651 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: allocate took 16ms
> 2017-11-21 15:23:05,880 DEBUG org.apache.flink.yarn.YarnJobManager - Job with ID 179d67bfab7c4c0b9f00ea772f6e4f0c not found in JobManager
> 2017-11-21 15:23:06,409 DEBUG akka.remote.RemoteWatcher - Sending Heartbeat to [akka.tcp://flink@splat33.sto.midasplayer.com:56045]
> 2017-11-21 15:23:06,413 DEBUG akka.remote.RemoteWatcher - Received heartbeat rsp from [akka.tcp://flink@splat33.sto.midasplayer.com:56045]
> 2017-11-21 15:23:07,665 DEBUG akka.serialization.Serialization(akka://flink) - Using serializer[akka.serialization.JavaSerializer] for message [org.apache.flink.runtime.clusterframework.messages.GetClusterStatusResponse]
> 2017-11-21 15:23:07,824 INFO  org.apache.flink.yarn.YarnJobManager - Submitting job 179d67bfab7c4c0b9f00ea772f6e4f0c (event-bifrost-log).
> 2017
> 
> Interestingly enough, nothing like this shows up when deploying from the
> other servers.
> We suspect there might be some strange network issue (which doesn't seem to
> affect jar upload times) that interferes with Akka in some way.
> 
> Any idea how to debug this?
> Thank you!
> 
> Gyula

Re: Yarn deployment takes long on some networks

Posted by Gyula Fóra <gy...@gmail.com>.

Hi!
Sorry for not following up on this. It turned out that some ports were
blocked by a firewall change, so there is no issue on Flink's side.

Gyula
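
For anyone who runs into similar symptoms: a blocked port can usually be confirmed from the affected machine with a plain TCP connect test. The following is only a minimal sketch and is not part of Flink. The function names `port_reachable` and `check_endpoints` are hypothetical, and the hosts and ports in the usage comment are the ones that appear in the log above. The JobManager's Akka port may differ between deployments, so it should be read from the logs of the current run.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. silently dropped packets)
        # and name resolution failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to whether it is reachable from this machine."""
    return {(host, port): port_reachable(host, port, timeout) for host, port in endpoints}


# Example usage (assumption: run on the slow client machine, with the
# endpoints taken from the log in this thread):
#
#   results = check_endpoints([
#       ("splat13.sto.midasplayer.com", 8030),   # YARN scheduler address from the log
#       ("splat33.sto.midasplayer.com", 56045),  # JobManager Akka address from the log
#   ])
#   for (host, port), ok in results.items():
#       print(f"{host}:{port} -> {'reachable' if ok else 'BLOCKED or down'}")
```

Running the same check from a machine where deployment is fast and from one where it is slow should show which ports differ. A timeout rather than an immediate refusal tends to point at a firewall dropping packets.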
