Posted to user@spark.apache.org by "Liu, Raymond" <ra...@intel.com> on 2013/11/01 08:14:46 UTC

Executor could not connect to Driver?

Hi

I am encountering an issue where the executor actor cannot connect to the driver actor, and I cannot figure out why.

The driver actor is listening on port 35838:

root@sr434:~# netstat -lpv
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 *:50075                 *:*                     LISTEN      18242/java
tcp        0      0 *:50020                 *:*                     LISTEN      18242/java
tcp        0      0 *:ssh                   *:*                     LISTEN      1325/sshd
tcp        0      0 *:50010                 *:*                     LISTEN      18242/java
tcp6       0      0 sr434:35838             [::]:*                  LISTEN      9420/java
tcp6       0      0 [::]:40390              [::]:*                  LISTEN      9420/java
tcp6       0      0 [::]:4040               [::]:*                  LISTEN      9420/java
tcp6       0      0 [::]:8040               [::]:*                  LISTEN      28324/java
tcp6       0      0 [::]:60712              [::]:*                  LISTEN      28324/java
tcp6       0      0 [::]:8042               [::]:*                  LISTEN      28324/java
tcp6       0      0 [::]:34028              [::]:*                  LISTEN      9420/java
tcp6       0      0 [::]:ssh                [::]:*                  LISTEN      1325/sshd
tcp6       0      0 [::]:45528              [::]:*                  LISTEN      9420/java
tcp6       0      0 [::]:13562              [::]:*                  LISTEN      28324/java


Meanwhile, the executor reports the following errors:

13/11/01 13:16:43 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://spark@sr434:35838/user/CoarseGrainedScheduler
13/11/01 13:16:43 ERROR executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.
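
One thing worth ruling out from the executor host: in the netstat output the port is bound to the name sr434 rather than to the wildcard address, so the executor has to resolve sr434 to that same address and be able to open a TCP connection to it. Below is a minimal Python sketch of such a check. It is not part of Spark; the function names are made up for illustration, and the URL pattern is an assumption based on the log line above (it also accepts the akka.tcp:// form).

```python
import re
import socket

# Assumed shape of the driver URL, taken from the executor log line:
#   akka://spark@sr434:35838/user/CoarseGrainedScheduler
AKKA_URL = re.compile(r"akka(?:\.tcp)?://[^@/]+@([^:/]+):(\d+)")


def parse_driver_address(url):
    """Extract (host, port) from an akka driver URL."""
    match = AKKA_URL.match(url)
    if match is None:
        raise ValueError("not an akka URL: %r" % url)
    return match.group(1), int(match.group(2))


def resolved_addresses(host, port):
    """Return the sorted set of IP addresses that host resolves to here."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run on the executor machine, something like the following would show which address sr434 resolves to there and whether the port is reachable:

    host, port = parse_driver_address("akka://spark@sr434:35838/user/CoarseGrainedScheduler")
    print(resolved_addresses(host, port), can_connect(host, port))

A successful TCP connect only shows that the network path is open; it does not prove that the actor system behind the port accepts the association.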

Any idea?

Best Regards,
Raymond Liu

RE: Executor could not connect to Driver?

Posted by "Liu, Raymond" <ra...@intel.com>.
Thanks. My case does not seem to be caused by GC: CPU usage is pretty low, and both YGC and FGC seem to behave quite normally. Hmm, weird.

Best Regards,
Raymond Liu

From: Aaron Davidson [mailto:ilikerps@gmail.com]
Sent: Saturday, November 02, 2013 12:07 AM
To: user@spark.incubator.apache.org
Subject: Re: Executor could not connect to Driver?

I've seen this happen before due to the driver doing long GCs when the driver machine was heavily memory-constrained. For this particular issue, simply freeing up memory used by other applications fixed the problem.


Re: Executor could not connect to Driver?

Posted by Aaron Davidson <il...@gmail.com>.
I've seen this happen before due to the driver doing long GCs when the
driver machine was heavily memory-constrained. For this particular issue,
simply freeing up memory used by other applications fixed the problem.

