Posted to user@spark.apache.org by Surendranauth Hiraman <su...@velos.io> on 2014/04/07 19:10:12 UTC

PySpark SocketConnect Issue in Cluster

Hi,

We have a situation where a PySpark script works fine as a local process
("local" URL) on the Master and the Worker nodes, which would indicate that
all Python dependencies are set up properly on each machine.

But when we try to run the script at the cluster level (using the master's
URL), it fails partway through the flow on a GroupBy with a SocketConnect
error, and Python crashes.

This is on EC2 using the AMI. This doesn't seem to be an issue of the
master not seeing the workers, since they show up in the web UI.

Also, we can see the job running on the cluster until it reaches the
GroupBy transform step, which is when we get the SocketConnect error.

Any ideas?

-Suren


SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io

Re: PySpark SocketConnect Issue in Cluster

Posted by Surendranauth Hiraman <su...@velos.io>.
This appears to be an issue around using pandas. Even if we just
instantiate a DataFrame and do nothing with it, the Python worker process
exits. But if we remove all pandas references, the same job runs to
completion.

Has anyone run into this before?
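
One way to narrow this down, as a minimal sketch rather than a confirmed
diagnosis: report which Python interpreter and which pandas build each
worker process actually sees, since the environment used under the cluster
master may differ from the one used with the "local" URL. The helper name
probe_module and the SparkContext variable sc are assumptions for
illustration, not part of our script. Note that this only catches import
errors raised as Python exceptions; a hard crash of the worker process
would still surface as a lost socket on the driver side.

```python
import importlib
import os
import sys


def probe_module(name):
    """Try to import `name` and describe what this interpreter sees.

    Hypothetical helper: returns a plain dict so the result can be
    shipped back from a worker process to the driver.
    """
    info = {
        "pid": os.getpid(),
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "module": name,
    }
    try:
        mod = importlib.import_module(name)
        info["ok"] = True
        info["module_version"] = getattr(mod, "__version__", None)
        info["module_path"] = getattr(mod, "__file__", None)
    except Exception as exc:  # ImportError or anything raised during import
        info["ok"] = False
        info["error"] = "%s: %s" % (type(exc).__name__, exc)
    return info


if __name__ == "__main__":
    # Driver-side check: what does the local interpreter report?
    print(probe_module("pandas"))

    # Worker-side check (assumes an existing SparkContext named `sc`):
    #   reports = (sc.parallelize(range(8), 8)
    #                .map(lambda _: probe_module("pandas"))
    #                .collect())
    #   for r in reports:
    #       print(r)
```

Comparing the driver-side dict with the ones collected from the workers
should show whether they differ in interpreter path or pandas version.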

-Suren



