Posted to user@spark.apache.org by Matt Cheah <mc...@palantir.com> on 2013/11/19 02:00:24 UTC

EC2 node submit jobs to separate Spark Cluster

Hi,

I'm working with an infrastructure that already has its own web server set up on EC2. I would like to set up a separate spark cluster on EC2 with the scripts and have the web server submit jobs to this spark cluster.

Is it possible to do this? I'm getting an error when running the spark shell on the web server: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". I have heard that it's not possible for a local computer to connect to the spark cluster, but I was wondering whether other EC2 nodes could have their firewalls configured to allow this.

We don't want to deploy the web server on the master node of the spark cluster.

Thanks,

-Matt Cheah



Re: EC2 node submit jobs to separate Spark Cluster

Posted by Aaron Davidson <il...@gmail.com>.
I think the driver binds to a random port by default, but this can be
changed using the "spark.driver.port" system property. So you should be
able to set that property and open only that port. See:
http://spark.incubator.apache.org/docs/latest/configuration.html.
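A minimal sketch of what that could look like when launching the shell (the property name comes from the configuration docs linked above; the port number, master hostname, and the SPARK_JAVA_OPTS mechanism are era-specific placeholders — verify against your Spark version):

```shell
# Hedged sketch: pin the driver to a fixed port so only that port (plus the
# master's 7077) needs to be opened between the web server and the cluster.
# 51000 is an arbitrary free port; ec2-master-host is a placeholder.
export SPARK_JAVA_OPTS="-Dspark.driver.port=51000"
MASTER=spark://ec2-master-host:7077 ./spark-shell
```

With the driver port fixed, the EC2 security group rule can then allow just that single inbound port from the cluster's nodes instead of "all traffic".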


On Tue, Nov 19, 2013 at 10:16 AM, Matt Cheah <mc...@palantir.com> wrote:

>  I determined that it is a firewall issue. Allowing "all traffic" to the
> cluster where the shell was running hack-fixed the problem.
>
>  That being said, what ports do I have to open to allow the spark master
> to communicate back to the driver? I've heard this is required. And
> obviously allowing all traffic is bad…
>
>  -Matt Cheah
>
>   From: Aaron Davidson <il...@gmail.com>
> Reply-To: "user@spark.incubator.apache.org" <
> user@spark.incubator.apache.org>
> Date: Monday, November 18, 2013 8:28 PM
> To: "user@spark.incubator.apache.org" <us...@spark.incubator.apache.org>
> Subject: Re: EC2 node submit jobs to separate Spark Cluster
>
>   The main issue with running a spark-shell locally is that it
> orchestrates the actual computation, so you want it to be "close" to the
> actual Worker nodes for latency reasons. Running a spark-shell on EC2 in
> the same region as the Spark cluster avoids this problem.
>
>  The error you're seeing seems to indicate a different issue. Check the
> Master web UI (accessible on port 8080 at the master's IP address) to make
> sure that Workers are successfully registered and they have the expected
> amount of memory available to Spark. You can also check to see how much
> memory your spark-shell is trying to get per executor. A couple common
> problems are (1) an abandoned spark-shell is holding onto all of your
> cluster's resources or (2) you've manually configured your spark-shell to
> try to get more memory than your Workers have available. Both of these
> should be visible in the web UI.
>
>
> On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <mc...@palantir.com> wrote:
>
>>  Hi,
>>
>>  I'm working with an infrastructure that already has its own web server
>> set up on EC2. I would like to set up a *separate* spark cluster on EC2
>> with the scripts and have the web server submit jobs to this spark cluster.
>>
>>  Is it possible to do this? I'm getting an error when running the spark
>> shell on the web server: "Initial job has not accepted
>> any resources; check your cluster UI to ensure that workers are registered
>> and have sufficient memory". I have heard that it's not possible for any
>> local computer to connect to the spark cluster, but I was wondering if
>> other EC2 nodes could have their firewalls configured to allow this.
>>
>>  We don't want to deploy the web server on the master node of the spark
>> cluster.
>>
>>  Thanks,
>>
>>  -Matt Cheah
>>
>>
>>
>

Re: EC2 node submit jobs to separate Spark Cluster

Posted by Matt Cheah <mc...@palantir.com>.
I determined that it is a firewall issue. Allowing "all traffic" to the cluster where the shell was running hack-fixed the problem.

That being said, what ports do I have to open to allow the spark master to communicate back to the driver? I've heard this is required. And obviously allowing all traffic is bad…

-Matt Cheah

From: Aaron Davidson <il...@gmail.com>
Reply-To: "user@spark.incubator.apache.org" <us...@spark.incubator.apache.org>
Date: Monday, November 18, 2013 8:28 PM
To: "user@spark.incubator.apache.org" <us...@spark.incubator.apache.org>
Subject: Re: EC2 node submit jobs to separate Spark Cluster

The main issue with running a spark-shell locally is that it orchestrates the actual computation, so you want it to be "close" to the actual Worker nodes for latency reasons. Running a spark-shell on EC2 in the same region as the Spark cluster avoids this problem.

The error you're seeing seems to indicate a different issue. Check the Master web UI (accessible on port 8080 at the master's IP address) to make sure that Workers are successfully registered and they have the expected amount of memory available to Spark. You can also check to see how much memory your spark-shell is trying to get per executor. A couple common problems are (1) an abandoned spark-shell is holding onto all of your cluster's resources or (2) you've manually configured your spark-shell to try to get more memory than your Workers have available. Both of these should be visible in the web UI.


On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <mc...@palantir.com> wrote:
Hi,

I'm working with an infrastructure that already has its own web server set up on EC2. I would like to set up a separate spark cluster on EC2 with the scripts and have the web server submit jobs to this spark cluster.

Is it possible to do this? I'm getting an error when running the spark shell on the web server: "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". I have heard that it's not possible for a local computer to connect to the spark cluster, but I was wondering whether other EC2 nodes could have their firewalls configured to allow this.

We don't want to deploy the web server on the master node of the spark cluster.

Thanks,

-Matt Cheah




Re: EC2 node submit jobs to separate Spark Cluster

Posted by Aaron Davidson <il...@gmail.com>.
The main issue with running a spark-shell locally is that it orchestrates
the actual computation, so you want it to be "close" to the actual Worker
nodes for latency reasons. Running a spark-shell on EC2 in the same region
as the Spark cluster avoids this problem.

The error you're seeing seems to indicate a different issue. Check the
Master web UI (accessible on port 8080 at the master's IP address) to make
sure that Workers are successfully registered and they have the expected
amount of memory available to Spark. You can also check to see how much
memory your spark-shell is trying to get per executor. A couple common
problems are (1) an abandoned spark-shell is holding onto all of your
cluster's resources or (2) you've manually configured your spark-shell to
try to get more memory than your Workers have available. Both of these
should be visible in the web UI.


On Mon, Nov 18, 2013 at 5:00 PM, Matt Cheah <mc...@palantir.com> wrote:

>  Hi,
>
>  I'm working with an infrastructure that already has its own web server
> set up on EC2. I would like to set up a *separate* spark cluster on EC2
> with the scripts and have the web server submit jobs to this spark cluster.
>
>  Is it possible to do this? I'm getting an error when running the spark
> shell on the web server: "Initial job has not accepted
> any resources; check your cluster UI to ensure that workers are registered
> and have sufficient memory". I have heard that it's not possible for any
> local computer to connect to the spark cluster, but I was wondering if
> other EC2 nodes could have their firewalls configured to allow this.
>
>  We don't want to deploy the web server on the master node of the spark
> cluster.
>
>  Thanks,
>
>  -Matt Cheah
>
>
>