Posted to user@spark.apache.org by Axel Dahl <ax...@whisperstream.com> on 2015/08/18 00:36:13 UTC

how do I execute a job on a single worker node in standalone mode

I have a 4-node cluster and have been playing around with the
num-executors, executor-memory, and executor-cores parameters.

I set the following:
--executor-memory=10G
--num-executors=1
--executor-cores=8

But when I run the job, I see that each worker is running one executor
with 2 cores and 2.5G of memory.

What I'd like to do instead is have Spark allocate the whole job to a
single worker node.

Is that possible in standalone mode, or do I need a job/resource scheduler
like YARN to do that?

Thanks in advance,

-Axel

Re: how do I execute a job on a single worker node in standalone mode

Posted by Andrew Or <an...@databricks.com>.
Hi Axel, what Spark version are you using? Also, what do your
configurations look like for the following?

- spark.cores.max (also --total-executor-cores)
- spark.executor.cores (also --executor-cores)
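
For reference, a minimal sketch of where these are typically set; the
values below are illustrative, not taken from the thread:

# conf/spark-defaults.conf
spark.cores.max        8
spark.executor.cores   4

# or, equivalently, on the spark-submit command line
spark-submit --total-executor-cores 8 --executor-cores 4 ...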


Re: how do I execute a job on a single worker node in standalone mode

Posted by Axel Dahl <ax...@whisperstream.com>.
Hmm, maybe I spoke too soon.

I have an Apache Zeppelin instance running and have configured it to use 48
cores (each node only has 16 cores), so I figured that setting it to 48
would mean Spark would grab 3 nodes. What happens instead is that Spark
reports that 48 cores are being used but executes everything on 1 node; it
looks like it's not grabbing the extra nodes.
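
As a quick sanity check, one way to see which hosts actually received
executors is to list them from the driver; this small Scala snippet (run
from a Zeppelin paragraph or spark-shell) is illustrative and not from the
original thread:

// Print the executors' block manager addresses (host:port); the driver's
// own entry also appears in this map.
sc.getExecutorMemoryStatus.keys.foreach(println)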

Re: how do I execute a job on a single worker node in standalone mode

Posted by Axel Dahl <ax...@whisperstream.com>.
That worked great, thanks Andrew.

Re: how do I execute a job on a single worker node in standalone mode

Posted by Andrew Or <an...@databricks.com>.
Hi Axel,

You can try setting `spark.deploy.spreadOut` to false (through your
conf/spark-defaults.conf file). What this does is essentially try to
schedule as many cores on one worker as possible before spilling over to
other workers. Note that you *must* restart the cluster through the sbin
scripts.
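
A minimal sketch of what that could look like, assuming the default conf
directory; only the spark.deploy.spreadOut setting comes from the advice
above, and the restart commands are the usual standalone scripts:

# conf/spark-defaults.conf
spark.deploy.spreadOut   false

# restart the standalone cluster so the master picks up the change
sbin/stop-all.sh
sbin/start-all.sh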

For more information see:
http://spark.apache.org/docs/latest/spark-standalone.html.

Feel free to let me know whether it works,
-Andrew


Re: how do I execute a job on a single worker node in standalone mode

Posted by Igor Berman <ig...@gmail.com>.
By default, standalone mode creates 1 executor on every worker machine per
application. The overall number of cores is configured with
--total-executor-cores, so if you specify --total-executor-cores=1 there
will be only 1 core on some executor and you'll get what you want.
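
For illustration, a hedged sketch of such a submission (the master URL,
class name, and jar are placeholders):

spark-submit \
  --master spark://master-host:7077 \
  --total-executor-cores 1 \
  --class com.example.MyApp \
  my-app.jar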

On the other hand, if your application needs all the cores of your cluster
and only some specific job should run on a single executor, there are a few
methods to achieve this, e.g. coalesce(1) or
dummyRddWithOnePartitionOnly.foreachPartition.
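
A minimal Scala sketch of that second approach; someRdd and the per-record
work are placeholders for illustration:

// Shrink the RDD to a single partition so the final step runs as one task,
// i.e. on a single executor.
val single = someRdd.coalesce(1)
single.foreachPartition { iter =>
  // all records arrive in this one task
  iter.foreach(record => println(record))
}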

