Posted to user@spark.apache.org by Alan Prando <al...@scanboo.com.br> on 2014/11/18 19:03:33 UTC

Spark on YARN

Hi Folks!

I'm running Spark on a YARN cluster installed with Cloudera Manager Express.
The cluster has 1 master and 3 slaves, each machine with 32 cores and 64 GB of
RAM.

My Spark job is working fine; however, it seems that only 2 of the 3 slaves
are doing any work (htop shows 2 slaves running at 100% on all 32 cores, and 1
slave without any processing).

I'm using this command:
./spark-submit --master yarn --num-executors 3 --executor-cores 32 \
  --executor-memory 32g feature_extractor.py -r 390

Additionally, Spark's log only shows communication with 2 slaves:
14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469]
with ID 1
14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-180.ec2.internal
to /default
14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724]
with ID 2
14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-179.ec2.internal
to /default
14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager
ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager
ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is
ready for scheduling beginning after waiting
maxRegisteredResourcesWaitingTime: 30000(ms)

Is there a configuration to make a Spark job on a YARN cluster use all of the
slaves?

Thanks in advance! =]

---
Regards
Alan Vidotti Prando.

Re: Spark on YARN

Posted by Sean Owen <so...@cloudera.com>.
I think your config may be the issue then. It sounds like 1 server is
configured in a different YARN group that advertises far fewer resources than
it actually has.
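
A quick way to compare what each NodeManager is actually advertising to the
RM (a rough sketch; the node ID is a placeholder you would copy from the
-list output):

yarn node -list
yarn node -status <node-id-from-the-list>

If one node reports a much smaller memory / vcore capacity than the other
two, that would explain where the missing executor went.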

On Wed, Nov 19, 2014 at 5:27 PM, Alan Prando <al...@scanboo.com.br> wrote:
> Hi all!
>
> Thanks for answering!
>
> @Sean, I tried to run with 30 executor-cores, and 1 machine is still not
> processing anything.
> @Vanzin, I checked the RM's web UI, and all nodes were detected and "RUNNING".
> The interesting fact is that the available memory and available cores of 1
> node were different from the other 2, with just 1 available core and 1 GB of
> available RAM.
>
> @All, I created a new cluster with 10 slaves and 1 master, and now 9 of my
> slaves are working, and 1 is still not processing anything.
>
> It's fine by me! I'm just wondering why YARN is doing this... Does anyone
> know the answer?



Re: Spark on YARN

Posted by Alan Prando <al...@scanboo.com.br>.
Hi all!

Thanks for answering!

@Sean, I tried to run with 30 executor-cores, and 1 machine is still not
processing anything.
@Vanzin, I checked the RM's web UI, and all nodes were detected and "RUNNING".
The interesting fact is that the available memory and available cores of 1
node were different from the other 2, with just 1 available core and 1 GB of
available RAM.

@All, I created a new cluster with 10 slaves and 1 master, and now 9 of my
slaves are working, and 1 is still not processing anything.

It's fine by me! I'm just wondering why YARN is doing this... Does anyone know
the answer?

2014-11-18 16:18 GMT-02:00 Sean Owen <so...@cloudera.com>:

> My guess is you're asking for all cores of all machines but the driver
> needs at least one core, so one executor is unable to find a machine to fit
> on.

Re: Spark on YARN

Posted by Sean Owen <so...@cloudera.com>.
My guess is that you're asking for all the cores of all the machines, but the
driver needs at least one core, so one executor can't find a machine it fits
on.
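
To make the arithmetic concrete (assuming YARN really is offering all 32
vcores on each of the 3 slaves):

requested by the job:  3 executors x 32 cores          = 96 cores
actually needed:       96 cores + 1 core for the extra
                       YARN container (the application
                       master, in yarn-client mode)    = 97 cores
available at best:     3 NodeManagers x 32 cores       = 96 cores

Whichever node hosts that extra container only has 31 free cores left, so a
third 32-core executor has nowhere to fit.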

Re: Spark on YARN

Posted by Marcelo Vanzin <va...@cloudera.com>.
Can you check in your RM's web UI how much of each resource YARN thinks you
have available? You can also check that directly in the YARN configuration.

Perhaps it's not configured to use all of the available resources. (If it
was set up with Cloudera Manager, CM will reserve some room for daemons that
need to run on each machine, so it won't tell YARN to make all 32 cores /
64 GB available for applications.)

Also remember that Spark needs to start "num executors + 1" containers when
adding up all the needed resources. The extra container generally requires
fewer resources than the executors, but it still needs to get resources from
the RM.
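
As a rough guide, the YARN properties worth comparing across the three
NodeManagers are the ones below (these are the standard knobs, exposed in
Cloudera Manager on the NodeManager role's resource settings; the values are
only illustrative for a 64 GB / 32-core box that keeps some headroom for
daemons, not a recommendation):

yarn.nodemanager.resource.memory-mb        e.g. 57344  (56 GB offered to containers)
yarn.nodemanager.resource.cpu-vcores       e.g. 30
yarn.scheduler.maximum-allocation-mb       at least the largest single container you request
yarn.scheduler.maximum-allocation-vcores   likewise, for vcores

If one node has these set much lower than the others, it will only be able to
fit a small container (or none at all).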






-- 
Marcelo



Re: Spark on YARN

Posted by Sandy Ryza <sa...@cloudera.com>.
Hey Alan,

Spark's application master will take up 1 core on one of the nodes in the
cluster. This means that node will only have 31 cores remaining, which is not
enough to fit your third executor.
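
One possible workaround, assuming each NodeManager really does offer its full
32 vcores and most of its 64 GB to containers (a sketch only; the exact
numbers depend on what your YARN configuration actually advertises):

./spark-submit --master yarn --num-executors 3 --executor-cores 31 \
  --executor-memory 30g feature_extractor.py -r 390

Note that the executor memory request also gets a YARN memory overhead added
on top of the 30g, so leave some slack below whatever each NodeManager
actually offers.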

-Sandy


Re: Spark on YARN

Posted by Debasish Das <de...@gmail.com>.
I run my Spark on YARN jobs as:

HADOOP_CONF_DIR=/etc/hadoop/conf/ /app/data/v606014/dist/bin/spark-submit \
  --master yarn --jars test-job.jar --executor-cores 4 --num-executors 10 \
  --executor-memory 16g --driver-memory 4g --class TestClass test.jar

It uses HADOOP_CONF_DIR to schedule executors, and I get the number I ask
for (assuming other MapReduce jobs aren't occupying the cluster)...

Large, memory-intensive jobs like ALS still hit issues on YARN, but simple
jobs run fine...

Mine is also an internal CDH cluster...
