Posted to user@spark.apache.org by Shushant Arora <sh...@gmail.com> on 2015/07/14 18:57:03 UTC

spark on yarn

I am running a Spark application on a YARN-managed cluster.

When I specify --executor-cores > 4 it fails to start the application.
I am starting the app as

spark-submit --class classname --num-executors 10 --executor-cores 5 --master masteradd jarname

Exception in thread "main" org.apache.spark.SparkException: Yarn
application has already ended! It might have been killed or unable to
launch application master.

When I give --executor-cores as 4, it works fine.

My cluster has 10 nodes.
Why am I not able to specify more than 4 concurrent tasks? Is there a max
limit on the YARN side or the Spark side that I can override to make use of
more tasks?

Re: spark on yarn

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Jul 14, 2015 at 11:13 AM, Shushant Arora <sh...@gmail.com>
wrote:

> spark-submit --class classname --num-executors 10 --executor-cores 4
> --master masteradd jarname
>
> Will it allocate 10 containers throughout the life of streaming
> application on same nodes until any node failure happens and
>

It will allocate 10 containers somewhere in the cluster (wherever YARN
tells the application to). If a container dies (not necessarily because of
node failure), Spark will start a new one, which may start somewhere else.

> And these 10 containers will be released only at the end of the streaming
> application, never in between, if none of them fails?
>

Correct. If you don't want that behavior you should look at enabling
dynamic allocation in Spark (see docs).
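
For reference, a rough sketch of what enabling dynamic allocation could look
like on a Spark 1.x YARN setup (the config names below are standard Spark
properties; the min/max values are only examples, and the external shuffle
service must be set up on the NodeManagers for this to work):

# a minimal sketch, not a drop-in command; adjust values to your cluster
spark-submit --class classname \
  --master masteradd \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --executor-cores 4 \
  jarname

With this, executors are requested and released as the load changes instead of
holding all 10 containers for the life of the application.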

-- 
Marcelo

Re: spark on yarn

Posted by Shushant Arora <sh...@gmail.com>.
Ok, thanks a lot!

A few more doubts:
What happens in a streaming application, say with

spark-submit --class classname --num-executors 10 --executor-cores 4
--master masteradd jarname

Will it allocate 10 containers for the whole life of the streaming
application, on the same nodes until a node failure happens? And will it just
allocate tasks/cores at the start of each job (or action) in each batch
interval, of which it can spawn at most 40 - say 1 is fixed for the
driver/Application Master, and say I have a receiver-based streaming
application with 5 receivers, then I am left with 40-6=34 max cores in the 10
fixed containers?

And these 10 containers will be released only at the end of the streaming
application, never in between, if none of them fails?
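
(As a sketch of that slot arithmetic, assuming yarn-cluster mode where the
driver/AM gets its own container outside the 10 executors:)

# 10 executors x 4 cores        = 40 task slots across the executor JVMs
# 5 long-running receiver tasks = 5 slots permanently occupied
# remaining for batch tasks     = ~35 slots
# (the AM/driver vcore comes from a separate container, so "40-6=34" only
# holds if the driver is counted against the executor cores)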



On Tue, Jul 14, 2015 at 11:32 PM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora <
> shushantarora09@gmail.com> wrote:
>
>> Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores
>> per container?
>>
>
> I don't remember YARN config names by heart, but that sounds promising.
> I'd look at the YARN documentation for details.
>
>
>> Whats the setting for max limit of --num-executors ?
>>
>
> There's no setting for that. The max number of executors you can run is
> based on the resources available in the YARN cluster. For example, for 10
> executors, you'll need enough resources to start 10 processes with the
> number of cores and amount of memory you requested (plus 1 core and some
> memory for the Application Master).
>
>
> --
> Marcelo
>

Re: spark on yarn

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora <sh...@gmail.com>
wrote:

> Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per
> container?
>

I don't remember YARN config names by heart, but that sounds promising. I'd
look at the YARN documentation for details.


> Whats the setting for max limit of --num-executors ?
>

There's no setting for that. The max number of executors you can run is
based on the resources available in the YARN cluster. For example, for 10
executors, you'll need enough resources to start 10 processes with the
number of cores and amount of memory you requested (plus 1 core and some
memory for the Application Master).
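
As a rough illustration of that math for a request like this thread's (the
--executor-memory value and the ~10% overhead figure below are assumptions,
based on Spark 1.x-era defaults, not exact numbers):

# back-of-the-envelope for: --num-executors 10 --executor-cores 4 --executor-memory 4g
#   executor vcores: 10 x 4                     = 40 vcores
#   executor memory: 10 x (4g + ~10% overhead)  = ~44g
#   AM:              ~1 vcore + ~1g
# The YARN queue must have at least this much free, and each executor
# container must also fit under yarn.scheduler.maximum-allocation-mb / -vcores.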


-- 
Marcelo

Re: spark on yarn

Posted by Ted Yu <yu...@gmail.com>.
Please see YARN-193 where 'yarn.scheduler.maximum-allocation-vcores' was
introduced.

See also YARN-3823, which changed its default value.

Cheers
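
For reference, a sketch of where that cap is set (the property name is the one
from YARN-193; the value 8 is only an example):

# yarn-site.xml on the ResourceManager (example value, not a recommendation):
#   yarn.scheduler.maximum-allocation-vcores = 8
# After restarting the RM, a 5-core executor request should no longer be
# rejected, provided the NodeManagers also advertise at least 5 vcores:
spark-submit --class classname --num-executors 10 --executor-cores 5 --master masteradd jarname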

On Tue, Jul 14, 2015 at 10:55 AM, Shushant Arora <sh...@gmail.com>
wrote:

> Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per
> container?
>
> Whats the setting for max limit of --num-executors ?
>
> On Tue, Jul 14, 2015 at 11:18 PM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora <
>> shushantarora09@gmail.com> wrote:
>>
>>> My understanding was --executor-cores(5 here) are maximum concurrent
>>> tasks possible in an executor and --num-executors (10 here)are no of
>>> executors or containers demanded by Application master/Spark driver program
>>>  to yarn RM.
>>>
>>
>> --executor-cores requests cores from YARN. YARN is a resource manager,
>> and you're requesting more resources than it has available, so it denies
>> your request. If you want to make more than 4 cores available in your NMs,
>> you need to change YARN's configuration.
>>
>> --
>> Marcelo
>>
>
>

Re: spark on yarn

Posted by Shushant Arora <sh...@gmail.com>.
Is yarn.scheduler.maximum-allocation-vcores the setting for max vcores per
container?

What's the setting for the max limit of --num-executors?

On Tue, Jul 14, 2015 at 11:18 PM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora <
> shushantarora09@gmail.com> wrote:
>
>> My understanding was --executor-cores(5 here) are maximum concurrent
>> tasks possible in an executor and --num-executors (10 here)are no of
>> executors or containers demanded by Application master/Spark driver program
>>  to yarn RM.
>>
>
> --executor-cores requests cores from YARN. YARN is a resource manager, and
> you're requesting more resources than it has available, so it denies your
> request. If you want to make more than 4 cores available in your NMs, you
> need to change YARN's configuration.
>
> --
> Marcelo
>

Re: spark on yarn

Posted by Ted Yu <yu...@gmail.com>.
Shushant :

Please also see 'Debugging your Application' section of
https://spark.apache.org/docs/latest/running-on-yarn.html


On Tue, Jul 14, 2015 at 10:48 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora <
> shushantarora09@gmail.com> wrote:
>
>> My understanding was --executor-cores(5 here) are maximum concurrent
>> tasks possible in an executor and --num-executors (10 here)are no of
>> executors or containers demanded by Application master/Spark driver program
>>  to yarn RM.
>>
>
> --executor-cores requests cores from YARN. YARN is a resource manager, and
> you're requesting more resources than it has available, so it denies your
> request. If you want to make more than 4 cores available in your NMs, you
> need to change YARN's configuration.
>
> --
> Marcelo
>

Re: spark on yarn

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Jul 14, 2015 at 10:40 AM, Shushant Arora <sh...@gmail.com>
wrote:

> My understanding was --executor-cores(5 here) are maximum concurrent
> tasks possible in an executor and --num-executors (10 here)are no of
> executors or containers demanded by Application master/Spark driver program
>  to yarn RM.
>

--executor-cores requests cores from YARN. YARN is a resource manager, and
you're requesting more resources than it has available, so it denies your
request. If you want to make more than 4 cores available in your NMs, you
need to change YARN's configuration.
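
A sketch of the NodeManager side of that change (the property name is standard
YARN configuration; the value 8 is only an example for this thread's hardware):

# yarn-site.xml on each NodeManager (example value):
#   yarn.nodemanager.resource.cpu-vcores = 8
# This is what each NM advertises to the RM; together with
# yarn.scheduler.maximum-allocation-vcores it bounds how many vcores a single
# executor container can be granted.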

-- 
Marcelo

Re: spark on yarn

Posted by Shushant Arora <sh...@gmail.com>.
Got the below exception in the logs:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException):
Invalid resource request, requested virtual cores < 0, or requested virtual
cores > max configured, requestedVirtualCores=5, maxVirtualCores=4
at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:205)
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:94)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:487)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)


-----------------------------------------------------------------------------------------------------------------------------------
My understanding was that --executor-cores (5 here) is the maximum number of
concurrent tasks possible in an executor, and --num-executors (10 here) is the
number of executors or containers demanded from the YARN RM by the Application
Master/Spark driver program.

So these --executor-cores (5) are parallel tasks - why is that limit enforced
by YARN? Why not by the executor JVM that was started on the node? Why can't
these 10 executor JVMs, started on their respective nodes as containers, run
more requested tasks or threads than the vcores configured on the node?

Is there a way I can control maxVirtualCores for an application and set it much
larger than the number of physical cores - say 100 for an 8-core system? In a
JVM process doing CPU+IO+network-intensive processing, 100 threads is not very
many.






On Tue, Jul 14, 2015 at 10:52 PM, Marcelo Vanzin <va...@cloudera.com>
wrote:

>
> On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora <shushantarora09@gmail.com
> > wrote:
>
>> When I specify --executor-cores > 4 it fails to start the application.
>> When I give --executor-cores as 4 , it works fine.
>>
>
> Do you have any NM that advertises more than 4 available cores?
>
> Also, it's always worth it to check if there's anything interesting in the
> logs; see the "yarn logs" command, and also the RM/NM logs.
>
> --
> Marcelo
>

Re: spark on yarn

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Jul 14, 2015 at 9:57 AM, Shushant Arora <sh...@gmail.com>
wrote:

> When I specify --executor-cores > 4 it fails to start the application.
> When I give --executor-cores as 4 , it works fine.
>

Do you have any NM that advertises more than 4 available cores?

Also, it's always worth it to check if there's anything interesting in the
logs; see the "yarn logs" command, and also the RM/NM logs.
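
A sketch of pulling those logs once the application has finished (the
application id is a placeholder to fill in from the RM UI or the spark-submit
output):

# aggregated container logs for a finished YARN application
# (requires yarn.log-aggregation-enable=true on the cluster)
yarn logs -applicationId <application_id>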

-- 
Marcelo

Re: spark on yarn

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Tue, Jul 14, 2015 at 12:03 PM, Shushant Arora <sh...@gmail.com>
wrote:

> Can a container have multiple JVMs running in YARN?
>

Yes and no. A container runs a single command, but that process can start
other processes, and those also count towards the resource usage of the
container (mostly memory). For example, pyspark will spawn python processes
from the main JVM.

But if you're asking about executors, ignoring pyspark or other non
Scala/Java backends, there will be a single JVM. Spark will allow a number
of concurrent tasks to run that matches the number of vcores you requested
for the executor.
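
A sketch of that relationship (spark.task.cpus defaults to 1; the numbers are
just this thread's example):

# concurrent tasks per executor = executor cores / spark.task.cpus
#   --executor-cores 4                ->  4 concurrent tasks per executor JVM
#   --num-executors 10, 4 cores each  ->  up to 40 tasks running at once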


> 1.Is the difference is in Hadoop Mapreduce job - say I specify 20 reducers
> and my job uses 10 map tasks then, it need total 30 containers or 30 vcores
> ?
>

It's not that simple and trying to compare that to Spark is kinda
misleading.

-- 
Marcelo

Re: spark on yarn

Posted by Shushant Arora <sh...@gmail.com>.
Can a container have multiple JVMs running in YARN?

I am comparing Hadoop MapReduce running on YARN vs Spark running on YARN
here:

1. Is the difference that a Hadoop MapReduce job - say I specify 20 reducers
and my job uses 10 map tasks - needs 30 containers or 30 vcores in total? I
guess 30 vcores, running multiple JVMs, max 4 per container (set using max
vcores in a container) in my case? So in total 8 containers - 7 with 4 vcores
and 1 with 2?

2. Is each map/reduce task requested equal to one vcore requested, and does it
launch a new JVM inside the same container, or is there 1 JVM per container?

3. In Spark jobs, tasks are launched in the same JVM running in a container, 1
container has at most 1 JVM, and max tasks = the max vcores per container
limit (4 in my case)?







On Wed, Jul 15, 2015 at 12:17 AM, Jong Wook Kim <jo...@nyu.edu> wrote:

> it's probably because your YARN cluster has only 40 vCores available.
>
> Go to your resource manager and check if "VCores Total" and "Memory Total"
> exceeds what you have set. (40 cores and 5120 MB)
>
> If that looks fine, go to "Scheduler" page and find the queue on which
> your jobs run, and check the resources allocated for that queue.
>
> Hope this helps.
>
> Jong Wook
>
>
> > On Jul 15, 2015, at 01:57, Shushant Arora <sh...@gmail.com>
> wrote:
> >
> > I am running spark application on yarn managed cluster.
> >
> > When I specify --executor-cores > 4 it fails to start the application.
> > I am starting the app as
> >
> > spark-submit --class classname --num-executors 10 --executor-cores 5
> --master masteradd jarname
> >
> > Exception in thread "main" org.apache.spark.SparkException: Yarn
> application has already ended! It might have been killed or unable to
> launch application master.
> >
> > When I give --executor-cores as 4 , it works fine.
> >
> > My Cluster has 10 nodes .
> > Why am I not able to specify more than 4 concurrent tasks. Is there any
> max limit yarn side or spark side which I can override to make use of more
> tasks ?
>
>

Re: spark on yarn

Posted by Jong Wook Kim <jo...@nyu.edu>.
it's probably because your YARN cluster has only 40 vCores available.

Go to your resource manager and check if "VCores Total" and "Memory Total" exceeds what you have set. (40 cores and 5120 MB)

If that looks fine, go to "Scheduler" page and find the queue on which your jobs run, and check the resources allocated for that queue.

Hope this helps.

Jong Wook
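
(A guess at where those parenthetical numbers come from - this is an
assumption, not something stated in the thread:

# 10 executors x 4 vcores                             = 40 vcores
# 10 executors x 512 MB (old default executor memory) = 5120 MB
# compare these, plus the AM's core and memory, against "VCores Total" /
# "Memory Total" on the RM UI and the limits of the queue the job runs in)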


> On Jul 15, 2015, at 01:57, Shushant Arora <sh...@gmail.com> wrote:
> 
> I am running spark application on yarn managed cluster.
> 
> When I specify --executor-cores > 4 it fails to start the application.
> I am starting the app as
> 
> spark-submit --class classname --num-executors 10 --executor-cores 5 --master masteradd jarname
> 
> Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
> 
> When I give --executor-cores as 4 , it works fine.
> 
> My Cluster has 10 nodes . 
> Why am I not able to specify more than 4 concurrent tasks. Is there any max limit yarn side or spark side which I can override to make use of more tasks ?

