Posted to user@spark.apache.org by Mike Frampton <mi...@hotmail.com> on 2015/07/19 03:25:31 UTC

[General Question] [Hadoop + Spark at scale] Spark Rack Awareness ?

I wanted to ask a general question about Hadoop/Yarn and Apache Spark integration. I know that 
Hadoop on a physical cluster has rack awareness, i.e. it attempts to minimise network traffic 
by keeping replicated blocks within a rack. 

I wondered whether, when Spark is configured to use Yarn as its cluster manager, it is able to 
use this feature to also minimise network traffic to a degree. 

Sorry if this question is not quite accurate, but I think you can generally see what I mean? 
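
For reference, the Hadoop-side rack awareness I mean is normally enabled by pointing 
core-site.xml at a topology script that maps each host to a rack ID. A minimal sketch, 
assuming a script installed at /etc/hadoop/conf/topology.sh (the path is illustrative):

    <!-- core-site.xml: enable rack awareness via a topology script -->
    <!-- The script receives host names/IPs and prints a rack ID per host, -->
    <!-- e.g. /rack1. HDFS then uses these IDs when placing block replicas. -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>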

Re: [General Question] [Hadoop + Spark at scale] Spark Rack Awareness ?

Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Mike,

Spark is rack-aware in its task scheduling. Currently Spark doesn't honor
any locality preferences when scheduling executors, but this is being
addressed in SPARK-4352, after which executor scheduling will be rack-aware
as well.
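
To make that concrete: rack locality surfaces in the task scheduler's locality
levels (PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY) and in the locality
wait settings, which control how long the scheduler holds a task for a
better-located slot before relaxing locality. A minimal sketch in Scala, with
illustrative values rather than tuning recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // How long the scheduler waits for a better-located slot before
    // relaxing locality (process-local -> node-local -> rack-local -> any).
    val conf = new SparkConf()
      .setAppName("rack-locality-sketch")
      .set("spark.locality.wait", "3s")       // base wait at each locality level
      .set("spark.locality.wait.rack", "3s")  // override for the RACK_LOCAL step

    val sc = new SparkContext(conf)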

-Sandy

On Sat, Jul 18, 2015 at 6:25 PM, Mike Frampton <mi...@hotmail.com>
wrote:

> I wanted to ask a general question about Hadoop/Yarn and Apache Spark
> integration. I know that
> Hadoop on a physical cluster has rack awareness, i.e. it attempts to
> minimise network traffic
> by keeping replicated blocks within a rack.
>
> I wondered whether, when Spark is configured to use Yarn as its cluster
> manager, it is able to
> use this feature to also minimise network traffic to a degree.
>
> Sorry if this question is not quite accurate, but I think you can
> generally see what I mean?
>