Posted to user@spark.apache.org by Mike Frampton <mi...@hotmail.com> on 2015/07/19 03:25:31 UTC
[General Question] [Hadoop + Spark at scale] Spark Rack Awareness ?
I wanted to ask a general question about Hadoop/YARN and Apache Spark integration. I know that
Hadoop on a physical cluster has rack awareness, i.e. it attempts to minimise network traffic
by placing replicated blocks within a rack.
I wondered whether, when Spark is configured to use YARN as its cluster manager, it is able to
use this feature to also minimise network traffic to a degree.
Sorry if this question is not quite accurate, but I think you can generally see what I mean?
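For context, Hadoop's rack awareness is typically driven by a topology script that the
NameNode/ResourceManager invokes with host names or IPs and that prints one rack path per
host (configured via net.topology.script.file.name). A minimal sketch, written as a shell
function here for readability; the subnet-to-rack mapping is entirely hypothetical:

```shell
#!/bin/sh
# Sketch of an HDFS rack-topology mapping (hypothetical subnets and rack names).
# Hadoop calls the configured script with one or more hosts/IPs as arguments
# and expects one rack path per argument on stdout.
map_rack() {
  for node in "$@"; do
    case "$node" in
      10.0.1.*) echo /rack1 ;;
      10.0.2.*) echo /rack2 ;;
      *)        echo /default-rack ;;
    esac
  done
}

map_rack "$@"
```

With this mapping, HDFS prefers to keep one replica on-rack and place another replica on a
different rack for fault tolerance, which is what cuts cross-rack traffic on reads.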
Re: [General Question] [Hadoop + Spark at scale] Spark Rack Awareness ?
Posted by Sandy Ryza <sa...@cloudera.com>.
Hi Mike,
Spark is rack-aware in its task scheduling. Currently Spark doesn't honor
any locality preferences when scheduling executors, but this is being
addressed in SPARK-4352, after which executor-scheduling will be rack-aware
as well.
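Task-level rack awareness shows up in the locality-wait settings: the scheduler waits
briefly for a node-local slot before falling back to rack-local and then to any host.
A sketch for spark-defaults.conf (the values shown are the stock defaults, listed only
to illustrate the knobs, not as tuning advice):

```
spark.locality.wait       3s
spark.locality.wait.node  3s
spark.locality.wait.rack  3s
```

Raising the rack value makes Spark hold out longer for a rack-local executor before
scheduling a task off-rack.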
-Sandy