Posted to dev@spark.apache.org by Chris Fregly <ch...@fregly.com> on 2014/08/20 01:09:10 UTC

Re: Data Locality In Spark

and spark will even try to schedule a task in the same process where the data might be cached.


these are the different locality levels:

PROCESS_LOCAL
NODE_LOCAL
RACK_LOCAL
ANY
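
for example, here's a minimal sketch (the input path is hypothetical) of how these levels show up in practice: the first pass over an HDFS file typically runs NODE_LOCAL or RACK_LOCAL tasks, while a second pass over a cached RDD can run PROCESS_LOCAL tasks inside the executor JVMs that hold the cached partitions. the locality level of each task is visible in the "Locality Level" column on the stage page of the Spark UI.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object LocalityDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("locality-demo"))

    // hypothetical input path; the first pass reads HDFS blocks, so tasks
    // are usually scheduled NODE_LOCAL (or RACK_LOCAL as a fallback)
    val events = sc.textFile("hdfs:///data/events")

    // keep the partitions in executor memory
    events.persist(StorageLevel.MEMORY_ONLY)
    events.count() // materializes the cache (still reading from HDFS)
    events.count() // second pass: mostly PROCESS_LOCAL tasks

    sc.stop()
  }
}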

relevant code:
https://github.com/apache/spark/blob/7712e724ad69dd0b83754e938e9799d13a4d43b9/core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala#L150

https://github.com/apache/spark/blob/63bdb1f41b4895e3a9444f7938094438a94d3007/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L250

relevant docs:
see the spark.locality.* configuration properties here:
https://spark.apache.org/docs/latest/configuration.html
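
the spark.locality.wait settings control how long the scheduler waits for a free slot at each locality level before falling back to the next one (process -> node -> rack -> any). a sketch, assuming a recent version that accepts time-suffixed values (older releases take plain milliseconds, e.g. "3000"); the values below are just the defaults, not tuning advice:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.locality.wait", "3s")         // default for all levels
  .set("spark.locality.wait.process", "3s") // wait for PROCESS_LOCAL
  .set("spark.locality.wait.node", "3s")    // wait for NODE_LOCAL
  .set("spark.locality.wait.rack", "3s")    // wait for RACK_LOCAL

the same keys can also be passed on the command line, e.g.
spark-submit --conf spark.locality.wait=3s ...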


On Tue, Jul 8, 2014 at 1:13 PM, Sandy Ryza <sa...@cloudera.com> wrote:

> Hi Anish,
>
> Spark, like MapReduce, makes an effort to schedule tasks on the same nodes
> and racks that the input blocks reside on.
>
> -Sandy
>
>
> On Tue, Jul 8, 2014 at 12:27 PM, anishsneh@yahoo.co.in <
> anishsneh@yahoo.co.in> wrote:
>
> > Hi All
> >
> > My apologies for the very basic question: does Spark fully support
> > data locality, as Hadoop MapReduce does?
> >
> > Please suggest.
> >
> > --
> > Anish Sneh
> > "Experience is the best teacher."
> > http://in.linkedin.com/in/anishsneh
> >
> >
>