Posted to user@spark.apache.org by Todd <bi...@163.com> on 2016/01/28 03:50:50 UTC

How data locality is honored when spark is running on yarn

Hi,
I am kind of confused about how data locality is honored when Spark is running on YARN (client or cluster mode). Can someone please elaborate on this? Thanks!



Re: How data locality is honored when spark is running on yarn

Posted by Saisai Shao <sa...@gmail.com>.
Hi Todd,

There are two levels of locality-based scheduling when you run Spark on YARN
with dynamic allocation enabled:

1. Container allocation is based on the locality ratio of pending tasks;
this is YARN-specific and only works with dynamic allocation enabled.
2. Task scheduling is locality-aware; this works the same way on any cluster
manager (see the configuration sketch below).
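
For illustration only, here is a minimal sketch of the settings involved,
assuming the standard configuration keys from this Spark era; the app name
and input path are placeholders, not anything from this thread:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: config keys are standard Spark settings,
    // the app name and HDFS path are placeholders.
    val conf = new SparkConf()
      .setAppName("locality-demo")                  // placeholder name
      // 1. Let YARN container (executor) requests follow the locality
      //    preferences of pending tasks (dynamic allocation required):
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true") // needed by dynamic allocation
      // 2. Task-level locality scheduling (same on any cluster manager):
      .set("spark.locality.wait", "3s")             // how long to wait for a local slot
      // finer-grained waits also exist, e.g. spark.locality.wait.node / .rack

    val sc = new SparkContext(conf)

    // HDFS block locations become the preferred locations of each partition,
    // which the task scheduler uses for NODE_LOCAL / RACK_LOCAL placement.
    val lines = sc.textFile("hdfs:///path/to/input") // placeholder path
    println(lines.count())

Submit with --master yarn in either client or cluster deploy mode; the
task-level locality behavior (point 2) is identical in both, while point 1
only applies when dynamic allocation is turned on.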

Thanks
Saisai

On Thu, Jan 28, 2016 at 10:50 AM, Todd <bi...@163.com> wrote:

> Hi,
> I am kind of confused about how data locality is honored when Spark is
> running on YARN (client or cluster mode). Can someone please elaborate on
> this? Thanks!
>
>
>