Posted to issues@spark.apache.org by "Sandy Ryza (JIRA)" <ji...@apache.org> on 2014/05/11 00:13:18 UTC

[jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

Sandy Ryza created SPARK-1767:
---------------------------------

             Summary: Prefer HDFS-cached replicas when scheduling data-local tasks
                 Key: SPARK-1767
                 URL: https://issues.apache.org/jira/browse/SPARK-1767
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Sandy Ryza






--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: [jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

Posted by Mridul Muralidharan <mr...@gmail.com>.
Hi Sandy,

I assume you are referring to the caching added to DataNodes through the
NameNode's new caching API (to preemptively mmap blocks)?

I have not looked at it in detail, but does the NN report cached replicas in
the block locations?
If so, we can simply make those tasks process-local instead of node-local
for executors on that node.
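A minimal sketch of that rule, assuming the NameNode does report which replica hosts have the block cached, alongside the ordinary replica hosts (the open question above). `BlockInfo`, its `cached_hosts` field, and `locality_for` are hypothetical names used for illustration, not Hadoop or Spark APIs. The level names mirror Spark's scheduler, with rack locality left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import List

# Locality levels, most to least preferred (names borrowed from Spark's
# scheduler; RACK_LOCAL is omitted in this sketch).
PROCESS_LOCAL = "PROCESS_LOCAL"
NODE_LOCAL = "NODE_LOCAL"
ANY = "ANY"


@dataclass
class BlockInfo:
    """Hypothetical stand-in for what the NameNode reports about one block."""
    hosts: List[str]  # every DataNode holding a replica
    cached_hosts: List[str] = field(default_factory=list)  # replicas cached in memory


def locality_for(block: BlockInfo, executor_host: str) -> str:
    """Locality level for a task reading `block` on an executor at `executor_host`."""
    if executor_host in block.cached_hosts:
        # The replica is already cached (mmapped) on this node: treat as process-local.
        return PROCESS_LOCAL
    if executor_host in block.hosts:
        # An on-disk replica on this node: ordinary node-local read.
        return NODE_LOCAL
    return ANY


block = BlockInfo(hosts=["dn1", "dn2", "dn3"], cached_hosts=["dn2"])
for host in ["dn1", "dn2", "dn9"]:
    print(host, locality_for(block, host))
```

With the replica cached only on `dn2`, an executor there gets PROCESS_LOCAL, one on another replica host (`dn1`) gets NODE_LOCAL, and a host with no replica (`dn9`) falls through to ANY.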

This would be a simple change to Hadoop-based RDD partitioning; what makes
it tricky is exposing the currently 'alive' executors to the partition.
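The tricky part can be sketched as a function that computes one partition's preferred locations from a snapshot of live executors per host. The `preferred_locations` helper, its `alive_executors` input, and the `executor_<host>_<id>` string encoding for executor-level locations are all hypothetical. This illustrates the idea only and is not Spark's actual `HadoopRDD` code.

```python
from typing import Dict, List, Sequence


def preferred_locations(hosts: Sequence[str],
                        cached_hosts: Sequence[str],
                        alive_executors: Dict[str, List[str]]) -> List[str]:
    """Preferred locations for one partition backed by a single HDFS block.

    hosts           -- every DataNode holding a replica of the block
    cached_hosts    -- the subset whose replica is cached in memory
    alive_executors -- snapshot of host -> live executor IDs (hypothetical input)
    """
    locations: List[str] = []
    # Cached replica plus a live executor on that host: emit an executor-level
    # entry so the scheduler could treat the task as process-local there.
    for host in cached_hosts:
        for executor_id in alive_executors.get(host, []):
            locations.append(f"executor_{host}_{executor_id}")
    # Every replica host also stays a host-level (node-local) entry,
    # deduplicated in order; this is the fallback if a listed executor dies.
    locations.extend(dict.fromkeys(hosts))
    return locations


print(preferred_locations(
    hosts=["dn1", "dn2", "dn3"],
    cached_hosts=["dn2", "dn3"],
    alive_executors={"dn1": ["4"], "dn2": ["7"]},
))
```

Here `dn3` holds a cached replica but has no live executor, so it appears only as a host-level entry. Because `alive_executors` is a snapshot, the result can go stale as executors come and go. That is why exposing live executors to the partition is the hard part.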

Thanks
Mridul
On 15-May-2014 3:49 am, "Sandy Ryza (JIRA)" <ji...@apache.org> wrote:

> Sandy Ryza created SPARK-1767:
> ---------------------------------
>
>              Summary: Prefer HDFS-cached replicas when scheduling data-local tasks
>                  Key: SPARK-1767
>                  URL: https://issues.apache.org/jira/browse/SPARK-1767
>              Project: Spark
>           Issue Type: Improvement
>           Components: Spark Core
>     Affects Versions: 1.0.0
>             Reporter: Sandy Ryza