Posted to issues@spark.apache.org by "Vincent geomakowski (JIRA)" <ji...@apache.org> on 2017/01/17 08:57:26 UTC

[jira] [Created] (SPARK-19259) spark locality in CNI context

Vincent geomakowski created SPARK-19259:
-------------------------------------------

             Summary: spark locality in CNI context
                 Key: SPARK-19259
                 URL: https://issues.apache.org/jira/browse/SPARK-19259
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 2.1.0, 2.0.2, 2.0.1, 2.0.0, 1.6.3, 1.6.2
         Environment: Mesos and all resource managers using the CNI model (Kubernetes, GKE, ECS...)
            Reporter: Vincent geomakowski


With the CNI deployment model, each executor gets its own IP address and hostname, so Spark cannot schedule tasks locally based on the hostnames advertised by the backend. Currently, all backends that provide data locality to Spark use the same method: they advertise the topology as a list of hostnames.
On the one hand, data locality is essential for large-scale production jobs, since it can bring a large performance improvement. On the other hand, CNI is clearly the future network model for container deployments, providing easy service discovery, isolation and security policies. It would therefore be valuable to make these two features work together in Spark.
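
To illustrate the mismatch described above, here is a minimal sketch in Python of hostname-based locality matching. It is a toy model, not Spark's scheduler code: the function `pick_locality`, the hostnames and the IP addresses are all hypothetical, and only the level names NODE_LOCAL and ANY are borrowed from Spark's locality levels.

```python
from typing import Dict, List


def pick_locality(preferred_hosts: List[str],
                  executor_hosts: Dict[str, str]) -> Dict[str, str]:
    """Toy model: an executor is NODE_LOCAL only if the hostname it
    registered with exactly matches a hostname advertised for the data."""
    preferred = set(preferred_hosts)
    return {
        exec_id: ("NODE_LOCAL" if host in preferred else "ANY")
        for exec_id, host in executor_hosts.items()
    }


# Hostnames a backend advertises for one block of data (hypothetical values).
block_hosts = ["node-1.example.com", "node-2.example.com"]

# Host networking: executors report the hostname of the node they run on.
host_network_executors = {
    "exec-1": "node-1.example.com",
    "exec-2": "node-3.example.com",
}

# CNI: each executor container has its own IP/hostname (hypothetical values),
# even though exec-1 may physically run on node-1.
cni_executors = {
    "exec-1": "10.244.1.17",
    "exec-2": "10.244.3.5",
}

host_result = pick_locality(block_hosts, host_network_executors)
cni_result = pick_locality(block_hosts, cni_executors)

print(host_result)
print(cni_result)
```

In this sketch the CNI executors never match an advertised hostname, so every task falls back to ANY even when the data sits on the same physical node. Any fix would need some mapping from the container address back to the node, which this example does not attempt.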


