
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:04:33 UTC

[jira] [Updated] (SPARK-19259) spark locality in CNI context

     [ https://issues.apache.org/jira/browse/SPARK-19259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-19259:
---------------------------------
    Labels: bulk-closed performance security  (was: performance security)

> spark locality in CNI context
> -----------------------------
>
>                 Key: SPARK-19259
>                 URL: https://issues.apache.org/jira/browse/SPARK-19259
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>         Environment: Mesos and all resource managers using the CNI model (Kubernetes, GKE, ECS...)
>            Reporter: Vincent gromakowski
>            Priority: Major
>              Labels: bulk-closed, performance, security
>
> When using the CNI deployment model, each executor gets its own IP/hostname, so Spark cannot schedule tasks locally: locality is decided by matching the hostnames advertised by the backend, and executor hostnames no longer match those of the data nodes. Currently, all backends that provide data locality with Spark use the same method: they advertise the topology as a list of hostnames.
> On the one hand, data locality is mandatory for large-scale production jobs because it can bring huge performance improvements. On the other hand, CNI is clearly the future network model for container deployments, providing easy service discovery, isolation and security policies. So it would be very valuable to combine these two features in Spark.
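To make the hostname mismatch described above concrete, here is a minimal, hypothetical Python model of hostname-based locality matching. It assumes a task counts as node-local only when the host an executor advertises appears in the data's preferred-host list. The `Executor` class, the `locality_level` function, the `use_node_host` flag, and all hostnames and IPs are invented for illustration and are not Spark APIs. Spark's real scheduler has more levels (PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY) plus delay scheduling, none of which is modelled here.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of hostname-based locality matching.
# This is NOT Spark's scheduler code; names and levels are illustrative only.


@dataclass(frozen=True)
class Executor:
    executor_id: str
    advertised_host: str  # hostname/IP the executor registers with
    node_host: str        # hostname of the machine actually running it


def locality_level(preferred_hosts, executor, use_node_host=False):
    """Return 'NODE_LOCAL' if the chosen host matches a preferred host, else 'ANY'."""
    host = executor.node_host if use_node_host else executor.advertised_host
    return "NODE_LOCAL" if host in preferred_hosts else "ANY"


# A data block whose replicas live on two storage nodes (e.g. HDFS datanodes).
preferred = {"worker-1.example.com", "worker-2.example.com"}

# Host networking: the executor advertises the node's own hostname.
host_net = Executor("1", "worker-1.example.com", "worker-1.example.com")
# CNI: the executor gets its own IP, unrelated to the node's hostname.
cni = Executor("2", "10.244.1.17", "worker-1.example.com")

print(locality_level(preferred, host_net))                 # NODE_LOCAL
print(locality_level(preferred, cni))                      # ANY
print(locality_level(preferred, cni, use_node_host=True))  # NODE_LOCAL
```

The last line only illustrates that the data is physically co-located in the CNI case; whether a backend could expose the underlying node hostname to the scheduler is the open question this issue raises, and the `use_node_host` flag is purely hypothetical.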



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
