Posted to issues@spark.apache.org by "Paweł Wiejacha (JIRA)" <ji...@apache.org> on 2019/07/25 18:25:00 UTC
[jira] [Commented] (SPARK-25787) [K8S] Spark can't use data locality information
[ https://issues.apache.org/jira/browse/SPARK-25787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893043#comment-16893043 ]
Paweł Wiejacha commented on SPARK-25787:
----------------------------------------
I *can* reproduce this issue. In Spark UI, Locality Level is always ANY instead of NODE_LOCAL when reading data from HDFS.
As Yinan Li said, it seems that:
> Support for data locality on k8s has not been ported to the upstream Spark repo yet.
I think that at least the pull request below should be ported and merged to support HDFS data locality in Spark on Kubernetes.
https://github.com/apache-spark-on-k8s/spark/pull/216
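For context, the core problem that PR addresses (as I understand it) is that on Kubernetes the executor's host identity is its pod IP, which never matches the node hostnames reported in HDFS block locations, so locality matching always fails and tasks fall back to ANY. The idea can be sketched in plain Python; the function and variable names here are illustrative only, not Spark's actual API:

```python
def preferred_locality(block_hosts, executor_host):
    """Return the task locality level for an executor, given the set of
    hosts holding the HDFS block replicas (illustrative only)."""
    return "NODE_LOCAL" if executor_host in block_hosts else "ANY"

# HDFS reports block replica locations as cluster node hostnames.
block_hosts = {"node-1", "node-2"}

# Without pod-IP -> node-name resolution, the executor "host" is a pod IP
# and never matches a datanode hostname, so every task ends up ANY.
assert preferred_locality(block_hosts, "10.1.2.3") == "ANY"

# With the resolution step the PR adds, the driver maps the pod IP back
# to the Kubernetes node name, and locality matching succeeds.
pod_ip_to_node = {"10.1.2.3": "node-1"}
resolved = pod_ip_to_node.get("10.1.2.3", "10.1.2.3")
assert preferred_locality(block_hosts, resolved) == "NODE_LOCAL"
```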
Could you please reopen this issue?
> [K8S] Spark can't use data locality information
> -----------------------------------------------
>
> Key: SPARK-25787
> URL: https://issues.apache.org/jira/browse/SPARK-25787
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.0
> Reporter: Maciej Bryński
> Priority: Major
>
> I started experimenting with Spark based on this presentation:
> https://www.slideshare.net/databricks/hdfs-on-kuberneteslessons-learned-with-kimoon-kim
> I'm using the excellent https://github.com/apache-spark-on-k8s/kubernetes-HDFS
> charts to deploy HDFS.
> Unfortunately, reading from HDFS gives ANY locality for every task.
> Is data locality working on a Kubernetes cluster?
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org