Posted to user@spark.apache.org by Arne Zachlod <ar...@nerdkeller.org> on 2019/02/19 12:59:42 UTC

Spark on Kubernetes with persistent local storage

Hello,

I'm trying to host Spark applications on a Kubernetes cluster and want
to provide localized persistent storage to the Spark workers for a small
research project I'm currently doing.
I googled around a bit and found that HDFS seems to be pretty well
supported with Spark, but some problems arise with data locality if I
want to deploy it as outlined in this talk [1].
As far as I understand it, most of the configuration for deploying this
is in their git repo [2], but the Spark driver needs a patch to map the
workers and the HDFS datanodes correctly to the Kubernetes nodes. Is
something like that already part of the current Spark codebase as of
Spark 2.4.0? I had a look at the code but couldn't find anything related
to HDFS locality (pretty sure I just didn't look in the right place).
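
To make it concrete, this is roughly the setup I have in mind; a minimal
sketch in Scala, assuming the documented Spark 2.4.0 Kubernetes options,
where the image name, namespace and namenode address are placeholders
rather than a working configuration:

    // Sketch: running against the Kubernetes scheduler backend and reading
    // from an HDFS deployed inside the same cluster. Whether tasks end up
    // NODE_LOCAL depends on the executor hostnames matching the datanode
    // hostnames that HDFS reports as block locations.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hdfs-locality-test")
      .master("k8s://https://kubernetes.default.svc")          // placeholder API server
      .config("spark.kubernetes.namespace", "spark")            // placeholder namespace
      .config("spark.kubernetes.container.image", "my-registry/spark:2.4.0")
      .config("spark.executor.instances", "3")
      .config("spark.locality.wait", "3s")
      .getOrCreate()

    // Placeholder namenode service address inside the cluster.
    val lines = spark.read.textFile(
      "hdfs://hdfs-namenode.hdfs.svc.cluster.local:8020/data/sample.txt")
    println(lines.count())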

So my question is: is this even a viable option at the current state of
the project(s)? And what storage solution would be recommended instead,
given that it's Spark on Kubernetes (so no YARN/Mesos)?
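
One alternative that might be relevant (if I read the Spark 2.4
Kubernetes docs right) would be mounting a persistentVolumeClaim into
the driver and executor pods via the spark.kubernetes.*.volumes options;
a rough sketch, where the volume name "data", the claim name
"spark-data-pvc" and the mount path are made up for illustration:

    // Sketch: attaching an existing persistentVolumeClaim to driver and
    // executor pods so they see it as local storage under /mnt/data.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("pvc-backed-executors")
      .master("k8s://https://kubernetes.default.svc")          // placeholder API server
      .config("spark.kubernetes.container.image", "my-registry/spark:2.4.0")
      .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.mount.path", "/mnt/data")
      .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.data.options.claimName", "spark-data-pvc")
      .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path", "/mnt/data")
      .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName", "spark-data-pvc")
      .getOrCreate()

    // Pods can then read/write under /mnt/data like local disk.
    println(spark.read.textFile("file:///mnt/data/sample.txt").count())

As far as I can tell, though, that only gives per-pod persistence and
not a shared filesystem with data locality, so it doesn't really replace
HDFS for my use case.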

Looking forward to your input.

Arne

[1] https://databricks.com/session/hdfs-on-kubernetes-lessons-learned
[2] https://github.com/apache-spark-on-k8s/kubernetes-HDFS

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org