Posted to dev@spark.apache.org by Prashant Sharma <sc...@gmail.com> on 2020/03/27 12:50:42 UTC

[DISCUSS][K8s] Copy files securely to the pods or containers.

Hello All,
The issue SPARK-23153 <https://issues.apache.org/jira/browse/SPARK-23153>
lets us copy any file to the pod/container by first staging it on a
Hadoop-supported filesystem, e.g. HDFS, S3, COS etc. This is especially
useful when the files have to be copied to a large number of pods/nodes.
However, in most cases the file needs to be copied only to the driver, and
it is not always convenient (especially for clusters with a small number of
nodes or limited resources) to set up an additional intermediate storage
just for this; the current approach cannot work without an intermediate
distributed storage of some sort.
So, while going through the code of the kubectl cp command
<https://github.com/kubernetes/kubernetes/blob/master/pkg/kubectl/cmd/cp/cp.go>,
it appears that we can use the same technique:

tar cf - /tmp/foo | kubectl exec -i -n <some-namespace> <some-pod> -- tar xf - -C /tmp/bar

to copy files in a more secure way, because the file goes through the
Kubernetes API, which has its own security in place. This also lets us
compress the file while sending.
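The streaming tar pipe can be tried out locally without a cluster: the only
difference against a real cluster is that the receiving tar would run inside
the pod via `kubectl exec -i`. A minimal sketch, with hypothetical /tmp/foo
and /tmp/bar paths and the `z` flag added to show the gzip-compressed
variant:

```shell
set -e
# Hypothetical source and destination directories for the demo.
mkdir -p /tmp/foo /tmp/bar
echo "hello spark" > /tmp/foo/data.txt

# Stream the directory as a gzip-compressed tar archive through a pipe and
# unpack it on the other side. In the cluster case, the right-hand side of
# the pipe would be: kubectl exec -i -n <namespace> <pod> -- tar xzf - -C /tmp/bar
tar czf - -C /tmp/foo . | tar xzf - -C /tmp/bar

cat /tmp/bar/data.txt
```

Nothing is written to shared storage at any point; the bytes only travel
through the pipe (or, in the cluster case, through the API server's exec
channel).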

If there is any interest in this sort of feature, I am ready to open an
issue and work on it. So let us discuss whether this has already been
explored and whether there are known issues with this approach.

Thank you,
Prashant.