You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Stanis Shkel (JIRA)" <ji...@apache.org> on 2018/12/18 22:15:00 UTC
[jira] [Created] (SPARK-26401) [k8s] Init container drops necessary config options for pulling jars from azure storage

Stanis Shkel created SPARK-26401:
------------------------------------

             Summary: [k8s] Init container drops necessary config options for pulling jars from azure storage
                 Key: SPARK-26401
                 URL: https://issues.apache.org/jira/browse/SPARK-26401
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 2.3.2
            Reporter: Stanis Shkel


I am running spark-submit command that pulls a jar from a remote private azure storage account. As far as I understand jar is supposed to be pulled within init container of the driver. However, the container doesn't inherit  "spark.hadoop.fs.azure.account.key.$(STORAGE_ACCT).blob.core.windows.net=$(STORAGE_SECRET)" parameter that I pass in when running spark submit.

Here is what I found so far. spark-init container is called via the following command 

[https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L83]

Which in the end turns into the following shell call
{code:bash}

exec /usr/lib/jvm/java-1.8-openjdk/bin/java -cp '/opt/spark/conf/:/opt/spark/jars/*' -Xmx1g org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties

{code}

If I cat out spark-init properties the only parameters that are in there are 

spark.kubernetes.mountDependencies.jarsDownloadDir=/var/spark-data/spark-jars

spark.kubernetes.initContainer.remoteJars=wasbs\://mycontainer@testaccount.blob.core.windows.net/jars/myjar.jar,wasbs\://mycontainer@testaccount.blob.core.windows.net/jars/myjar.jar

spark.kubernetes.mountDependencies.filesDownloadDir=/var/spark-data/spark-files

My guess it's these params [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/BasicInitContainerConfigurationStep.scala#L49]

However the spark.hadoop.fs.azure.account.key is not present in that file nor in the environment.

This causes download of the jar fail, the exception is as follows

{code:bash}

Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account testaccount.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
 at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
 at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
 at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
 at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1910)
 at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:700)
 at org.apache.spark.util.Utils$.fetchFile(Utils.scala:492)
 at org.apache.spark.deploy.k8s.FileFetcher.fetchFile(SparkPodInitContainer.scala:91)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:81)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:79)
 at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:79)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:77)
 at scala.Option.foreach(Option.scala:257)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer.downloadFiles(SparkPodInitContainer.scala:77)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer.run(SparkPodInitContainer.scala:56)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer$.main(SparkPodInitContainer.scala:113)
 at org.apache.spark.deploy.k8s.SparkPodInitContainer.main(SparkPodInitContainer.scala)
Caused by: org.apache.hadoop.fs.azure.AzureException: Container qrefinery in account jr3e3d.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
 at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
 at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
 ... 22 more

{code}

I am certain that the parameter is being passed in to the driver correctly. Due to https://issues.apache.org/jira/browse/SPARK-26400 spark-init container "succeeds" and the driver will fail with missing jar step. I can see -Dspark.hadoop.fs.azure.account.key as one of the flags for the driver CMD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org