Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/02/12 19:45:00 UTC
[jira] [Resolved] (SPARK-26401) [k8s] Init container drops necessary config options for pulling jars from azure storage
[ https://issues.apache.org/jira/browse/SPARK-26401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin resolved SPARK-26401.
------------------------------------
Resolution: Won't Fix
Init containers were removed in Spark 2.4; your configuration should be picked up fine by the driver and executors, which now download dependencies themselves.
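On Spark 2.4+ the same Hadoop property can be passed directly with --conf and will be visible to the driver and executors when they fetch remote jars. A hypothetical sketch (cluster address, image, class, account, and container names are all placeholders):
{code:bash}
# Hypothetical Spark 2.4+ submission: there is no init container, so the
# Azure account key passed as a Hadoop conf is available during the download.
spark-submit \
  --master k8s://https://my-cluster:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-spark:2.4.0 \
  --conf "spark.hadoop.fs.azure.account.key.${STORAGE_ACCT}.blob.core.windows.net=${STORAGE_SECRET}" \
  --class com.example.Main \
  "wasbs://mycontainer@${STORAGE_ACCT}.blob.core.windows.net/jars/myjar.jar"
{code}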
> [k8s] Init container drops necessary config options for pulling jars from azure storage
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-26401
> URL: https://issues.apache.org/jira/browse/SPARK-26401
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.3.2
> Reporter: Stanis Shkel
> Priority: Major
>
> I am running a spark-submit command that pulls a jar from a private remote Azure storage account. As far as I understand, the jar is supposed to be pulled by the driver's init container. However, that container doesn't inherit the "spark.hadoop.fs.azure.account.key.$(STORAGE_ACCT).blob.core.windows.net=$(STORAGE_SECRET)" parameter that I pass in when running spark-submit.
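> For context, the submission looked roughly like this (cluster address, class, account, and container names are placeholders):
> {code:bash}
> # Hypothetical Spark 2.3 submission; the Azure storage key is passed as a
> # spark.hadoop.* conf, which the init container does not inherit.
> spark-submit \
>   --master k8s://https://my-cluster:6443 \
>   --deploy-mode cluster \
>   --conf "spark.hadoop.fs.azure.account.key.${STORAGE_ACCT}.blob.core.windows.net=${STORAGE_SECRET}" \
>   --class com.example.Main \
>   "wasbs://mycontainer@${STORAGE_ACCT}.blob.core.windows.net/jars/myjar.jar"
> {code}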
> Here is what I found so far. The spark-init container is invoked via the following command:
> [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh#L83]
> which ultimately turns into the following shell call:
> {code:bash}
> exec /usr/lib/jvm/java-1.8-openjdk/bin/java -cp '/opt/spark/conf/:/opt/spark/jars/*' -Xmx1g org.apache.spark.deploy.k8s.SparkPodInitContainer /etc/spark-init/spark-init.properties
> {code}
> If I cat out the spark-init properties file, the only parameters in there are:
> spark.kubernetes.mountDependencies.jarsDownloadDir=/var/spark-data/spark-jars
> spark.kubernetes.initContainer.remoteJars=wasbs\://mycontainer@testaccount.blob.core.windows.net/jars/myjar.jar,wasbs\://mycontainer@testaccount.blob.core.windows.net/jars/myjar.jar
> spark.kubernetes.mountDependencies.filesDownloadDir=/var/spark-data/spark-files
> My guess is that these params come from [https://github.com/apache/spark/blob/branch-2.3/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/steps/initcontainer/BasicInitContainerConfigurationStep.scala#L49]
> However, spark.hadoop.fs.azure.account.key is present neither in that file nor in the environment.
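> One way to confirm the key is absent from the running init container (pod and container names here are hypothetical):
> {code:bash}
> # Dump the mounted init-container properties from outside the pod and
> # search for any Azure settings; the account key never shows up.
> kubectl exec my-driver-pod -c spark-init -- \
>   cat /etc/spark-init/spark-init.properties | grep -i azure
> {code}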
> This causes the download of the jar to fail; the exception is as follows:
> {code:bash}
> Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Container mycontainer in account testaccount.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:938)
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:438)
> at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1048)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1910)
> at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:700)
> at org.apache.spark.util.Utils$.fetchFile(Utils.scala:492)
> at org.apache.spark.deploy.k8s.FileFetcher.fetchFile(SparkPodInitContainer.scala:91)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:81)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1$$anonfun$apply$2.apply(SparkPodInitContainer.scala:79)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:79)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$$anonfun$downloadFiles$1.apply(SparkPodInitContainer.scala:77)
> at scala.Option.foreach(Option.scala:257)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.downloadFiles(SparkPodInitContainer.scala:77)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.run(SparkPodInitContainer.scala:56)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer$.main(SparkPodInitContainer.scala:113)
> at org.apache.spark.deploy.k8s.SparkPodInitContainer.main(SparkPodInitContainer.scala)
> Caused by: org.apache.hadoop.fs.azure.AzureException: Container qrefinery in account jr3e3d.blob.core.windows.net not found, and we can't create it using anoynomous credentials.
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:730)
> at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:933)
> ... 22 more
> {code}
> I am certain that the parameter is being passed to the driver correctly. Due to https://issues.apache.org/jira/browse/SPARK-26400 the spark-init container "succeeds", and the driver then fails at the missing-jar step. I can see -Dspark.hadoop.fs.azure.account.key as one of the flags in the driver CMD.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)