Posted to dev@spark.apache.org by Kevin Grealish <ke...@microsoft.com> on 2016/10/01 01:49:00 UTC

regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit

I'm seeing a regression when submitting a batch PySpark program with additional files through LIVY, in YARN cluster mode. The program files are placed into the mounted Azure Storage before the call to LIVY is made. The submission comes from an application that has credentials for the storage and the LIVY endpoint, but no access to the local file systems on the cluster. This previously worked, but now I'm getting the error below.
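
For anyone reproducing this, here is a minimal sketch of the Livy batch call described above, in Python (the endpoint, storage account, container and file names are all made up):

    # Minimal sketch of the Livy batch submission; all names are hypothetical.
    import json
    import requests

    livy_url = "http://livy-host:8998/batches"  # 8998 is Livy's default port
    payload = {
        # Primary PySpark program, already staged in the mounted Azure Storage
        "file": "wasb://container@account.blob.core.windows.net/app/main.py",
        # Additional Python files -- these are what hit the error below
        "pyFiles": ["wasb://container@account.blob.core.windows.net/app/helpers.py"],
    }
    resp = requests.post(livy_url, data=json.dumps(payload),
                         headers={"Content-Type": "application/json"})
    print(resp.status_code, resp.json())

Livy passes "file" and "pyFiles" through to spark-submit (as the primary resource and --py-files), which is where the wasb:// URIs are now rejected.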

Seems this restriction was introduced with https://github.com/apache/spark/commit/5081a0a9d47ca31900ea4de570de2cbb0e063105 (new in 1.6.2 and 2.0.0).
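
For reference, the gist of the check that commit added, paraphrased in Python (the real implementation is Scala, in PythonRunner.formatPath, which is where the exception below is thrown):

    # Python paraphrase of the new restriction; the actual check is Scala
    # (org.apache.spark.deploy.PythonRunner.formatPath).
    from urllib.parse import urlparse

    def format_path(path):
        scheme = urlparse(path).scheme
        # Only scheme-less, file: and local: paths are accepted for Python apps.
        if scheme not in ("", "file", "local"):
            raise ValueError(
                "Launching Python applications through spark-submit is "
                "currently only supported for local files: " + path)
        return path

    # So any wasb:// (or hdfs://, s3a://, ...) path now fails up front:
    # format_path("wasb://container@account.blob.core.windows.net/main.py")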

How should the scenario above be achieved now? Am I missing something?


Exception in thread "main" java.lang.IllegalArgumentException: Launching Python applications through spark-submit is currently only supported for local files: wasb://kevingrecluster2@xxxxxxxx.blob.core.windows.net/xxxxxxxxx/xxxxxxx.py
                at org.apache.spark.deploy.PythonRunner$.formatPath(PythonRunner.scala:104)
                at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
                at org.apache.spark.deploy.PythonRunner$$anonfun$formatPaths$3.apply(PythonRunner.scala:136)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
                at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
                at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
                at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
                at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
                at org.apache.spark.deploy.PythonRunner$.formatPaths(PythonRunner.scala:136)
                at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:639)
                at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$11.apply(SparkSubmit.scala:637)
                at scala.Option.foreach(Option.scala:236)
                at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:637)
                at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
                at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
                at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
java.lang.Exception: spark-submit exited with code 1


RE: regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit

Posted by Kevin Grealish <ke...@microsoft.com>.
Great. Thanks for the pointer. I see the fix is in 2.0.1-rc4.

Will there be a 1.6.3? If so, how are fixes considered for backporting?

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Monday, October 3, 2016 5:40 AM
To: Kevin Grealish <ke...@microsoft.com>
Cc: Apache Spark Dev <de...@spark.apache.org>
Subject: Re: regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit


On 1 Oct 2016, at 02:49, Kevin Grealish <ke...@microsoft.com> wrote:

[quoted message snipped -- see the original post above]

This has been fixed in https://issues.apache.org/jira/browse/SPARK-17512; I don't know if it's in 2.0.1 though



[stack trace snipped -- identical to the one in the original post above]


Re: regression: no longer able to use HDFS wasbs:// path for additional python files on LIVY batch submit

Posted by Steve Loughran <st...@hortonworks.com>.
On 1 Oct 2016, at 02:49, Kevin Grealish <ke...@microsoft.com> wrote:

[quoted text snipped -- see the original post above]

How should the scenario above be achieved now? Am I missing something?

This has been fixed in https://issues.apache.org/jira/browse/SPARK-17512; I don't know if it's in 2.0.1 though
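
One way to check is to ask JIRA which releases it records for the fix; a quick read-only sketch against the public JIRA REST API:

    # List the releases JIRA records as containing the SPARK-17512 fix.
    import requests

    issue = requests.get(
        "https://issues.apache.org/jira/rest/api/2/issue/SPARK-17512").json()
    print([v["name"] for v in issue["fields"]["fixVersions"]])

(Alternatively, check whether the fix commit is reachable from the 2.0.1 release tag in the Spark git repository.)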


[stack trace snipped -- identical to the one in the original post above]