Posted to reviews@spark.apache.org by "huangxiaopingRD (via GitHub)" <gi...@apache.org> on 2024/02/02 02:24:13 UTC

[PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

huangxiaopingRD opened a new pull request, #44990:
URL: https://github.com/apache/spark/pull/44990

   ### What changes were proposed in this pull request?
   Handle the root directory specially when splitting `userClassPath`.
   
   ### Why are the changes needed?
   Executing a Spark command from the "/" directory fails with an error. The cause is that each `userClassPath` entry is split on "/" and the last segment is taken, which fails when the path is the root directory.
   
   **Method to reproduce the issue:**
   
   <img width="631" alt="image" src="https://user-images.githubusercontent.com/35296098/223975469-18a3dd6a-7fc4-40c4-b6c1-7c9e62e8f48d.png">
   
   **Exception information:**
   ```
   23/03/09 17:10:53 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
   23/03/09 17:10:53 ERROR SparkContext: Error initializing SparkContext.
   java.util.NoSuchElementException: next on empty iterator
   	at scala.collection.Iterator$$anon$2.next(Iterator.scala:41)
   	at scala.collection.Iterator$$anon$2.next(Iterator.scala:39)
   	at scala.collection.IndexedSeqLike$Elements.next(IndexedSeqLike.scala:63)
   	at scala.collection.IterableLike.head(IterableLike.scala:109)
   	at scala.collection.IterableLike.head$(IterableLike.scala:108)
   	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$head(ArrayOps.scala:198)
   	at scala.collection.IndexedSeqOptimized.head(IndexedSeqOptimized.scala:129)
   	at scala.collection.IndexedSeqOptimized.head$(IndexedSeqOptimized.scala:129)
   	at scala.collection.mutable.ArrayOps$ofRef.head(ArrayOps.scala:198)
   	at scala.collection.TraversableLike.last(TraversableLike.scala:519)
   	at scala.collection.TraversableLike.last$(TraversableLike.scala:518)
   	at scala.collection.mutable.ArrayOps$ofRef.scala$collection$IndexedSeqOptimized$$super$last(ArrayOps.scala:198)
   	at scala.collection.IndexedSeqOptimized.last(IndexedSeqOptimized.scala:135)
   	at scala.collection.IndexedSeqOptimized.last$(IndexedSeqOptimized.scala:135)
   	at scala.collection.mutable.ArrayOps$ofRef.last(ArrayOps.scala:198)
   	at org.apache.spark.executor.Executor.$anonfun$createClassLoader$1(Executor.scala:869)
   	at org.apache.spark.executor.Executor.$anonfun$createClassLoader$1$adapted(Executor.scala:868)
   	at scala.collection.immutable.List.foreach(List.scala:392)
   	at org.apache.spark.executor.Executor.createClassLoader(Executor.scala:868)
   	at org.apache.spark.executor.Executor.<init>(Executor.scala:159)
   	at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
   	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:595)
   	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2681)
   	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
   	at scala.Option.getOrElse(Option.scala:189)
   	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:52)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:334)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   23/03/09 17:10:53 ERROR Utils: Uncaught exception in thread main
   java.lang.NullPointerException
   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.org$apache$spark$scheduler$local$LocalSchedulerBackend$$stop(LocalSchedulerBackend.scala:173)
   	at org.apache.spark.scheduler.local.LocalSchedulerBackend.stop(LocalSchedulerBackend.scala:144)
   	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:881)
   	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2371)
   	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:2078)
   	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1489)
   	at org.apache.spark.SparkContext.stop(SparkContext.scala:2078)
   	at org.apache.spark.SparkContext.<init>(SparkContext.scala:674)
   	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2681)
   	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:949)
   	at scala.Option.getOrElse(Option.scala:189)
   	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:943)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:52)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:334)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
   	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
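   The mechanics behind the `NoSuchElementException` can be reproduced in isolation. A minimal sketch (not Spark code; the object name is invented for illustration), relying only on the JVM's `String.split` behavior of dropping trailing empty segments:
   
   ```scala
   // When the resolved class-path entry is the root directory, its URL path
   // is "/". String.split drops trailing empty segments, so splitting "/" on
   // "/" leaves an empty array, and calling .last on it throws.
   object SplitRepro {
     def main(args: Array[String]): Unit = {
       val normal = "/opt/spark/app.jar".split("/")
       println(normal.last)   // app.jar -- the usual case

       val root = "/".split("/")
       println(root.length)   // 0 -- both segments are empty and get dropped

       // root.last would throw here, which is the
       // "java.util.NoSuchElementException: next on empty iterator" above.
     }
   }
   ```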
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



Re: [PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44990:
URL: https://github.com/apache/spark/pull/44990#discussion_r1475470332


##########
core/src/main/scala/org/apache/spark/executor/Executor.scala:
##########
@@ -1050,7 +1050,11 @@ private[spark] class Executor(
     // Bootstrap the list of jars with the user class path.
     val now = System.currentTimeMillis()
     userClassPath.foreach { url =>
-      currentJars(url.getPath().split("/").last) = now

Review Comment:
   Hm, did you set `spark.executor.extraClassPath` to `/`? This is a user-specified class path.





Re: [PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44990:
URL: https://github.com/apache/spark/pull/44990#discussion_r1475519937


##########
core/src/main/scala/org/apache/spark/executor/Executor.scala:
##########
@@ -1050,7 +1050,11 @@ private[spark] class Executor(
     // Bootstrap the list of jars with the user class path.
     val now = System.currentTimeMillis()
     userClassPath.foreach { url =>
-      currentJars(url.getPath().split("/").last) = now

Review Comment:
   Let's probably focus on fixing the error message instead of silently ignoring the bad entry.
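   
   One hypothetical shape for that direction (helper and object names invented here; this is not the actual Spark patch) is to fail fast with a descriptive message rather than letting `.last` fail on an empty split result:
   
   ```scala
   // Sketch only: validate the class-path entry before taking its last
   // path segment, so the user sees which config value is at fault.
   object ClassPathCheck {
     def jarName(path: String): String = {
       val segments = path.split("/").filter(_.nonEmpty)
       require(segments.nonEmpty,
         s"Invalid class path entry '$path': no file name component. " +
           "Check spark.executor.extraClassPath for empty or root-only entries.")
       segments.last
     }

     def main(args: Array[String]): Unit = {
       println(jarName("/opt/spark/app.jar"))  // app.jar
       // jarName("/") now throws IllegalArgumentException with a clear message
       // instead of the opaque "next on empty iterator".
     }
   }
   ```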





Re: [PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

Posted by "huangxiaopingRD (via GitHub)" <gi...@apache.org>.
huangxiaopingRD commented on code in PR #44990:
URL: https://github.com/apache/spark/pull/44990#discussion_r1475488791


##########
core/src/main/scala/org/apache/spark/executor/Executor.scala:
##########
@@ -1050,7 +1050,11 @@ private[spark] class Executor(
     // Bootstrap the list of jars with the user class path.
     val now = System.currentTimeMillis()
     userClassPath.foreach { url =>
-      currentJars(url.getPath().split("/").last) = now

Review Comment:
   You are right; my earlier description was inaccurate. However, when `spark.executor.extraClassPath` is set to an empty string and the current working directory is the root directory, the exception still occurs, because Spark does not verify whether `spark.executor.extraClassPath` is empty.
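   
   A sketch under the scenario described above (object and method names are invented for illustration, not the actual Spark code): if an empty entry resolves against the working directory and the process was launched from "/", the resulting URL path is "/". One hypothetical guard is to drop empty entries when the configured value is parsed:
   
   ```scala
   // Parse a ':'-separated class-path value, discarding blank entries so an
   // empty config string never reaches the split("/").last code path.
   object ExtraClassPathParse {
     def parseEntries(raw: String): List[String] =
       raw.split(":").map(_.trim).filter(_.nonEmpty).toList

     def main(args: Array[String]): Unit = {
       println(parseEntries(""))               // List() -- empty value yields no entries
       println(parseEntries("/a.jar:/b.jar"))  // List(/a.jar, /b.jar)
     }
   }
   ```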





Re: [PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

Posted by "huangxiaopingRD (via GitHub)" <gi...@apache.org>.
huangxiaopingRD commented on code in PR #44990:
URL: https://github.com/apache/spark/pull/44990#discussion_r1475484352


##########
core/src/main/scala/org/apache/spark/executor/Executor.scala:
##########
@@ -1050,7 +1050,11 @@ private[spark] class Executor(
     // Bootstrap the list of jars with the user class path.
     val now = System.currentTimeMillis()
     userClassPath.foreach { url =>
-      currentJars(url.getPath().split("/").last) = now

Review Comment:
   We mistakenly set `spark.executor.extraClassPath` to an empty string, which triggered this bug.





Re: [PR] [SPARK-42727][CORE] Fix failure to execute Spark commands in the root directory when local mode is specified [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on code in PR #44990:
URL: https://github.com/apache/spark/pull/44990#discussion_r1475485829


##########
core/src/main/scala/org/apache/spark/executor/Executor.scala:
##########
@@ -1050,7 +1050,11 @@ private[spark] class Executor(
     // Bootstrap the list of jars with the user class path.
     val now = System.currentTimeMillis()
     userClassPath.foreach { url =>
-      currentJars(url.getPath().split("/").last) = now

Review Comment:
   This isn't related to whether you start in the root directory; what matters is how `spark.executor.extraClassPath` is set.


