Posted to issues@spark.apache.org by "Yang Jie (Jira)" <ji...@apache.org> on 2021/09/10 15:38:00 UTC

[jira] [Commented] (SPARK-36636) SparkContextSuite random failure in Scala 2.13

    [ https://issues.apache.org/jira/browse/SPARK-36636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413249#comment-17413249 ] 

Yang Jie commented on SPARK-36636:
----------------------------------

After some investigation, I found the reason for the random failures of these cases; it can be summarized as follows:

First, SparkContextSuite contains multiple consecutive cases that use local-cluster mode, e.g. `local-cluster[3, 1, 1024]` (3 workers, each with 1 core and 1024 MB of memory). Each local-cluster starts a new local standalone cluster and submits a new application whose appid follows the format `app-yyyyMMddHHmmss-0000`, e.g. `app-20210908074432-0000`, where the timestamp has only second granularity.

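To make the collision concrete, here is a minimal, self-contained Scala sketch of this appid scheme (an illustration matching the observed `app-yyyyMMddHHmmss-0000` pattern, not the actual Master code; the format string and counter reset are assumptions): two standalone clusters started within the same wall-clock second hand out identical first appids, because every fresh Master starts its application counter at 0.

{code:scala}
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

// Hypothetical model of standalone-mode appid generation, matching the
// observed "app-yyyyMMddHHmmss-0000" pattern; names are illustrative.
object AppIdCollisionDemo {
  private val createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)

  // Each local-cluster test case starts a brand-new Master, so the
  // per-Master application counter begins at 0 again for every case.
  def firstAppId(submitDate: Date): String =
    "app-%s-%04d".format(createDateFormat.format(submitDate), 0)

  def main(args: Array[String]): Unit = {
    val now = new Date()
    val idCase1 = firstAppId(now) // first test case
    val idCase2 = firstAppId(now) // next test case, same wall-clock second
    println(s"collision: ${idCase1 == idCase2} ($idCase1)")
  }
}
{code}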
 

Therefore, when two cases configured with `local-cluster[i, c, m]` start within the same second, their applications receive the same appid and their worker directories collide. As evidence, the test log contains a large number of entries like the following (a minimal reproduction sketch follows the log):

 
{code:java}
java.io.IOException: Failed to create directory /spark-mine/work/app-20210908074432-0000/1
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:578)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
21/09/08 22:44:32.266 dispatcher-event-loop-0 INFO Worker: Asked to launch executor app-20210908074432-0000/0 for test
21/09/08 22:44:32.266 dispatcher-event-loop-0 ERROR Worker: Failed to launch executor app-20210908074432-0000/0 for test.
java.io.IOException: Failed to create directory /spark-mine/work/app-20210908074432-0000/0
    at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:578)
    at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
    at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
    at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
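The IOException itself is straightforward to reproduce: `java.io.File.mkdirs()` returns false when the target directory already exists, so a leftover executor directory from a colliding appid surfaces as "Failed to create directory". A minimal sketch of that failure mode (an illustration under that assumption, not the Worker's exact code):

{code:scala}
import java.io.{File, IOException}

// Illustrative sketch of per-executor work-dir creation as suggested by the
// log above. mkdirs() returns false if the directory already exists, which
// turns a stale directory from an earlier case into an IOException.
object MkdirsCollisionDemo {
  def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
    val executorDir = new File(workDir, s"$appId/$execId")
    if (!executorDir.mkdirs()) {
      throw new IOException(s"Failed to create directory $executorDir")
    }
    executorDir
  }

  def main(args: Array[String]): Unit = {
    val workDir = new File(System.getProperty("java.io.tmpdir"), "spark-mine-work")
    createExecutorDir(workDir, "app-20210908074432-0000", 1) // first case: succeeds
    createExecutorDir(workDir, "app-20210908074432-0000", 1) // same appid again: throws
  }
}
{code}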
Since the default value of `spark.deploy.maxExecutorRetries` is 10, the following happens when 5 consecutive cases using `local-cluster[3, 1, 1024]` run within the same second:

 
{code:java}
case 1: uses worker directories: /app-20210908074432-0000/0, /app-20210908074432-0000/1, /app-20210908074432-0000/2
case 2: retries 3 times, then uses worker directories: /app-20210908074432-0000/3, /app-20210908074432-0000/4, /app-20210908074432-0000/5
case 3: retries 6 times, then uses worker directories: /app-20210908074432-0000/6, /app-20210908074432-0000/7, /app-20210908074432-0000/8
case 4: retries 9 times, then uses worker directories: /app-20210908074432-0000/9, /app-20210908074432-0000/10, /app-20210908074432-0000/11
case 5: needs more than 10 retries, so it fails

{code}
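The progression above follows from simple arithmetic, sketched below under the stated assumptions (the appid collides across all 5 cases, each case wants 3 executor directories, and nothing is cleaned up in between, so case k finds 3 * (k - 1) directories already taken):

{code:scala}
// Hedged model of the retry progression; caseIdx and the directory math are
// illustrative, and 10 is the default of spark.deploy.maxExecutorRetries.
object RetryArithmeticDemo {
  val maxExecutorRetries = 10

  def main(args: Array[String]): Unit = {
    for (caseIdx <- 1 to 5) {
      val retries = 3 * (caseIdx - 1) // directories taken by earlier cases
      val outcome =
        if (retries > maxExecutorRetries) "exceeds maxExecutorRetries -> FAILED"
        else s"uses worker directories $retries to ${retries + 2}"
      println(s"case $caseIdx: $retries retries, $outcome")
    }
  }
}
{code}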

> SparkContextSuite random failure in Scala 2.13
> ----------------------------------------------
>
>                 Key: SPARK-36636
>                 URL: https://issues.apache.org/jira/browse/SPARK-36636
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Run
> {code:java}
> build/mvn clean install -Pscala-2.13 -pl core -am{code}
> or
> {code:java}
> build/mvn clean install -Pscala-2.13 -pl core -am -Dtest=none -DwildcardSuites=org.apache.spark.SparkContextSuite
> {code}
> Some cases may fail as follows:
>  
> {code:java}
> - SPARK-33084: Add jar support Ivy URI -- test param key case sensitive *** FAILED ***
>   java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
> This stopped SparkContext was created at:
> org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1155)
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> org.scalatest.Transformer.apply(Transformer.scala:22)
> org.scalatest.Transformer.apply(Transformer.scala:20)
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> scala.collection.immutable.List.foreach(List.scala:333)
> The currently active SparkContext was created at:
> org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1155)
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> org.scalatest.Transformer.apply(Transformer.scala:22)
> org.scalatest.Transformer.apply(Transformer.scala:20)
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> scala.collection.immutable.List.foreach(List.scala:333)
>   at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:118)
>   at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1887)
>   at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2575)
>   at org.apache.spark.SparkContext.addJar(SparkContext.scala:2008)
>   at org.apache.spark.SparkContext.addJar(SparkContext.scala:1928)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1156)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   ...
> - SPARK-33084: Add jar support Ivy URI -- test transitive value case insensitive *** FAILED ***
>   org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
> org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1155)
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> org.scalatest.Transformer.apply(Transformer.scala:22)
> org.scalatest.Transformer.apply(Transformer.scala:20)
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> scala.collection.immutable.List.foreach(List.scala:333)
>   at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
>   at scala.Option.foreach(Option.scala:437)
>   at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
>   at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$159(SparkContextSuite.scala:1166)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   ...
> - SPARK-34346: hadoop configuration priority for spark/hive/hadoop configs *** FAILED ***
>   org.apache.spark.SparkException: Only one SparkContext should be running in this JVM (see SPARK-2243).The currently running SparkContext was created at:
> org.apache.spark.SparkContextSuite.$anonfun$new$154(SparkContextSuite.scala:1155)
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> org.scalatest.Transformer.apply(Transformer.scala:22)
> org.scalatest.Transformer.apply(Transformer.scala:20)
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:190)
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:62)
> org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
> org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:62)
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
> org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
> scala.collection.immutable.List.foreach(List.scala:333)
>   at org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2647)
>   at scala.Option.foreach(Option.scala:437)
>   at org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2644)
>   at org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2734)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:95)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$164(SparkContextSuite.scala:1192)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   ...
> - SPARK-34225: addFile/addJar shouldn't further encode URI if a URI form string is passed *** FAILED ***
>   java.lang.NullPointerException:
>   at org.apache.spark.SparkContextSuite.$anonfun$new$166(SparkContextSuite.scala:1245)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$166$adapted(SparkContextSuite.scala:1211)
>   at org.apache.spark.SparkFunSuite.withTempDir(SparkFunSuite.scala:208)
>   at org.apache.spark.SparkContextSuite.$anonfun$new$165(SparkContextSuite.scala:1211)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   ...
> {code}
>  but which case fails is somewhat random.


