You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2018/05/06 10:28:27 UTC

[GitHub] spark pull request #21251: [SPARK-10878][core] Fix race condition when multi...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/21251

    [SPARK-10878][core] Fix race condition when multiple clients resolves artifacts at the same time

    ## What changes were proposed in this pull request?
    
    When multiple clients attempt to resolve artifacts via the `--packages` parameter, they could run into race condition when they each attempt to modify the dummy `org.apache.spark-spark-submit-parent-default.xml` file created in the default ivy cache dir.
    This PR changes the behavior to encode UUID in the dummy module descriptor so each client will operate on a different resolution file in the ivy cache dir. In addition, this patch changes the behavior of when and which resolution files are cleaned to prevent accumulation of resolution files in the default ivy cache dir.
    
    Since this PR is a successor of #18801, close #18801. Many codes were ported from #11494. **Many efforts were put here. I think this PR should credit to @Victsm .**
    
    ## How was this patch tested?
    
    added UT into `SparkSubmitUtilsSuite`
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-10878

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21251
    
----
commit 949ec1db20a99443dcc596f6e348170cb96f0124
Author: Kazuaki Ishizaki <is...@...>
Date:   2018-05-06T10:22:06Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    LGTM. Merging to master / 2.3.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90405/testReport)** for PR 21251 at commit [`9a6377b`](https://github.com/apache/spark/commit/9a6377bf5b5943db98900f5dd18a0222cd699d58).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21251: [SPARK-10878][core] Fix race condition when multi...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21251#discussion_r186566903
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -1204,7 +1205,36 @@ private[spark] object SparkSubmitUtils {
     
       /** A nice function to use in tests as well. Values are dummy strings. */
       def getModuleDescriptor: DefaultModuleDescriptor = DefaultModuleDescriptor.newDefaultInstance(
    -    ModuleRevisionId.newInstance("org.apache.spark", "spark-submit-parent", "1.0"))
    +    // Include UUID in module name, so multiple clients resolving maven coordinate at the same time
    +    // do not modify the same resolution file concurrently.
    +    ModuleRevisionId.newInstance("org.apache.spark",
    +      s"spark-submit-parent-${UUID.randomUUID.toString}",
    +      "1.0"))
    +
    +  /**
    +   * clear ivy resolution from current launch. The resolution file is usually at
    +   * ~/.ivy2/org.apache.spark-spark-submit-parent-$UUID-default.xml,
    +   * ~/.ivy2/resolved-org.apache.spark-spark-submit-parent-$UUID-1.0.xml, and
    +   * ~/.ivy2/resolved-org.apache.spark-spark-submit-parent-$UUID-1.0.properties.
    +   * Since each launch will have its own resolution files created, delete them after
    +   * each resolution to prevent accumulation of these files in the ivy cache dir.
    +   */
    +  private def clearIvyResolutionFiles(
    +      mdId: ModuleRevisionId,
    +      ivySettings: IvySettings,
    +      ivyConfName: String): Unit = {
    +    val currentResolutionFiles = Seq[File](
    +      new File(ivySettings.getDefaultCache,
    --- End diff --
    
    nit: you could move `new File(ivySettings.getDefaultCache` to the foreach loop instead.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3069/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2963/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21251: [SPARK-10878][core] Fix race condition when multi...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21251#discussion_r186567456
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -1204,7 +1205,36 @@ private[spark] object SparkSubmitUtils {
     
       /** A nice function to use in tests as well. Values are dummy strings. */
       def getModuleDescriptor: DefaultModuleDescriptor = DefaultModuleDescriptor.newDefaultInstance(
    -    ModuleRevisionId.newInstance("org.apache.spark", "spark-submit-parent", "1.0"))
    +    // Include UUID in module name, so multiple clients resolving maven coordinate at the same time
    +    // do not modify the same resolution file concurrently.
    +    ModuleRevisionId.newInstance("org.apache.spark",
    +      s"spark-submit-parent-${UUID.randomUUID.toString}",
    +      "1.0"))
    +
    +  /**
    +   * clear ivy resolution from current launch. The resolution file is usually at
    --- End diff --
    
    nit: start comment with capital letter.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3062/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90273/testReport)** for PR 21251 at commit [`949ec1d`](https://github.com/apache/spark/commit/949ec1db20a99443dcc596f6e348170cb96f0124).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90396/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90273/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90273/testReport)** for PR 21251 at commit [`949ec1d`](https://github.com/apache/spark/commit/949ec1db20a99443dcc596f6e348170cb96f0124).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90405/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21251: [SPARK-10878][core] Fix race condition when multi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21251


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21251: [SPARK-10878][core] Fix race condition when multi...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21251#discussion_r186567342
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala ---
    @@ -256,4 +256,20 @@ class SparkSubmitUtilsSuite extends SparkFunSuite with BeforeAndAfterAll {
           assert(jarPath.indexOf("mydep") >= 0, "should find dependency")
         }
       }
    +
    +  test("SPARK-10878: test resolution files cleaned after resolving artifact") {
    +    val main = new MavenCoordinate("my.great.lib", "mylib", "0.1")
    +
    +    IvyTestUtils.withRepository(main, None, None) { repo =>
    +      val ivySettings = SparkSubmitUtils.buildIvySettings(Some(repo), Some(tempIvyPath))
    +      val jarPath = SparkSubmitUtils.resolveMavenCoordinates(
    +        main.toString,
    +        ivySettings,
    +        isTest = true)
    +      val r = """.*org.apache.spark-spark-submit-parent-.*""".r
    +      assert(ivySettings.getDefaultCache.listFiles.map(_.getName)
    +        .forall { case n @ r() => false case _ => true },
    --- End diff --
    
    nit: I think using `!<list of files>.exists(r.findFirstIn(_).isDefined)` would be slightly clearer than you version.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90396/testReport)** for PR 21251 at commit [`9a6377b`](https://github.com/apache/spark/commit/9a6377bf5b5943db98900f5dd18a0222cd699d58).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    cc @jiangxb1987 @vanzin


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90396/testReport)** for PR 21251 at commit [`9a6377b`](https://github.com/apache/spark/commit/9a6377bf5b5943db98900f5dd18a0222cd699d58).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    cc  @gatorsmile


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21251: [SPARK-10878][core] Fix race condition when multiple cli...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21251
  
    **[Test build #90405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90405/testReport)** for PR 21251 at commit [`9a6377b`](https://github.com/apache/spark/commit/9a6377bf5b5943db98900f5dd18a0222cd699d58).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org