You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2014/12/15 20:25:39 UTC

[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/3704

    [SPARK-4826] Fix generation of temp file names in WAL tests

    This PR is another approach for fixing SPARK-4826, an issue where a bug in how we generate temp. file names was causing spurious test failures in the write ahead log suites.
    
    Closes #3695.
    Closes #3701.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-4826

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3704
    
----
commit 86c1944fdaf16dc2e3a758bbd6f5a84c2d925535
Author: Josh Rosen <jo...@databricks.com>
Date:   2014-12-15T18:59:40Z

    Revert "HOTFIX: Disabling failing block manager test"
    
    This reverts commit 4c0673879b5c504797dafb11607d14b04c1bf47d.

commit 93629194b4756229a75914d8d80c10b138ce7500
Author: Josh Rosen <jo...@databricks.com>
Date:   2014-12-15T19:23:27Z

    [SPARK-4826] Fix bug in generation of temp file names. in WAL suites.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3704#discussion_r21851620
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala ---
    @@ -44,7 +43,7 @@ class WriteAheadLogSuite extends FunSuite with BeforeAndAfter {
       before {
         tempDir = Files.createTempDir()
         testDir = tempDir.toString
    -    testFile = new File(tempDir, Random.nextString(10)).toString
    +    testFile = new File(tempDir, "testFile").toString
    --- End diff --
    
    This is kind of an unrelated change, but I wanted to remove this `Random.nextString()` call since it seemed confusing and didn't seem to serve any obvious purpose, since the `tempDir` is re-created before each test anyways.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67079193
  
    Alright, I'm going to merge this.  Thanks for looking this over!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3704#discussion_r21852089
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDDSuite.scala ---
    @@ -38,36 +38,42 @@ class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll {
       var blockManager: BlockManager = null
       var dir: File = null
     
    +  override def beforeEach(): Unit = {
    +    dir = Files.createTempDir()
    +  }
    +
    +  override def afterEach(): Unit = {
    +    dir.delete()
    --- End diff --
    
    Good catch; I've fixed this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67068703
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24467/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67054336
  
    +1. The latest changes look good. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3704#discussion_r21849567
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDDSuite.scala ---
    @@ -137,7 +137,12 @@ class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll {
           blockIds: Seq[BlockId]
         ): Seq[WriteAheadLogFileSegment] = {
         require(blockData.size === blockIds.size)
    -    val writer = new WriteAheadLogWriter(new File(dir, Random.nextString(10)).toString, hadoopConf)
    +    val logFilePath = {
    +      val f = File.createTempFile("wal", null, dir)
    +      assert(f.delete())
    --- End diff --
    
    This might look race prone (deleting a file and hoping that someone else won't come along and write it in the meantime), but it should be safe because:
    
    1. Different Jenkins builds will have different temp directories (`dir`).
    2. Within a JVM, multiple calls to `createTempFile` will never return the same pathname ([see Javadoc](http://docs.oracle.com/javase/7/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String,%20java.io.File))).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3704#discussion_r21851601
  
    --- Diff: streaming/src/test/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDDSuite.scala ---
    @@ -38,36 +38,42 @@ class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll {
       var blockManager: BlockManager = null
       var dir: File = null
     
    +  override def beforeEach(): Unit = {
    +    dir = Files.createTempDir()
    +  }
    +
    +  override def afterEach(): Unit = {
    +    dir.delete()
    --- End diff --
    
    This doesn't work if the dir is not empty. You could use `Utils.createTempDir()` and, optionally, `Utils.deleteRecursively()` (since `createTempDir` already takes care of that for you).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67052541
  
    > This precludes parallel execution of the tests within a single copy of the write-ahead log suites
    
    I assume that frameworks handle that automatically (e.g. by having multiple instances of the test class), otherwise you could never parallelize tests that use "before" initializers. I'm pretty sure that works as intented at least with JUnit, but not super familiar with scalatest internals.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67055659
  
      [Test build #24467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24467/consoleFull) for   PR 3704 at commit [`f2307f5`](https://github.com/apache/spark/commit/f2307f55134cb14beac42c84c330304926e8d5d6).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67050050
  
      [Test build #24464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24464/consoleFull) for   PR 3704 at commit [`9362919`](https://github.com/apache/spark/commit/93629194b4756229a75914d8d80c10b138ce7500).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67054841
  
      [Test build #24466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24466/consoleFull) for   PR 3704 at commit [`a693ddb`](https://github.com/apache/spark/commit/a693ddb1f8fb796337f1aee3c81d3fb7537888a1).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67069732
  
    Looks like a robust approach to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67067815
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24466/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67052062
  
    The root issue is that this code is trying to return a unique file system path that meets two conditions:
    
    - The path does not correspond to an existing file.
    - Files / directories created at that path will be cleaned up by the test suite.
    
    I think this was the intent expressed by the original `Random.nextString(10)` code.  One approach would be to have the _directory_ be random and per-test and to just use a fixed filename within that directory.  This precludes parallel execution of the tests within a single copy of the write-ahead log suites, but I don't think we're pursuing that type of parallelism right now so I don't think that will be a huge deal.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3704


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67067803
  
      [Test build #24466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24466/consoleFull) for   PR 3704 at commit [`a693ddb`](https://github.com/apache/spark/commit/a693ddb1f8fb796337f1aee3c81d3fb7537888a1).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfterEach `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67051178
  
    I buy your explanation, given the javadoc, so this LGTM. But I think a cleaner approach that doesn't require reasoning like that to convince people would be to just use a different temp dir per test (i.e. use `BeforeAndAfter` and create the dir in the `before` block).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67054165
  
    Alright, I've simplified things to move the temp. dir creation to `beforeEach` instead of `beforeAll` and to use a fixed filename within that directory.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67068688
  
      [Test build #24467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24467/consoleFull) for   PR 3704 at commit [`f2307f5`](https://github.com/apache/spark/commit/f2307f55134cb14beac42c84c330304926e8d5d6).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll with BeforeAndAfterEach `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67063087
  
      [Test build #24464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24464/consoleFull) for   PR 3704 at commit [`9362919`](https://github.com/apache/spark/commit/93629194b4756229a75914d8d80c10b138ce7500).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67050710
  
    This looks good to me, though the approach does not make it obvious why this approach was chosen (of course you can figure out in this context, but imagine reading this code a year later). 
    
    I think the other two are slightly simpler approaches by ensuring unique names on file creation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4826] Fix generation of temp file names...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3704#issuecomment-67063097
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24464/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org