You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by JoshRosen <gi...@git.apache.org> on 2015/02/16 22:45:29 UTC

[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/4633

    [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls and SystemClock usage (branch-1.2 backport)

    (This PR backports #3801 into `branch-1.2` (1.2.2))
    
    This patch refactors Spark Streaming's FileInputStream tests to remove uses of Thread.sleep() and SystemClock, which should hopefully resolve some longstanding flakiness in these tests (see SPARK-1600).
    
    Key changes:
    
    - Modify FileInputDStream to use the scheduler's Clock instead of System.currentTimeMillis(); this allows it to be tested using ManualClock.
    - Fix a synchronization issue in ManualClock's `currentTime` method.
    - Add a StreamingTestWaiter class which allows callers to block until a certain number of batches have finished.
    - Change the FileInputStream tests so that files' modification times are manually set based off of ManualClock; this eliminates many Thread.sleep calls.
    - Update these tests to use the withStreamingContext fixture.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark spark-1600-b12-backport

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4633.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4633
    
----
commit e5d3dc4cfb9c890c2685f2c5ce624cb23f336f9b
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-01-06T08:31:19Z

    [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls and SystemClock usage
    
    This patch refactors Spark Streaming's FileInputStream tests to remove uses of Thread.sleep() and SystemClock, which should hopefully resolve some longstanding flakiness in these tests (see SPARK-1600).
    
    Key changes:
    
    - Modify FileInputDStream to use the scheduler's Clock instead of System.currentTimeMillis(); this allows it to be tested using ManualClock.
    - Fix a synchronization issue in ManualClock's `currentTime` method.
    - Add a StreamingTestWaiter class which allows callers to block until a certain number of batches have finished.
    - Change the FileInputStream tests so that files' modification times are manually set based off of ManualClock; this eliminates many Thread.sleep calls.
    - Update these tests to use the withStreamingContext fixture.
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #3801 from JoshRosen/SPARK-1600 and squashes the following commits:
    
    e4494f4 [Josh Rosen] Address a potential race when setting file modification times
    8340bd0 [Josh Rosen] Use set comparisons for output.
    0b9c252 [Josh Rosen] Fix some ManualClock usage problems.
    1cc689f [Josh Rosen] ConcurrentHashMap -> SynchronizedMap
    db26c3a [Josh Rosen] Use standard timeout in ScalaTest `eventually` blocks.
    3939432 [Josh Rosen] Rename StreamingTestWaiter to BatchCounter
    0b9c3a1 [Josh Rosen] Wait for checkpoint to complete
    863d71a [Josh Rosen] Remove Thread.sleep that was used to make task run slowly
    b4442c3 [Josh Rosen] batchTimeToSelectedFiles should be thread-safe
    15b48ee [Josh Rosen] Replace several TestWaiter methods w/ ScalaTest eventually.
    fffc51c [Josh Rosen] Revert "Remove last remaining sleep() call"
    dbb8247 [Josh Rosen] Remove last remaining sleep() call
    566a63f [Josh Rosen] Fix log message and comment typos
    da32f3f [Josh Rosen] Fix log message and comment typos
    3689214 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-1600
    c8f06b1 [Josh Rosen] Remove Thread.sleep calls in FileInputStream CheckpointSuite test.
    d4f2d87 [Josh Rosen] Refactor file input stream tests to not rely on SystemClock.
    dda1403 [Josh Rosen] Add StreamingTestWaiter class.
    3c3efc3 [Josh Rosen] Synchronize `currentTime` in ManualClock
    a95ddc4 [Josh Rosen] Modify FileInputDStream to use Clock class.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4633#issuecomment-74591552
  
    Since this backport passes tests, I'm going to merge it to `branch-1.2` (1.2.2) to see if this reduces flakiness there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4633#issuecomment-74589708
  
      [Test build #27577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27577/consoleFull) for   PR 4633 at commit [`e5d3dc4`](https://github.com/apache/spark/commit/e5d3dc4cfb9c890c2685f2c5ce624cb23f336f9b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen closed the pull request at:

    https://github.com/apache/spark/pull/4633


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4633#issuecomment-74577468
  
      [Test build #27577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27577/consoleFull) for   PR 4633 at commit [`e5d3dc4`](https://github.com/apache/spark/commit/e5d3dc4cfb9c890c2685f2c5ce624cb23f336f9b).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4633#issuecomment-74589724
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27577/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org