Posted to reviews@spark.apache.org by shaofei007 <gi...@git.apache.org> on 2017/07/23 05:39:28 UTC

[GitHub] spark pull request #18718: [SPARK-21357][DStreams] FileInputDStream not remo...

GitHub user shaofei007 opened a pull request:

    https://github.com/apache/spark/pull/18718

    [SPARK-21357][DStreams] FileInputDStream not remove out of date RDD

    ## What changes were proposed in this pull request?
    
    ```scala
    // class FileInputDStream
    protected[streaming] override def clearMetadata(time: Time) {
      batchTimeToSelectedFiles.synchronized {
        val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
        batchTimeToSelectedFiles --= oldFiles.keys
        // ... (rest of the method omitted)
      }
    }
    ```
    The code above clears only the old time-to-files mappings; it never removes the old entries from generatedRDDs. This patch adds a call to "super.clearMetadata(time)" at the beginning of clearMetadata so that the old generatedRDDs are removed as well.
    
    
    ## How was this patch tested?
    
    Tested manually: temporary code that prints the number of generatedRDDs was added at the end of clearMetadata to check that the old RDDs were removed.
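
For readers without the Spark source at hand, the change can be pictured with a small stand-alone sketch. It does not use Spark at all: `SketchStream`, `SketchFileStream`, and `ClearMetadataSketch` are hypothetical names, batch times are plain `Long` milliseconds, and the "RDDs" are placeholder strings. The parent's retention rule is assumed to mirror the `<` filter quoted in the description and may differ from the real superclass.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the parent stream class:
// it only tracks one placeholder "RDD" per batch time (in ms).
class SketchStream(val rememberDuration: Long) {
  val generatedRDDs = mutable.HashMap.empty[Long, String]

  def generate(time: Long): Unit = {
    generatedRDDs(time) = s"rdd-$time"
  }

  // Assumed retention rule: drop entries older than (time - rememberDuration).
  def clearMetadata(time: Long): Unit = {
    val oldTimes = generatedRDDs.keys.filter(_ < (time - rememberDuration)).toList
    generatedRDDs --= oldTimes
  }
}

// Hypothetical, simplified stand-in for FileInputDStream.
class SketchFileStream(remember: Long) extends SketchStream(remember) {
  val batchTimeToSelectedFiles = mutable.HashMap.empty[Long, Seq[String]]

  override def generate(time: Long): Unit = {
    super.generate(time)
    batchTimeToSelectedFiles(time) = Seq(s"file-$time")
  }

  override def clearMetadata(time: Long): Unit = {
    // The call the patch adds: let the parent drop its old generatedRDDs first.
    super.clearMetadata(time)
    batchTimeToSelectedFiles.synchronized {
      val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
      batchTimeToSelectedFiles --= oldFiles.keys
    }
  }
}

object ClearMetadataSketch {
  // Generate one batch per second for ten seconds, cleaning up after each batch,
  // and return the final sizes of (generatedRDDs, batchTimeToSelectedFiles).
  def run(): (Int, Int) = {
    val stream = new SketchFileStream(remember = 3000L)
    for (t <- 1000L to 10000L by 1000L) {
      stream.generate(t)
      stream.clearMetadata(t)
    }
    (stream.generatedRDDs.size, stream.batchTimeToSelectedFiles.size)
  }
}
```

With a remember duration of 3000 ms, `ClearMetadataSketch.run()` leaves four entries in each map, which is the kind of count the manual check described above would print.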


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shaofei007/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18718
    
----
commit bded22e93072bae51978c4a1b08fdc873ecea80c
Author: Fei Shao <14...@qq.com>
Date:   2017-07-22T12:21:27Z

    [SPARK-21357][DStreams] FileInputDStream not remove out of date RDD
    
    ### What changes were proposed in this pull request?
    ```scala
    // class FileInputDStream
    protected[streaming] override def clearMetadata(time: Time) {
      batchTimeToSelectedFiles.synchronized {
        val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
        batchTimeToSelectedFiles --= oldFiles.keys
        // ... (rest of the method omitted)
      }
    }
    ```
    The code above clears only the old time-to-files mappings; it never removes the old entries from generatedRDDs. This patch adds a call to "super.clearMetadata(time)" at the beginning of clearMetadata so that the old generatedRDDs are removed as well.
    
    ### How was this patch tested?
    Temporary code that prints the number of generatedRDDs was added at the end of clearMetadata to check that the old RDDs were removed.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by shaofei007 <gi...@git.apache.org>.
Github user shaofei007 commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    @tdas
    What do you think about this PR?
    
    @asfgit
    Can we close it please?




[GitHub] spark pull request #18718: [SPARK-21357][DStreams] FileInputDStream not remo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18718




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    **[Test build #3848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3848/testReport)** for PR 18718 at commit [`5a34428`](https://github.com/apache/spark/commit/5a344283d1a961c04daadacad32c8695891a3266).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    Seems reasonable, but I'm not sure why it hasn't come up before. @tdas?




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    Merged to master




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    @shaofei007 I actually think your fix is correct, so I wouldn't close it. The comment says: `/** Clear the old time-to-files mappings along with old RDDs */`, but only the superclass method clears the old RDDs. In the absence of any comments to the contrary, I'd merge this, at least to master.
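
The point that only the superclass method clears the old RDDs can be made concrete with a toy model that counts what is left after many batches. This is only a sketch under assumptions: `BaseSketch`, `FileSketch`, and `LeakSketch` are hypothetical names, not Spark classes; times are plain `Long` milliseconds; and the `callSuper` flag exists purely to compare the override with and without the superclass call.

```scala
import scala.collection.mutable

// Hypothetical base class: owns the per-batch "RDD" map and is the only
// place where old entries of that map are removed.
class BaseSketch(val rememberDuration: Long) {
  val generatedRDDs = mutable.HashMap.empty[Long, String]

  def clearMetadata(time: Long): Unit = {
    val old = generatedRDDs.filter(_._1 < (time - rememberDuration))
    generatedRDDs --= old.keys
  }
}

// Hypothetical subclass: callSuper = false models an override that only
// cleans its own time-to-files map; callSuper = true models the fixed one.
class FileSketch(remember: Long, callSuper: Boolean) extends BaseSketch(remember) {
  val batchTimeToSelectedFiles = mutable.HashMap.empty[Long, Seq[String]]

  override def clearMetadata(time: Long): Unit = {
    if (callSuper) super.clearMetadata(time)
    val oldFiles = batchTimeToSelectedFiles.filter(_._1 < (time - rememberDuration))
    batchTimeToSelectedFiles --= oldFiles.keys
  }
}

object LeakSketch {
  // Run `batches` one-second batches, cleaning up after each one,
  // and return how many placeholder RDD entries remain.
  def remainingRDDs(callSuper: Boolean, batches: Int): Int = {
    val s = new FileSketch(remember = 2000L, callSuper = callSuper)
    for (i <- 1 to batches) {
      val t = i * 1000L
      s.generatedRDDs(t) = s"rdd-$t"
      s.batchTimeToSelectedFiles(t) = Seq(s"file-$t")
      s.clearMetadata(t)
    }
    s.generatedRDDs.size
  }
}
```

In this model, `LeakSketch.remainingRDDs(callSuper = false, batches = 100)` keeps all 100 entries, while `callSuper = true` leaves only the 3 that fall inside the 2000 ms remember window.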




[GitHub] spark issue #18718: [SPARK-21357][DStreams] FileInputDStream not remove out ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18718
  
    **[Test build #3848 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3848/testReport)** for PR 18718 at commit [`5a34428`](https://github.com/apache/spark/commit/5a344283d1a961c04daadacad32c8695891a3266).

