Posted to reviews@spark.apache.org by nikunjb <gi...@git.apache.org> on 2018/09/14 23:37:12 UTC

[GitHub] spark pull request #22423: [SPARK-25302][STREAMING] Checkpoint the reducedSt...

GitHub user nikunjb opened a pull request:

    https://github.com/apache/spark/pull/22423

    [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindowDStream so as to cut the lineage completely to parent
    
      ## What changes were proposed in this pull request?
    
      DStream.reduceByKeyAndWindow() with an inverse reduce function eventually creates
      a ReducedWindowDStream but does not checkpoint the reducedStream inside it.
      When combined with the issue described in SPARK-25303,
      it results in the problem described in the post here:
    
      http://apache-spark-user-list.1001560.n3.nabble.com/DStream-reduceByKeyAndWindow-not-using-checkpointed-data-for-inverse-reducing-old-data-td33332.html
    
      This change checkpoints the reducedStream inside ReducedWindowDStream to cut the lineage.
      A separate patch is being created for the issue in SPARK-25303.
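
For readers unfamiliar with inverse reduce functions, here is a minimal plain-Python sketch of the idea, not Spark code. It assumes the reduce function is addition and the inverse is subtraction; `reduce_batches`, `slide_window`, and the sample batches are hypothetical names and data chosen for illustration. The point is that each new window is derived from the previous window, which is why the dependency chain keeps growing unless something is checkpointed.

```python
from collections import Counter

def reduce_batches(batches):
    """Full recomputation: reduce every batch in the window from scratch."""
    total = Counter()
    for batch in batches:
        total.update(batch)
    return total

def slide_window(prev_window, new_batches, old_batches):
    """Incremental update: add batches entering the window, then
    inverse-reduce (subtract) the batches leaving it."""
    result = Counter(prev_window)
    for batch in new_batches:
        result.update(batch)      # reduce function: addition
    for batch in old_batches:
        result.subtract(batch)    # inverse reduce function: subtraction
    # Drop keys whose count fell to zero.
    return Counter({k: v for k, v in result.items() if v != 0})

# Hypothetical per-batch key counts; the window covers 3 batches and slides by 1.
batches = [Counter(a=1), Counter(a=2, b=1), Counter(b=3), Counter(a=1, c=5)]
window = reduce_batches(batches[0:3])
window = slide_window(window, new_batches=[batches[3]], old_batches=[batches[0]])
```

The incremental result matches a full recomputation over `batches[1:4]`, but it was built from the previous window rather than from the raw batches alone.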
    
      ## How was this patch tested?
    
      Ran the existing unit tests. A couple of existing unit test classes were
      written on the assumption that they would never be serialized. Their
      SparkContext had to be declared transient, as in other tests, because
      the new checkpointing causes serialization to occur.
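
As a rough analogy only, the following Python sketch shows the same pattern with `pickle`: an object holds a handle that cannot be serialized, so that field is skipped during serialization. `FakeContext` and `SuiteState` are hypothetical stand-ins; the actual fix in this patch uses Scala's transient marker on the SparkContext field of the test classes.

```python
import pickle

class FakeContext:
    """Hypothetical stand-in for a driver-only handle such as SparkContext."""
    def __reduce__(self):
        raise TypeError("FakeContext is not serializable")

class SuiteState:
    def __init__(self, ctx, batch_size):
        self.ctx = ctx
        self.batch_size = batch_size

    def __getstate__(self):
        # Rough analog of a transient field: leave the handle out when serializing.
        state = self.__dict__.copy()
        state["ctx"] = None
        return state

# Round-trips even though the object holds a non-serializable handle.
restored = pickle.loads(pickle.dumps(SuiteState(FakeContext(), batch_size=4)))
```

After the round trip, `restored.ctx` is `None` and the remaining fields are intact, which mirrors how a transient field comes back unset after deserialization.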
    
      ## What improvement does this patch make?
    
      Cuts the lineage to the parent DStream. When combined with the fix
      in SPARK-25303, it unpersists the intermediate RDDs and input DStreams that
      were being remembered for too long, resulting in much lower memory usage
      on the executors. This was observed from differences in the DAG chart and
      from data in the Storage tab of the driver UI.
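
To make "cutting the lineage" concrete, here is a toy Python model, not Spark code. It assumes each window is derived from the previous window plus one new batch, and that a checkpoint saves the data so the parents can be forgotten. `Node`, `lineage_depth`, `run`, and `checkpoint_every` are hypothetical names; Spark's real checkpoint interval and storage behaviour are not modelled.

```python
class Node:
    """Toy stand-in for an RDD: remembers the nodes it was derived from."""
    def __init__(self, parents=()):
        self.parents = list(parents)

    def checkpoint(self):
        # Model of checkpointing: the data is saved, so parents can be dropped.
        self.parents = []

def lineage_depth(node):
    """Length of the longest dependency chain ending at `node`."""
    if not node.parents:
        return 0
    return 1 + max(lineage_depth(p) for p in node.parents)

def run(num_batches, checkpoint_every=None):
    """Build one window per batch and return the final window's lineage depth."""
    window = Node()
    for i in range(1, num_batches + 1):
        batch = Node()
        window = Node(parents=[window, batch])  # new window built from the previous one
        if checkpoint_every and i % checkpoint_every == 0:
            window.checkpoint()
    return lineage_depth(window)

depth_without = run(55)                     # chain grows with every batch
depth_with = run(55, checkpoint_every=10)   # chain restarts at each checkpoint
```

Without checkpointing, the depth equals the number of batches processed; with a checkpoint every 10 batches, it never exceeds the batches since the last checkpoint. Older nodes that are no longer reachable are the ones that can be released.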
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nikunjb/spark spark-25302

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22423.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22423
    
----
commit 36929ad2424861e1a66bb8c867b240e1aec8febc
Author: Nikunj Bansal <ni...@...>
Date:   2018-09-14T22:37:36Z

    [SPARK-25302][STREAMING] Checkpoint the reducedStream in ReducedWindowDStream so as to cut the lineage completely to parent
    

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in...

Posted by AmplabJenkins <gi...@git.apache.org>.
GitHub user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22423
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22423: [SPARK-25302][STREAMING] Checkpoint the reducedStream in...

Posted by nikunjb <gi...@git.apache.org>.
GitHub user nikunjb commented on the issue:

    https://github.com/apache/spark/pull/22423
  
    @dongjoon-hyun @tdas Can you please help review this PR?


---
