Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2018/04/02 04:07:44 UTC

[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/20956

    [SPARK-23841][ML] NodeIdCache should unpersist the last cached nodeIdsForInstances

    ## What changes were proposed in this pull request?
    Unpersist the last cached `nodeIdsForInstances` in `deleteAllCheckpoints`.
    
    ## How was this patch tested?
    existing tests
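
The change described above can be illustrated with a minimal, Spark-free sketch. Everything here is an assumption made for illustration: `FakeCached` and `NodeIdCacheModel` are hypothetical stand-ins, not Spark classes, and the real `NodeIdCache` does considerably more (for example, checkpointing). The point is only that the final cleanup has to release the current dataset as well as the previous one, otherwise the last cached dataset stays persisted.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a cached RDD; not a Spark class.
final class FakeCached(val name: String) {
  var persisted: Boolean = true
  // The flag is ignored here; it only mirrors the shape of the real call.
  def unpersist(blocking: Boolean): Unit = { persisted = false }
}

// Simplified model of a cache that keeps the current and previous datasets.
final class NodeIdCacheModel {
  // Every dataset this model ever cached, so leaks can be inspected.
  val created: ArrayBuffer[FakeCached] = ArrayBuffer.empty[FakeCached]

  private def persistNew(name: String): FakeCached = {
    val c = new FakeCached(name)
    created += c
    c
  }

  var nodeIdsForInstances: FakeCached = persistNew("iter-0")
  var prevNodeIdsForInstances: FakeCached = null

  // Each update releases the old "previous" dataset and caches a new one.
  def updateNodeIndices(iteration: Int): Unit = {
    if (prevNodeIdsForInstances != null) {
      prevNodeIdsForInstances.unpersist(false)
    }
    prevNodeIdsForInstances = nodeIdsForInstances
    nodeIdsForInstances = persistNew(s"iter-$iteration")
  }

  // Final cleanup in the spirit of the patch: release BOTH datasets.
  // Releasing only the previous one would leave the current one persisted.
  def deleteAllCheckpoints(): Unit = {
    if (nodeIdsForInstances != null) {
      nodeIdsForInstances.unpersist(false)
    }
    if (prevNodeIdsForInstances != null) {
      prevNodeIdsForInstances.unpersist(false)
    }
  }
}
```

In this model, dropping the first `if` block in `deleteAllCheckpoints` leaves exactly one dataset (the most recent) still marked as persisted after cleanup.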

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark NodeIdCache_cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20956
    
----
commit 96529235a9ce7279ec3eee7ad58b4c7e3c8119ae
Author: Zheng RuiFeng <ru...@...>
Date:   2018-04-02T04:01:20Z

    init pr

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

Posted by sujithjay <gi...@git.apache.org>.
Github user sujithjay commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20956#discussion_r178517357
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
    @@ -95,7 +95,7 @@ private[spark] class NodeIdCache(
           splits: Array[Array[Split]]): Unit = {
         if (prevNodeIdsForInstances != null) {
           // Unpersist the previous one if one exists.
    -      prevNodeIdsForInstances.unpersist()
    +      prevNodeIdsForInstances.unpersist(false)
    --- End diff --
    
    Is this change required? Shouldn't the call to `unpersist` remain blocking?


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88805/


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    @srowen Could you please help review this? Thanks in advance.


---



[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

Posted by sujithjay <gi...@git.apache.org>.
Github user sujithjay commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20956#discussion_r178518257
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
    @@ -166,9 +166,13 @@ private[spark] class NodeIdCache(
             }
           }
         }
    +    if (nodeIdsForInstances != null) {
    +      // Unpersist current one if one exists.
    +      nodeIdsForInstances.unpersist(false)
    +    }
         if (prevNodeIdsForInstances != null) {
           // Unpersist the previous one if one exists.
    -      prevNodeIdsForInstances.unpersist()
    +      prevNodeIdsForInstances.unpersist(false)
    --- End diff --
    
    Same question as above. `deleteAllCheckpoints` is blocking because it involves calls to `FileSystem.delete`. So, does it make sense to make the call to `unpersist` non-blocking? Am I missing something here?


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20956


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    **[Test build #88805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88805/testReport)** for PR 20956 at commit [`9652923`](https://github.com/apache/spark/commit/96529235a9ce7279ec3eee7ad58b4c7e3c8119ae).


---



[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20956#discussion_r180064831
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
    @@ -166,9 +166,13 @@ private[spark] class NodeIdCache(
             }
           }
         }
    +    if (nodeIdsForInstances != null) {
    +      // Unpersist current one if one exists.
    +      nodeIdsForInstances.unpersist(false)
    +    }
         if (prevNodeIdsForInstances != null) {
           // Unpersist the previous one if one exists.
    -      prevNodeIdsForInstances.unpersist()
    +      prevNodeIdsForInstances.unpersist(false)
    --- End diff --
    
    For now, `deleteAllCheckpoints` is called only once in the whole of MLlib, and the existing `unpersist` of `prevNodeIdsForInstances` already lives in it. So I think we do not need to implement another method to unpersist datasets (like `PeriodicCheckpointer`).


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    **[Test build #88805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88805/testReport)** for PR 20956 at commit [`9652923`](https://github.com/apache/spark/commit/96529235a9ce7279ec3eee7ad58b4c7e3c8119ae).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1894/


---



[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20956#discussion_r180063562
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
    @@ -95,7 +95,7 @@ private[spark] class NodeIdCache(
           splits: Array[Array[Split]]): Unit = {
         if (prevNodeIdsForInstances != null) {
           // Unpersist the previous one if one exists.
    -      prevNodeIdsForInstances.unpersist()
    +      prevNodeIdsForInstances.unpersist(false)
    --- End diff --
    
    This is not required, but it is usually safe to unpersist without blocking in MLlib.
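
The blocking versus non-blocking distinction discussed in this thread can be modelled with a small, Spark-free sketch. `SlowCached` is a hypothetical stand-in, not a Spark class, and returning a `Future` is a modelling convenience so that completion can be observed (the real `unpersist` does not return one). The sketch assumes that block removal takes some time and that the caller never reads the data again, which is the situation in which skipping the wait is usually harmless.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a cached dataset whose blocks take time to remove.
final class SlowCached {
  @volatile var blocksRemoved: Boolean = false

  // Starts the removal in the background. With blocking = true the caller
  // waits for it to finish; with blocking = false the call returns at once.
  def unpersist(blocking: Boolean): Future[Unit] = {
    val removal: Future[Unit] = Future {
      Thread.sleep(50) // simulated cost of dropping the cached blocks
      blocksRemoved = true
    }
    if (blocking) {
      Await.result(removal, 5.seconds)
    }
    removal
  }
}
```

With `blocking = true` the blocks are guaranteed to be gone when the call returns. With `blocking = false` they are removed shortly afterwards, so the caller saves the wait at the cost of the memory being held slightly longer.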


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/20956
  
    Merged to master


---
