Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2018/04/02 04:07:44 UTC
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/20956
[SPARK-23841][ML] NodeIdCache should unpersist the last cached nodeIdsForInstances
## What changes were proposed in this pull request?
Unpersist the last cached `nodeIdsForInstances` in `deleteAllCheckpoints`.
## How was this patch tested?
existing tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark NodeIdCache_cleanup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20956.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20956
----
commit 96529235a9ce7279ec3eee7ad58b4c7e3c8119ae
Author: Zheng RuiFeng <ru...@...>
Date: 2018-04-02T04:01:20Z
init pr
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Posted by sujithjay <gi...@git.apache.org>.
Github user sujithjay commented on a diff in the pull request:
https://github.com/apache/spark/pull/20956#discussion_r178517357
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
@@ -95,7 +95,7 @@ private[spark] class NodeIdCache(
splits: Array[Array[Split]]): Unit = {
if (prevNodeIdsForInstances != null) {
// Unpersist the previous one if one exists.
- prevNodeIdsForInstances.unpersist()
+ prevNodeIdsForInstances.unpersist(false)
--- End diff --
Is this change required? Should not the call to `unpersist` remain blocking?
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88805/
Test PASSed.
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/20956
@srowen Could you please help review this? Thanks in advance.
---
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Posted by sujithjay <gi...@git.apache.org>.
Github user sujithjay commented on a diff in the pull request:
https://github.com/apache/spark/pull/20956#discussion_r178518257
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
@@ -166,9 +166,13 @@ private[spark] class NodeIdCache(
}
}
}
+ if (nodeIdsForInstances != null) {
+ // Unpersist current one if one exists.
+ nodeIdsForInstances.unpersist(false)
+ }
if (prevNodeIdsForInstances != null) {
// Unpersist the previous one if one exists.
- prevNodeIdsForInstances.unpersist()
+ prevNodeIdsForInstances.unpersist(false)
--- End diff --
Same question as above. `deleteAllCheckpoints` is blocking because it involves calls to `FileSystem.delete`. So, does it make sense to make the call to `unpersist` non-blocking? Am I missing something here?
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20956
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20956
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20956
**[Test build #88805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88805/testReport)** for PR 20956 at commit [`9652923`](https://github.com/apache/spark/commit/96529235a9ce7279ec3eee7ad58b4c7e3c8119ae).
---
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/20956#discussion_r180064831
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
@@ -166,9 +166,13 @@ private[spark] class NodeIdCache(
}
}
}
+ if (nodeIdsForInstances != null) {
+ // Unpersist current one if one exists.
+ nodeIdsForInstances.unpersist(false)
+ }
if (prevNodeIdsForInstances != null) {
// Unpersist the previous one if one exists.
- prevNodeIdsForInstances.unpersist()
+ prevNodeIdsForInstances.unpersist(false)
--- End diff --
For now, `deleteAllCheckpoints` is called only once in the whole of MLlib, and the existing `unpersist` of `prevNodeIdsForInstances` already lives in it. So I think we do not need to implement another method to unpersist datasets (like `PeriodicCheckpointer`).
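To illustrate the cleanup being discussed, here is a minimal sketch of a holder that tracks a current and a previous cached RDD and releases both in its final cleanup step. The class `TwoGenerationCache` and its methods `update` and `cleanup` are hypothetical names made up for this illustration; this is not the actual `NodeIdCache` code, and checkpoint file handling is left out entirely. Only `RDD.persist` and `RDD.unpersist(blocking)` are real Spark calls.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical, simplified stand-in for NodeIdCache: it tracks only the
// current and previous cached RDDs and omits checkpointing entirely.
class TwoGenerationCache[T](initial: RDD[T]) {
  private var current: Option[RDD[T]] = Some(initial.persist())
  private var previous: Option[RDD[T]] = None

  def get: Option[RDD[T]] = current

  // Swap in a new generation and release the one that is two steps old.
  def update(next: RDD[T]): Unit = {
    previous.foreach(_.unpersist(blocking = false))
    previous = current
    current = Some(next.persist())
  }

  // Final cleanup: release BOTH generations, not only the previous one.
  // Forgetting `current` here is the kind of leak this pull request targets.
  def cleanup(): Unit = {
    current.foreach(_.unpersist(blocking = false))
    previous.foreach(_.unpersist(blocking = false))
    current = None
    previous = None
  }
}
```

Keeping the release of both generations inside the single cleanup method mirrors the argument above: with one call site, a separate unpersist helper adds little.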
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20956
**[Test build #88805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88805/testReport)** for PR 20956 at commit [`9652923`](https://github.com/apache/spark/commit/96529235a9ce7279ec3eee7ad58b4c7e3c8119ae).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20956
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1894/
Test PASSed.
---
[GitHub] spark pull request #20956: [SPARK-23841][ML] NodeIdCache should unpersist th...
Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/20956#discussion_r180063562
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/NodeIdCache.scala ---
@@ -95,7 +95,7 @@ private[spark] class NodeIdCache(
splits: Array[Array[Split]]): Unit = {
if (prevNodeIdsForInstances != null) {
// Unpersist the previous one if one exists.
- prevNodeIdsForInstances.unpersist()
+ prevNodeIdsForInstances.unpersist(false)
--- End diff --
This is not required, but it is usually safe to unpersist without blocking in MLlib.
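For readers less familiar with the `blocking` flag, the following is a small sketch of a non-blocking unpersist on a local Spark context. The object name `UnpersistSketch`, the app name, and the toy data are assumptions for illustration only and are unrelated to the real `NodeIdCache` inputs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("unpersist-sketch")
    val sc = new SparkContext(conf)
    try {
      // Toy stand-in for a cached per-instance node id dataset.
      val nodeIds = sc.parallelize(1 to 1000)
        .map(i => Array(i % 4))
        .persist(StorageLevel.MEMORY_AND_DISK)
      nodeIds.count() // materialize the cached blocks

      assert(nodeIds.getStorageLevel != StorageLevel.NONE)

      // Non-blocking: the call returns without waiting for the cached
      // blocks to be removed. The RDD's storage level on the driver is
      // reset right away.
      nodeIds.unpersist(blocking = false)
      assert(nodeIds.getStorageLevel == StorageLevel.NONE)
    } finally {
      sc.stop()
    }
  }
}
```

With `blocking = true` the call would instead wait until the blocks are removed, which is the behavior the reviewer asks about; whether that wait matters depends on how soon the freed memory is needed.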
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20956
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20956: [SPARK-23841][ML] NodeIdCache should unpersist the last ...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/20956
Merged to master
---