You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2017/05/31 03:17:54 UTC

[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/18152

    [SPARK-20930][ML]  Destroy broadcasted centers after computing cost in KMeans

    ## What changes were proposed in this pull request?
     Destroy broadcasted centers after computing cost
    ## How was this patch tested?
    existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark destroy_kmeans_model

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18152.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18152
    
----
commit 7d96ee85650cc75ddce7f06fe632def015861ec5
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2017-05-31T03:09:14Z

    create pr

commit 37369922c2048a90469ac010adb5e398e1c90304
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2017-05-31T03:09:38Z

    create pr

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77728/testReport)** for PR 18152 at commit [`3fd52a8`](https://github.com/apache/spark/commit/3fd52a82c1cf8e7c1219c2a7dc9c93b3ee9a7a95).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18152#discussion_r119855432
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
    @@ -320,6 +320,7 @@ class LocalLDAModel private[spark] (
     
             docBound
           }.sum()
    +    ElogbetaBc.destroy(blocking = false)
    --- End diff --
    
    @zhengruifeng but the broadcast isn't actually used. Its `.value` is called, locally, not from a distributed method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    @srowen  Thanks for pointing that out. Btw, I reviewed all calls of broadcast in ml again and found this instance also happens in `LDAModel.scala`. 
    I think now all broadcasted objects in ml will be destroyed after computation, except transform/predict/predictSoft/etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77570/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77728/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18152#discussion_r120030962
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
    @@ -320,6 +320,7 @@ class LocalLDAModel private[spark] (
     
             docBound
           }.sum()
    +    ElogbetaBc.destroy(blocking = false)
    --- End diff --
    
    @srowen You are right. I removed the unnecessary broadcasting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77570/testReport)** for PR 18152 at commit [`3736992`](https://github.com/apache/spark/commit/37369922c2048a90469ac010adb5e398e1c90304).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77570/testReport)** for PR 18152 at commit [`3736992`](https://github.com/apache/spark/commit/37369922c2048a90469ac010adb5e398e1c90304).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18152


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77618/testReport)** for PR 18152 at commit [`ddc9ffb`](https://github.com/apache/spark/commit/ddc9ffb138f6c21ffc3f786a1b2ecde67381a832).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77728/testReport)** for PR 18152 at commit [`3fd52a8`](https://github.com/apache/spark/commit/3fd52a82c1cf8e7c1219c2a7dc9c93b3ee9a7a95).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18152#discussion_r119767978
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
    @@ -320,6 +320,7 @@ class LocalLDAModel private[spark] (
     
             docBound
           }.sum()
    +    ElogbetaBc.destroy(blocking = false)
    --- End diff --
    
    `getTopicDistributionMethod` method is only used in `ml.clustering.LDA#transform`, so I think this maybe useful if the model size is large.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    **[Test build #77618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77618/testReport)** for PR 18152 at commit [`ddc9ffb`](https://github.com/apache/spark/commit/ddc9ffb138f6c21ffc3f786a1b2ecde67381a832).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18152: [SPARK-20930][ML] Destroy broadcasted centers aft...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18152#discussion_r119568147
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala ---
    @@ -320,6 +320,7 @@ class LocalLDAModel private[spark] (
     
             docBound
           }.sum()
    +    ElogbetaBc.destroy(blocking = false)
    --- End diff --
    
    Looks good; how about in the `getTopicDistributionMethod` method too? I actually think that broadcast is pointless.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77618/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18152: [SPARK-20930][ML] Destroy broadcasted centers after comp...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18152
  
    Yes, I agree with that. It looks like there may be a similar instance in `GradientDescent.scala`, where `bcWeights` is used in a `treeAggregate` but then never destroyed.
    
    In every other similar instance I see, yes, the broadcast is destroyed after it's used to collect a result.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org