You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by husseinhazimeh <gi...@git.apache.org> on 2016/06/29 19:16:54 UTC

[GitHub] spark pull request #13980: [SPARK-16198] [MLlib] [ML] Change access level of...

GitHub user husseinhazimeh opened a pull request:

    https://github.com/apache/spark/pull/13980

    [SPARK-16198] [MLlib] [ML] Change access level of prediction functions to public

    ## What changes were proposed in this pull request?
    
    Change the access level of the `predict` method in all ML predictors and `predictProbability` in probabilistic classifiers from protected/private to public. 
    
    ## Why?
    
    Please refer to the [JIRA ticket](https://issues.apache.org/jira/browse/SPARK-16198) for more details on the experiment performed and why this is a good idea. 
    
    **Summary:** The transform method of predictors in spark.ml has a relatively high latency when predicting single instances or small batches, which is mainly due to the overhead introduced by DataFrame operations. For a text classification task on the RCV1 datatset, changing the access level of the low-level `predict` method from protected to public and using it to make predictions reduced the latency of single predictions by three to four folds and that of batches by 50%. While the `transform` method is flexible and sufficient for general usage, exposing the low-level `predict` method to the public API can benefit many applications that require low-latency response. 
    
    
    
    ## How was this patch tested?
    
    All the current tests for predictors passed.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/husseinhazimeh/spark lowlatency

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13980
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Let's merge this into #8883 and continue there; this can be closed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by husseinhazimeh <gi...@git.apache.org>.
Github user husseinhazimeh commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    @srowen can you please have a look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    **[Test build #61631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61631/consoleFull)** for PR 13980 at commit [`02eced8`](https://github.com/apache/spark/commit/02eced879f603736b24be5e50de1d966e992800c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    **[Test build #61631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61631/consoleFull)** for PR 13980 at commit [`02eced8`](https://github.com/apache/spark/commit/02eced879f603736b24be5e50de1d966e992800c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    I can't OK this, but can test it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61631/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13980: [SPARK-16198] [MLlib] [ML] Change access level of...

Posted by husseinhazimeh <gi...@git.apache.org>.
Github user husseinhazimeh closed the pull request at:

    https://github.com/apache/spark/pull/13980


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Is this a duplicate of [SPARK-10884](https://issues.apache.org/jira/browse/SPARK-10884)? The PR is here: https://github.com/apache/spark/pull/8883


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13980: [SPARK-16198] [MLlib] [ML] Change access level of predic...

Posted by husseinhazimeh <gi...@git.apache.org>.
Github user husseinhazimeh commented on the issue:

    https://github.com/apache/spark/pull/13980
  
    Thanks @sethah ! I had no idea this existed. As you've mentioned in your comment in PR #8883, exposing the prediction methods to the public API can be handy for ensemble methods, and now there is an extra reason: latency.
    
    I'm not sure about the merge conflicts in #8883 but they're probably due to the new unit tests introduced. Do you think new unit tests are needed? The current tests for `transform` are already invoking `predict` internally which can ensure correctness..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org