Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2017/03/20 10:12:59 UTC

[GitHub] spark pull request #17360: [WIP][SPARK-20029][ML] ML LinearRegression suppor...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/17360

    [WIP][SPARK-20029][ML] ML LinearRegression supports bound constrained optimization.

    ## What changes were proposed in this pull request?
    MLlib ```LinearRegression``` should support bound constrained optimization. Users can add bound constraints on the coefficients so that the solver produces a solution within the specified range.
    Under the hood, we call breeze [```L-BFGS-B```](https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/LBFGSB.scala) as the solver for bound constrained optimization. Currently, only L2 regularization is supported.
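
To make "bound constrained" concrete, here is a minimal, dependency-free sketch of the problem being solved: least squares with an L2 penalty, where each coefficient must stay inside a [lower, upper] box. It is not the code in this PR and it does not use L-BFGS-B; it swaps in plain projected gradient descent, which is much simpler and slower but targets the same constrained minimum. The function name, parameter names, step size, and toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def fit_bounded_ridge(
    xs: Sequence[Sequence[float]],
    ys: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    reg_param: float = 0.0,
    step: float = 0.1,
    max_iter: int = 2000,
) -> List[float]:
    """Minimize 1/(2n) * sum((x.w - y)^2) + reg_param/2 * ||w||^2
    subject to lower[j] <= w[j] <= upper[j] (hypothetical helper)."""
    n, d = len(xs), len(lower)
    # Start from a feasible point: zero clipped into the box.
    w = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(max_iter):
        # Gradient of the L2 penalty term.
        grad = [reg_param * wj for wj in w]
        # Gradient of the averaged squared-error term.
        for x, y in zip(xs, ys):
            err = sum(xj * wj for xj, wj in zip(x, w)) - y
            for j in range(d):
                grad[j] += err * x[j] / n
        # Gradient step, then projection back onto the box.
        w = [
            min(max(wj - step * gj, lo), hi)
            for wj, gj, lo, hi in zip(w, grad, lower, upper)
        ]
    return w


# Toy data generated from y = 2*x1 - 1*x2 (no noise, no intercept).
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ys = [2.0, -1.0, 1.0, 3.0]
inf = float("inf")

# Without bounds the fit approaches the generating coefficients [2.0, -1.0].
free = fit_bounded_ridge(xs, ys, [-inf, -inf], [inf, inf])

# With 0 <= w1 <= 1 and w2 >= 0 both constraints become active,
# and the fit approaches approximately [1.0, 0.0].
boxed = fit_bounded_ridge(xs, ys, [0.0, 0.0], [1.0, inf])
```

A quasi-Newton method such as L-BFGS-B reaches the same kind of solution in far fewer iterations, which is presumably why the PR delegates to the breeze solver instead of a first-order loop like this one.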
    
    ## How was this patch tested?
    Unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-20029

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17360.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17360
    
----
commit aa7e7684d167315ee04d3528ff3e76e9d8e407f9
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-03-20T10:09:39Z

    ML LinearRegression supports bound constrained optimization.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17360: [WIP][SPARK-20029][ML] ML LinearRegression supports boun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17360: [WIP][SPARK-20029][ML] ML LinearRegression supports boun...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74876/
    Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Build finished. Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Build finished. Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4225/
    Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    @sethah I left some questions on [SPARK-17136](https://issues.apache.org/jira/browse/SPARK-17136). I think the main question we need to settle is whether we still expose the optimizer params as estimator params after SPARK-17136. I would prefer to keep these params in the estimators, make the optimizer layer an internal API, and let users register their own optimizer implementations, similar to how data sources are supported. I found this to be more aligned with the original [ML pipeline design](https://docs.google.com/document/d/1rVwXRjWKfIb-7PI6b86ipytwbUH7irSNLF1_6dLmh8o/edit#), which stores params outside a pipeline component.
    So I think this PR does not conflict with SPARK-17136 and the two can proceed in parallel. I'm also open to hearing your thoughts. Thanks!




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #97789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97789/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).
     * This patch **fails build dependency tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #17360: [WIP][SPARK-20029][ML] ML LinearRegression supports boun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #74876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74876/testReport)** for PR 17360 at commit [`aa7e768`](https://github.com/apache/spark/commit/aa7e7684d167315ee04d3528ff3e76e9d8e407f9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74979/
    Test PASSed.




[GitHub] spark issue #17360: [WIP][SPARK-20029][ML] ML LinearRegression supports boun...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #74876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74876/testReport)** for PR 17360 at commit [`aa7e768`](https://github.com/apache/spark/commit/aa7e7684d167315ee04d3528ff3e76e9d8e407f9).




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #74979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74979/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17360: [WIP][SPARK-20029][ML] ML LinearRegression supports boun...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    I don't think this is the best approach. We're further confounding the algorithm API with parameters of the optimizer used to fit the algorithm.
    
    I strongly prefer to put more effort into getting this right via [SPARK-17136](https://issues.apache.org/jira/browse/SPARK-17136). For what it's worth, I have an initial PR basically ready that provides an API that makes adding this functionality trivial.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #97747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97747/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4251/
    Test PASSed.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #97754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97754/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #74979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74979/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #97754 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97754/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    **[Test build #97747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97747/testReport)** for PR 17360 at commit [`5af16cb`](https://github.com/apache/spark/commit/5af16cba0317d81d5b80d4ba5021a3436733ae14).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.




[GitHub] spark issue #17360: [SPARK-20029][ML] ML LinearRegression supports bound con...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/17360
  
    @yanboliang Thanks for your feedback! The design of the optimizer interface, or even whether it should be included at all, is definitely open for discussion, and your suggestions are much appreciated. If SPARK-17136 proceeds as you suggest (an internal optimization API that allows users to register optimizers), then it is possible that this PR does not conflict with that JIRA (though I don't know the details, so I'm not sure even of that). However, that matter is far from settled. If we end up deciding to provide the external optimizer API as is currently suggested in that JIRA, then these two _do_ conflict. If we add the ability to specify parameter bounds on the estimator and then add an optimizer API, we will have added yet more optimizer parameters to the estimator that can conflict with parameters of the optimizer provided to the estimator.
    
    My point is that I think these are two competing approaches and we should settle on one over the other before we make API changes that cannot be undone. I'm open to potentially changing the design of SPARK-17136, but we need to decide on something first. 
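
The conflict described above can be sketched in a few lines. This is purely illustrative: `EstimatorSketch`, `OptimizerSketch`, and `conflicting_params` are hypothetical names, not anything proposed in this PR or in SPARK-17136. The sketch only shows what happens when bounds live on the estimator and on a user-supplied optimizer at the same time.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class OptimizerSketch:
    """Hypothetical user-supplied optimizer that carries its own bounds."""
    lower_bounds: Optional[Sequence[float]] = None
    upper_bounds: Optional[Sequence[float]] = None


@dataclass
class EstimatorSketch:
    """Hypothetical estimator that exposes bounds as estimator params
    and also accepts an optimizer object."""
    lower_bounds: Optional[Sequence[float]] = None
    upper_bounds: Optional[Sequence[float]] = None
    optimizer: Optional[OptimizerSketch] = None

    def conflicting_params(self) -> List[str]:
        """Names of params set in both places with different values."""
        if self.optimizer is None:
            return []
        conflicts = []
        for name in ("lower_bounds", "upper_bounds"):
            own = getattr(self, name)
            theirs = getattr(self.optimizer, name)
            if own is not None and theirs is not None and list(own) != list(theirs):
                conflicts.append(name)
        return conflicts


# Bounds set on the estimator disagree with those on the optimizer,
# so it is ambiguous which value the fit should honor.
est = EstimatorSketch(
    lower_bounds=[0.0, 0.0],
    optimizer=OptimizerSketch(lower_bounds=[-1.0, 0.0]),
)
ambiguous = est.conflicting_params()
```

Keeping the bounds in only one of the two places removes the ambiguity, which is why the thread asks for the API direction to be settled before either change is merged.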

