You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/05/09 11:01:29 UTC

[GitHub] spark pull request #17922: [SPARK-2060][PYTHON][ML] Python API Changes for C...

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17922

    [SPARK-2060][PYTHON][ML] Python API Changes for Constrained Logistic Regression Params

    ## What changes were proposed in this pull request?
    
    - Add new `Params` to `pyspark.ml.classification.LogisticRegression`.
    - Add `toMatrix` method to `pyspark.ml.param.TypeConverters`.
    - Add `generate_multinomial_logistic_input` helper to `pyspark.ml.tests`.
    
    ## How was this patch tested?
    
    Unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20601

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17922.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17922
    
----

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123707614
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -170,6 +170,15 @@ def toVector(value):
             raise TypeError("Could not convert %s to vector" % value)
     
         @staticmethod
    +    def toMatrix(value):
    +        """
    +        Convert a value to ML Matrix, if possible
    --- End diff --
    
    ```ML``` -> ```MLlib```, ```MLlib``` is the only official name for both ```spark.mllib``` and ```spark.ml``` package.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r115497704
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -71,6 +71,34 @@
     ser = PickleSerializer()
     
     
    +def generate_multinomial_logistic_input(
    +        weights, x_mean, x_variance, add_intercept, n_points, seed=None):
    +    """Creates multinomial logistic dataset"""
    +
    +    if seed:
    +        np.random.seed(seed)
    +    n_features = x_mean.shape[0]
    +
    +    x = np.random.randn(n_points, n_features)
    +    x = x * np.sqrt(x_variance) + x_mean
    +
    +    if add_intercept:
    +        x = np.hstack([x, np.ones((n_points, 1))])
    +
    +    # Compute margins
    +    margins = np.hstack([np.zeros((n_points, 1)), x.dot(weights.T)])
    +    # Shift to avoid overflow and compute probs
    +    probs = np.exp(np.subtract(margins, margins.max(axis=1).reshape(n_points, -1)))
    +    # Compute cumulative prob
    +    cum_probs = np.cumsum(probs / probs.sum(axis=1).reshape(n_points, -1), axis=1)
    +    # Asign class
    --- End diff --
    
    "Assign class", though IMO you could also just do away with the comments in this section.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 closed the pull request at:

    https://github.com/apache/spark/pull/17922


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76672/testReport)** for PR 17922 at commit [`d4eeb0f`](https://github.com/apache/spark/commit/d4eeb0f756d5d29cbc4b4acd774007855d83503f).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76674/testReport)** for PR 17922 at commit [`550e165`](https://github.com/apache/spark/commit/550e165673e7f6c9052724fd3a93c4f38a7bcb3d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123769816
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -170,6 +170,15 @@ def toVector(value):
             raise TypeError("Could not convert %s to vector" % value)
     
         @staticmethod
    +    def toMatrix(value):
    +        """
    +        Convert a value to ML Matrix, if possible
    --- End diff --
    
    This is not a big issue, but you can still refer the [clarification](https://spark.apache.org/docs/latest/ml-guide.html#announcement-dataframe-based-api-is-primary-api) in MLlib user guide to get the convention in MLlib.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76671/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r126769478
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -832,6 +860,96 @@ def test_logistic_regression(self):
             except OSError:
                 pass
     
    +    def logistic_regression_check_thresholds(self):
    +        self.assertIsInstance(
    +            LogisticRegression(threshold=0.5, thresholds=[0.5, 0.5]),
    +            LogisticRegressionModel
    +        )
    +
    +        self.assertRaisesRegexp(
    +            ValueError,
    +            "Logistic Regression getThreshold found inconsistent.*$",
    +            LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
    +        )
    +
    +    def test_binomial_logistic_regression_bounds(self):
    --- End diff --
    
    I agree this is probably overkill for testing this.  The functionality is already in Scala and should be tested there, here in python we are just setting the parameters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76674/testReport)** for PR 17922 at commit [`550e165`](https://github.com/apache/spark/commit/550e165673e7f6c9052724fd3a93c4f38a7bcb3d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76756/testReport)** for PR 17922 at commit [`aa219c6`](https://github.com/apache/spark/commit/aa219c66a41e40e072e27b5b1fae417c34d7be90).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r119992981
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -819,6 +847,84 @@ def logistic_regression_check_thresholds(self):
                 LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
             )
     
    +    def test_binomial_logistic_regression_bounds(self):
    --- End diff --
    
    Usually it's not need to write exactly the same test as Scala in PySpark, we can use a simple test with loading a dataset  or generating a very simple dataset and run constrained LR on it. You can refer test cases in [test.py](https://github.com/apache/spark/blob/master/python/pyspark/ml/tests.py) or other tests like [this](https://github.com/apache/spark/blob/master/python/pyspark/ml/classification.py#L200).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76687/testReport)** for PR 17922 at commit [`57faa5b`](https://github.com/apache/spark/commit/57faa5b18805d6bcd61a4cb34044c91a3c0b69e1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-2060][PYTHON][ML] Python API Changes for Constrai...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76671/testReport)** for PR 17922 at commit [`1c0cb74`](https://github.com/apache/spark/commit/1c0cb7406a74205315c441131710494f082ddbd3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r126766060
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -246,18 +246,55 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti
                        "be used in the model. Supported options: auto, binomial, multinomial",
                        typeConverter=TypeConverters.toString)
     
    +    lowerBoundsOnCoefficients = Param(Params._dummy(), "lowerBoundsOnCoefficients",
    +                                      "The lower bounds on coefficients if fitting under bound "
    +                                      "constrained optimization. The bound matrix must be "
    +                                      "compatible with the shape "
    +                                      "(1, number of features) for binomial regression, or "
    +                                      "(number of classes, number of features) "
    +                                      "for multinomial regression.",
    +                                      typeConverter=TypeConverters.toMatrix)
    +
    +    upperBoundsOnCoefficients = Param(Params._dummy(), "upperBoundsOnCoefficients",
    +                                      "The upper bounds on coefficients if fitting under bound "
    +                                      "constrained optimization. The bound matrix must be "
    +                                      "compatible with the shape "
    +                                      "(1, number of features) for binomial regression, or "
    +                                      "(number of classes, number of features) "
    +                                      "for multinomial regression.",
    +                                      typeConverter=TypeConverters.toMatrix)
    +
    +    lowerBoundsOnIntercepts = Param(Params._dummy(), "lowerBoundsOnIntercepts",
    +                                    "The lower bounds on intercepts if fitting under bound "
    +                                    "constrained optimization. The bounds vector size must be"
    +                                    "equal with 1 for binomial regression, or the number of"
    +                                    "lasses for multinomial regression.",
    +                                    typeConverter=TypeConverters.toVector)
    +
    +    upperBoundsOnIntercepts = Param(Params._dummy(), "upperBoundsOnIntercepts",
    +                                    "The upper bounds on intercepts if fitting under bound "
    +                                    "constrained optimization. The bound vector size must be "
    +                                    "equal with 1 for binomial regression, or the number of "
    +                                    "classes for multinomial regression.",
    +                                    typeConverter=TypeConverters.toVector)
    +
         @keyword_only
         def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction",
                      maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-6, fitIntercept=True,
                      threshold=0.5, thresholds=None, probabilityCol="probability",
                      rawPredictionCol="rawPrediction", standardization=True, weightCol=None,
    -                 aggregationDepth=2, family="auto"):
    +                 aggregationDepth=2, family="auto",
    +                 lowerBoundsOnCoefficients=None, upperBoundsOnCoefficients=None,
    --- End diff --
    
    should fill up the previous line before starting another, here and below


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #77705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77705/testReport)** for PR 17922 at commit [`649bf28`](https://github.com/apache/spark/commit/649bf2843241f1ea7a4e6fd231355c4347118750).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r115555267
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -374,6 +415,48 @@ def getFamily(self):
             """
             return self.getOrDefault(self.family)
     
    +    @since("2.2.0")
    --- End diff --
    
    Probably. I've seen that Scala version has been targeted for 2.2.1 so who knows? But let's make 2.3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76671/testReport)** for PR 17922 at commit [`1c0cb74`](https://github.com/apache/spark/commit/1c0cb7406a74205315c441131710494f082ddbd3).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    @BryanCutler @yanboliang @nchammas Thanks for all the comments. Unfortunately I don't have access to a hardware I can use for development at this moment, and most I likely I won't have in the upcoming weeks.  I going to close this PR, but I'd really appreciate if one of you could pick it up from here. TIA 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r115497473
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -374,6 +415,48 @@ def getFamily(self):
             """
             return self.getOrDefault(self.family)
     
    +    @since("2.2.0")
    --- End diff --
    
    Since we're voting on 2.2 now, I presume this will make it for 2.3.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Sure @yanboliang. Give me a sec.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77705/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123711301
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -832,6 +860,96 @@ def test_logistic_regression(self):
             except OSError:
                 pass
     
    +    def logistic_regression_check_thresholds(self):
    +        self.assertIsInstance(
    +            LogisticRegression(threshold=0.5, thresholds=[0.5, 0.5]),
    +            LogisticRegressionModel
    +        )
    +
    +        self.assertRaisesRegexp(
    +            ValueError,
    +            "Logistic Regression getThreshold found inconsistent.*$",
    +            LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
    +        )
    +
    +    def test_binomial_logistic_regression_bounds(self):
    --- End diff --
    
    @zero323 We usually run PySpark MLlib test with loading a dataset from ```data/mllib/``` or manual generating a dummy/hard-coded dataset rather than rewrite the same test case as Scala. We keep PySpark test as simple as possible. You can refer [this test case](https://github.com/apache/spark/blob/master/python/pyspark/ml/tests.py#L664). Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123766355
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -832,6 +860,96 @@ def test_logistic_regression(self):
             except OSError:
                 pass
     
    +    def logistic_regression_check_thresholds(self):
    +        self.assertIsInstance(
    +            LogisticRegression(threshold=0.5, thresholds=[0.5, 0.5]),
    +            LogisticRegressionModel
    +        )
    +
    +        self.assertRaisesRegexp(
    +            ValueError,
    +            "Logistic Regression getThreshold found inconsistent.*$",
    +            LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
    +        )
    +
    +    def test_binomial_logistic_regression_bounds(self):
    --- End diff --
    
    Example datasets are not that good for checking constraints, and generator seems like a better idea than creating large enough example by hand. I can of course remove it, if this is an issue. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76674/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123768466
  
    --- Diff: python/pyspark/ml/tests.py ---
    @@ -832,6 +860,96 @@ def test_logistic_regression(self):
             except OSError:
                 pass
     
    +    def logistic_regression_check_thresholds(self):
    +        self.assertIsInstance(
    +            LogisticRegression(threshold=0.5, thresholds=[0.5, 0.5]),
    +            LogisticRegressionModel
    +        )
    +
    +        self.assertRaisesRegexp(
    +            ValueError,
    +            "Logistic Regression getThreshold found inconsistent.*$",
    +            LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
    +        )
    +
    +    def test_binomial_logistic_regression_bounds(self):
    --- End diff --
    
    For PySpark, we should only check the output is consistent with Scala. The most straight-forward way for this test should be loading data directly and run constraint LR on it:
    ```
    data_path = "data/mllib/sample_multiclass_classification_data.txt"
    df = spark.read.format("libsvm").load(data_path)
    ......
    ```
    This will make the test case simple and time-saving. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76687/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76672/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76756/testReport)** for PR 17922 at commit [`aa219c6`](https://github.com/apache/spark/commit/aa219c66a41e40e072e27b5b1fae417c34d7be90).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r126765747
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -246,18 +246,55 @@ class LogisticRegression(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredicti
                        "be used in the model. Supported options: auto, binomial, multinomial",
                        typeConverter=TypeConverters.toString)
     
    +    lowerBoundsOnCoefficients = Param(Params._dummy(), "lowerBoundsOnCoefficients",
    +                                      "The lower bounds on coefficients if fitting under bound "
    +                                      "constrained optimization. The bound matrix must be "
    +                                      "compatible with the shape "
    +                                      "(1, number of features) for binomial regression, or "
    +                                      "(number of classes, number of features) "
    +                                      "for multinomial regression.",
    +                                      typeConverter=TypeConverters.toMatrix)
    +
    +    upperBoundsOnCoefficients = Param(Params._dummy(), "upperBoundsOnCoefficients",
    +                                      "The upper bounds on coefficients if fitting under bound "
    +                                      "constrained optimization. The bound matrix must be "
    +                                      "compatible with the shape "
    +                                      "(1, number of features) for binomial regression, or "
    +                                      "(number of classes, number of features) "
    +                                      "for multinomial regression.",
    +                                      typeConverter=TypeConverters.toMatrix)
    --- End diff --
    
    I think you can condense this and the above text blocks some more


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #77705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77705/testReport)** for PR 17922 at commit [`649bf28`](https://github.com/apache/spark/commit/649bf2843241f1ea7a4e6fd231355c4347118750).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r126767873
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -170,6 +170,15 @@ def toVector(value):
             raise TypeError("Could not convert %s to vector" % value)
     
         @staticmethod
    +    def toMatrix(value):
    +        """
    +        Convert a value to ML Matrix, if possible
    +        """
    +        if isinstance(value, Matrix):
    +            return value
    --- End diff --
    
    Is this method really necessary?  It's not actually converting anything, just checking the type.  If this wasn't here and the user put something other than a Matrix, what error would be raised?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76756/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76687/testReport)** for PR 17922 at commit [`57faa5b`](https://github.com/apache/spark/commit/57faa5b18805d6bcd61a4cb34044c91a3c0b69e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17922: [SPARK-20601][PYTHON][ML] Python API Changes for ...

Posted by zero323 <gi...@git.apache.org>.
Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17922#discussion_r123765079
  
    --- Diff: python/pyspark/ml/param/__init__.py ---
    @@ -170,6 +170,15 @@ def toVector(value):
             raise TypeError("Could not convert %s to vector" % value)
     
         @staticmethod
    +    def toMatrix(value):
    +        """
    +        Convert a value to ML Matrix, if possible
    --- End diff --
    
    While I am aware of this, distinction between `ml.linalg` and `mllib.linalg`, is a common source of confusion for the PySpark users. Of course we could be more forgiving, and automatically convert objects to the required class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17922: [SPARK-20601][PYTHON][ML] Python API Changes for Constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17922
  
    **[Test build #76672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76672/testReport)** for PR 17922 at commit [`d4eeb0f`](https://github.com/apache/spark/commit/d4eeb0f756d5d29cbc4b4acd774007855d83503f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org