Posted to reviews@spark.apache.org by wangmiao1981 <gi...@git.apache.org> on 2017/01/24 19:43:55 UTC

[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

GitHub user wangmiao1981 opened a pull request:

    https://github.com/apache/spark/pull/16694

    [SPARK-19336][ML][Pyspark]: LinearSVC Python API 

    ## What changes were proposed in this pull request?
    
    Add Python API for the newly added LinearSVC algorithm.
    
    ## How was this patch tested?
    
    Added a new docstring test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangmiao1981/spark ser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16694.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16694
    
----
commit 020f6fcd821a15a98201aef5541c0e040a1e1e79
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-24T05:51:20Z

    linearsvm python initial checkin

commit f5c9856b3bce096be6c7f39e9869d662c8d5bed2
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-24T07:33:16Z

    check in doc test

commit 605c102349ce81fbda229cfdef86dea791024edf
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-24T07:36:12Z

    add shared param

commit abafaebdaace472ea643d3d7f1457e58d5b37831
Author: wm624@hotmail.com <wm...@hotmail.com>
Date:   2017-01-24T19:41:47Z

    add a negative test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71946/
    Test FAILed.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71948/testReport)** for PR 16694 at commit [`2980e67`](https://github.com/apache/spark/commit/2980e67d3415df2b810a9df9b96f2a5402c5c490).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98309772
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    +    ...     Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
    +    >>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
    +    >>> model = svm.fit(bdf)
    +    >>> model.coefficients
    +    DenseVector([1.909])
    +    >>> model.intercept
    +    -1.0045358384178
    +    >>> model.numClasses
    +    2
    +    >>> model.numFeatures
    +    1
    +    >>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
    +    >>> result = model.transform(test0).head()
    +    >>> result.prediction
    +    0.0
    +    >>> result.rawPrediction
    +    DenseVector([2.9135, -2.9135])
    +    >>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
    +    >>> model.transform(test1).head().prediction
    +    1.0
    +    >>> svm.setParams("vector")
    --- End diff --
    
    I know, there are some not-so-great examples to follow. It'd be nice to clean those up sometime.
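For readers checking the numbers in the quoted doctest, here is a minimal pure-Python sketch (not part of the patch) that reproduces the reported `rawPrediction` and predictions from the printed coefficient and intercept. It assumes the raw prediction is laid out as `[-margin, margin]` and that class 1.0 is predicted when the margin exceeds a threshold of 0.0, which is consistent with the output above; the helper names `margin`, `raw_prediction`, `predict`, and `hinge_loss` are made up for illustration.

```python
# Values copied from the quoted doctest. The coefficient is the rounded
# value printed there, so results are only accurate to about four decimals.
coefficients = [1.909]
intercept = -1.0045358384178


def margin(features):
    """Linear score w . x + b for one feature vector."""
    return sum(w * x for w, x in zip(coefficients, features)) + intercept


def raw_prediction(features):
    """Assumed layout: [-margin, margin], one score per class (0, then 1)."""
    m = margin(features)
    return [-m, m]


def predict(features, threshold=0.0):
    """Predict 1.0 when the margin exceeds the threshold, else 0.0."""
    return 1.0 if margin(features) > threshold else 0.0


def hinge_loss(label, features):
    """Hinge loss max(0, 1 - y * margin), with the 0/1 label mapped to -1/+1."""
    y = 2.0 * label - 1.0
    return max(0.0, 1.0 - y * margin(features))


print([round(v, 4) for v in raw_prediction([-1.0])])
print(predict([-1.0]), predict([1.0]))
```

Rounded to four decimals, the raw prediction for the feature value -1.0 agrees with the `DenseVector([2.9135, -2.9135])` shown in the doctest, and the two predictions agree with the 0.0 and 1.0 reported there.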




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by wangmiao1981 <gi...@git.apache.org>.
Github user wangmiao1981 commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    cc @hhbyyh Thanks!




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141071
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    --- End diff --
    
    Have you tried generating the docs?  Check out other examples to see how to do links.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by wangmiao1981 <gi...@git.apache.org>.
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98247261
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    --- End diff --
    
    OK. I will fix it. Thanks!




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71946/testReport)** for PR 16694 at commit [`98bd7e7`](https://github.com/apache/spark/commit/98bd7e77161e249a028e18ebfe19898a9b8952ac).




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71948/
    Test PASSed.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141074
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    --- End diff --
    
    I'd simplify this example since it is going to be part of the documentation:
    * Remove "weight"
    * Just use dense vectors to make the doc clearer.  Sparse vectors are tested elsewhere for Python and should be tested in Scala for LinearSVC (for which I'll make a JIRA).
    * Make the feature vectors be length 2 or 3




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71945/testReport)** for PR 16694 at commit [`abafaeb`](https://github.com/apache/spark/commit/abafaebdaace472ea643d3d7f1457e58d5b37831).




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71948/testReport)** for PR 16694 at commit [`2980e67`](https://github.com/apache/spark/commit/2980e67d3415df2b810a9df9b96f2a5402c5c490).




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16694




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #72080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72080/testReport)** for PR 16694 at commit [`e2e9943`](https://github.com/apache/spark/commit/e2e9943cdde4aaa2fd81e8d525b0f6e94120017b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71946/testReport)** for PR 16694 at commit [`98bd7e7`](https://github.com/apache/spark/commit/98bd7e77161e249a028e18ebfe19898a9b8952ac).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #71945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71945/testReport)** for PR 16694 at commit [`abafaeb`](https://github.com/apache/spark/commit/abafaebdaace472ea643d3d7f1457e58d5b37831).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by wangmiao1981 <gi...@git.apache.org>.
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98247158
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    --- End diff --
    
    OK. I will modify it.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141079
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    +    ...     Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
    +    >>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
    +    >>> model = svm.fit(bdf)
    +    >>> model.coefficients
    +    DenseVector([1.909])
    +    >>> model.intercept
    +    -1.0045358384178
    +    >>> model.numClasses
    +    2
    +    >>> model.numFeatures
    +    1
    +    >>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
    +    >>> result = model.transform(test0).head()
    +    >>> result.prediction
    +    0.0
    +    >>> result.rawPrediction
    +    DenseVector([2.9135, -2.9135])
    +    >>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
    +    >>> model.transform(test1).head().prediction
    +    1.0
    +    >>> svm.setParams("vector")
    --- End diff --
    
    Put this in a unit test (tests.py), not here in the doc tests (though I also don't think you really need this test)
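As a rough illustration of what such a unit test could look like, here is a self-contained sketch. It does not use Spark: `keyword_only` below is a simplified stand-in for a decorator that rejects positional arguments, and `FakeEstimator` and `SetParamsTest` are hypothetical names invented for this example, so the real test in tests.py would construct a `LinearSVC` instead.

```python
import unittest
from functools import wraps


def keyword_only(func):
    """Simplified stand-in: reject any positional arguments to the method."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)
    return wrapper


class FakeEstimator(object):
    """Hypothetical estimator, present only to keep the test self-contained."""

    def __init__(self):
        self.params = {}

    @keyword_only
    def setParams(self, maxIter=100, regParam=0.0):
        self.params = {"maxIter": maxIter, "regParam": regParam}
        return self


class SetParamsTest(unittest.TestCase):

    def test_positional_argument_is_rejected(self):
        # Mirrors the negative doctest: a bare positional value must fail.
        with self.assertRaises(TypeError):
            FakeEstimator().setParams("vector")

    def test_keyword_arguments_are_accepted(self):
        est = FakeEstimator().setParams(maxIter=5, regParam=0.01)
        self.assertEqual(est.params, {"maxIter": 5, "regParam": 0.01})
```

Saved as a module, this can be run with `python -m unittest <module name>`. Keeping the negative case in a unit test leaves the docstring free to show only the behavior a reader of the generated documentation cares about.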




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141066
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -63,7 +63,7 @@ class LinearSVC @Since("2.2.0") (
       def this() = this(Identifiable.randomUID("linearsvc"))
     
       /**
    -   * Set the regularization parameter.
    +   * Sets the regularization parameter.
    --- End diff --
    
    There's no need to change this. Most other algorithms use "set", not "sets".




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    LGTM, thank you!
    Merging with master




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141077
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    +    ...     Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
    +    >>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
    +    >>> model = svm.fit(bdf)
    +    >>> model.coefficients
    +    DenseVector([1.909])
    +    >>> model.intercept
    +    -1.0045358384178
    +    >>> model.numClasses
    +    2
    +    >>> model.numFeatures
    +    1
    +    >>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
    +    >>> result = model.transform(test0).head()
    +    >>> result.prediction
    +    0.0
    +    >>> result.rawPrediction
    +    DenseVector([2.9135, -2.9135])
    +    >>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
    --- End diff --
    
    No need to test sparse vectors here




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72080/
    Test PASSed.




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by wangmiao1981 <gi...@git.apache.org>.
Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98247098
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    +    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
    +    ...     Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
    +    >>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
    +    >>> model = svm.fit(bdf)
    +    >>> model.coefficients
    +    DenseVector([1.909])
    +    >>> model.intercept
    +    -1.0045358384178
    +    >>> model.numClasses
    +    2
    +    >>> model.numFeatures
    +    1
    +    >>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
    +    >>> result = model.transform(test0).head()
    +    >>> result.prediction
    +    0.0
    +    >>> result.rawPrediction
    +    DenseVector([2.9135, -2.9135])
    +    >>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
    +    >>> model.transform(test1).head().prediction
    +    1.0
    +    >>> svm.setParams("vector")
    --- End diff --
    
    I followed the LogisticRegression doctest when creating this test. I will remove it. Thanks!
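
For context, the negative test under discussion (`svm.setParams("vector")`) exercises the rule that `setParams` accepts keyword arguments only. The following is a minimal, self-contained sketch of that behaviour. It does not use Spark: `keyword_only` here is a simplified stand-in written for illustration, `ToyEstimator` is a hypothetical class, and the exact error wording is an assumption rather than something confirmed in this thread.

```python
from functools import wraps


def keyword_only(func):
    """Simplified stand-in for a keyword-only decorator (illustrative, not Spark's source)."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        # Any positional argument besides self is rejected outright.
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)
    return wrapper


class ToyEstimator(object):
    """Hypothetical estimator used only to demonstrate the decorator."""

    def __init__(self):
        self.params = {}

    @keyword_only
    def setParams(self, maxIter=100, regParam=0.0):
        self.params = {"maxIter": maxIter, "regParam": regParam}
        return self


# Keyword arguments are accepted.
est = ToyEstimator().setParams(maxIter=5, regParam=0.01)

# A positional argument, as in the negative test, raises TypeError.
try:
    ToyEstimator().setParams("vector")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Because a check of this kind is generic to every estimator that uses such a decorator, repeating the negative test in each class's doctest adds little, which is consistent with the decision above to remove it.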




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71945/




[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16694#discussion_r98141073
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -60,6 +61,137 @@ def numClasses(self):
     
     
     @inherit_doc
    +class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
    +                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
    +                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
    +    """
    +    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
    +    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
    +
    +    >>> from pyspark.sql import Row
    +    >>> from pyspark.ml.linalg import Vectors
    +    >>> bdf = sc.parallelize([
    --- End diff --
    
    Rename bdf -> df
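
As a reading aid for the doctest quoted in this thread, the sketch below reproduces its numbers in plain Python. It assumes the usual linear decision function (dot product plus intercept), a decision threshold of 0.0, and a raw prediction of the form [-margin, margin]; these assumptions agree with the quoted values but are not stated in the thread. The helper names are hypothetical, and the coefficient 1.909 is the rounded value shown in the doctest.

```python
def margin(coefficients, intercept, features):
    """Linear decision value: w . x + b."""
    return sum(w * x for w, x in zip(coefficients, features)) + intercept


def raw_prediction(m):
    """Assumed layout of the raw prediction vector: [-margin, margin]."""
    return [-m, m]


def predict(m, threshold=0.0):
    """Predict class 1.0 when the margin exceeds the threshold, else 0.0."""
    return 1.0 if m > threshold else 0.0


def hinge_loss(label, m):
    """Hinge loss for a 0/1 label, mapped to -1/+1 before applying max(0, 1 - y*m)."""
    y = 2.0 * label - 1.0
    return max(0.0, 1.0 - y * m)


# Values copied from the quoted doctest (the coefficient is a rounded display value).
coefficients = [1.909]
intercept = -1.0045358384178

m0 = margin(coefficients, intercept, [-1.0])  # roughly -2.9135
m1 = margin(coefficients, intercept, [1.0])   # roughly 0.9045

raw0 = raw_prediction(m0)  # roughly [2.9135, -2.9135]
pred0 = predict(m0)
pred1 = predict(m1)
```

Under these assumptions the results line up with the doctest: the input -1.0 yields a raw prediction of about [2.9135, -2.9135] and prediction 0.0, while the input 1.0 yields prediction 1.0.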




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    **[Test build #72080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72080/testReport)** for PR 16694 at commit [`e2e9943`](https://github.com/apache/spark/commit/e2e9943cdde4aaa2fd81e8d525b0f6e94120017b).




[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16694
  
    Merged build finished. Test FAILed.

