Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/01/09 10:39:39 UTC

[GitHub] spark pull request #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classificatio...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/16515

    [MINOR][PYTHON][EXAMPLE] Fix binary classification metrics example to work

    ## What changes were proposed in this pull request?
    
    The LibSVM data source loads features as `ml.linalg.SparseVector`, whereas the example requires `mllib.linalg.SparseVector`. The Scala example, `BinaryClassificationMetricsExample.scala`, is fine.
    
    ```
      File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
        .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
      File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
        self.features = _convert_to_vector(features)
      File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
        raise TypeError("Cannot convert type %s into Vector" % type(l))
    TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
    ```
    
    ## How was this patch tested?
    
    Manually via
    
    ```
    ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark minor-example-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16515
    
----
commit 9a4fd40609d6ef71dc3fd3db0f72502eb3f070f0
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-01-09T10:31:15Z

    Fix binary classification metrics example to work

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python mllib ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71077/testReport)** for PR 16515 at commit [`9a4fd40`](https://github.com/apache/spark/commit/9a4fd40609d6ef71dc3fd3db0f72502eb3f070f0).


[GitHub] spark pull request #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classificatio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16515#discussion_r95132326
  
    --- Diff: examples/src/main/python/mllib/binary_classification_metrics_example.py ---
    @@ -18,25 +18,20 @@
     Binary Classification Metrics Example.
     """
     from __future__ import print_function
    -from pyspark.sql import SparkSession
    +from pyspark import SparkContext
     # $example on$
     from pyspark.mllib.classification import LogisticRegressionWithLBFGS
     from pyspark.mllib.evaluation import BinaryClassificationMetrics
    -from pyspark.mllib.regression import LabeledPoint
    +from pyspark.mllib.util import MLUtils
     # $example off$
     
     if __name__ == "__main__":
    -    spark = SparkSession\
    -        .builder\
    -        .appName("BinaryClassificationMetricsExample")\
    -        .getOrCreate()
    +    sc = SparkContext(appName="BinaryClassificationMetricsExample")
    --- End diff --
    
    I used `SparkContext` here to be consistent with the other examples.


[GitHub] spark issue #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classification metri...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    @yanboliang Could you please take a look?


[GitHub] spark pull request #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16515#discussion_r95222673
  
    --- Diff: examples/src/main/python/mllib/binary_classification_metrics_example.py ---
    @@ -18,25 +18,20 @@
     Binary Classification Metrics Example.
     """
     from __future__ import print_function
    -from pyspark.sql import SparkSession
    +from pyspark import SparkContext
     # $example on$
     from pyspark.mllib.classification import LogisticRegressionWithLBFGS
     from pyspark.mllib.evaluation import BinaryClassificationMetrics
    -from pyspark.mllib.regression import LabeledPoint
    +from pyspark.mllib.util import MLUtils
     # $example off$
     
     if __name__ == "__main__":
    -    spark = SparkSession\
    -        .builder\
    -        .appName("BinaryClassificationMetricsExample")\
    -        .getOrCreate()
    +    sc = SparkContext(appName="BinaryClassificationMetricsExample")
    --- End diff --
    
    +1


[GitHub] spark issue #16515: [MINOR][PYTHON][EXAMPLE] Fix binary classification metri...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Hm... actually, it seems there are more broken examples. Let me open a JIRA and sweep them.


[GitHub] spark pull request #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16515#discussion_r95140529
  
    --- Diff: examples/src/main/python/mllib/binary_classification_metrics_example.py ---
    @@ -18,25 +18,20 @@
     Binary Classification Metrics Example.
     """
     from __future__ import print_function
    -from pyspark.sql import SparkSession
    +from pyspark import SparkContext
     # $example on$
     from pyspark.mllib.classification import LogisticRegressionWithLBFGS
     from pyspark.mllib.evaluation import BinaryClassificationMetrics
    -from pyspark.mllib.regression import LabeledPoint
    +from pyspark.mllib.util import MLUtils
     # $example off$
     
     if __name__ == "__main__":
    -    spark = SparkSession\
    -        .builder\
    -        .appName("BinaryClassificationMetricsExample")\
    -        .getOrCreate()
    +    sc = SparkContext(appName="BinaryClassificationMetricsExample")
    --- End diff --
    
    Is the point that this is an `.mllib` example rather than an `.ml` one, so it should use the older API?


[GitHub] spark pull request #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16515


[GitHub] spark issue #16515: [WIP][SPARK-19134][PYTHON][SQL][EXAMPLE] Fix several Pyt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71077/
    Test PASSed.


[GitHub] spark pull request #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16515#discussion_r95140703
  
    --- Diff: examples/src/main/python/mllib/binary_classification_metrics_example.py ---
    @@ -18,25 +18,20 @@
     Binary Classification Metrics Example.
     """
     from __future__ import print_function
    -from pyspark.sql import SparkSession
    +from pyspark import SparkContext
     # $example on$
     from pyspark.mllib.classification import LogisticRegressionWithLBFGS
     from pyspark.mllib.evaluation import BinaryClassificationMetrics
    -from pyspark.mllib.regression import LabeledPoint
    +from pyspark.mllib.util import MLUtils
     # $example off$
     
     if __name__ == "__main__":
    -    spark = SparkSession\
    -        .builder\
    -        .appName("BinaryClassificationMetricsExample")\
    -        .getOrCreate()
    +    sc = SparkContext(appName="BinaryClassificationMetricsExample")
    --- End diff --
    
    Yes, that is my understanding.


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71080/testReport)** for PR 16515 at commit [`1ce29fb`](https://github.com/apache/spark/commit/1ce29fb39147c1df5df299b874327a4ee12d0778).


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71080/
    Test PASSed.


[GitHub] spark issue #16515: [WIP][SPARK-19134][PYTHON][SQL][EXAMPLE] Fix several Pyt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71080/testReport)** for PR 16515 at commit [`1ce29fb`](https://github.com/apache/spark/commit/1ce29fb39147c1df5df299b874327a4ee12d0778).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16515: [WIP][SPARK-19134][PYTHON][SQL][EXAMPLE] Fix several Pyt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71078/
    Test PASSed.


[GitHub] spark issue #16515: [WIP][SPARK-19134][PYTHON][SQL][EXAMPLE] Fix several Pyt...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71078/testReport)** for PR 16515 at commit [`1f8c11e`](https://github.com/apache/spark/commit/1f8c11e9c775be805131c7535feec31f34782a9c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    LGTM, merged into master. Thanks for catching this. 


[GitHub] spark issue #16515: [WIP][SPARK-19134][PYTHON][SQL][EXAMPLE] Fix several Pyt...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16515: [SPARK-19134][EXAMPLE] Fix several sql, mllib and status...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    Thank you all!


[GitHub] spark issue #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python mllib ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71078/testReport)** for PR 16515 at commit [`1f8c11e`](https://github.com/apache/spark/commit/1f8c11e9c775be805131c7535feec31f34782a9c).


[GitHub] spark issue #16515: [SPARK-19134][PYTHON][EXAMPLE] Fix several Python mllib ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16515
  
    **[Test build #71077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71077/testReport)** for PR 16515 at commit [`9a4fd40`](https://github.com/apache/spark/commit/9a4fd40609d6ef71dc3fd3db0f72502eb3f070f0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

