Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2016/05/05 05:04:37 UTC

[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/12920

    [SPARK-15141][DOC] Add python example for OneVsRest

    ## What changes were proposed in this pull request?
    Add python example for OneVsRest
    
    
    ## How was this patch tested?
    manual tests
    `./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py --input=data/mllib/sample_multiclass_classification_data.txt --fracTest=0.33 --maxIter=2 --fitIntercept=false --regParam=0.001 --tol=0.012`
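
The draft `parse()` in this PR reads `--fitIntercept` as a string and compares it with "true". Below is a minimal sketch that uses only the standard library's `argparse` and converts the flag to a real boolean through a `type=` callable. `str_to_bool` is a hypothetical helper name, and only a subset of the PR's options is shown.

```python
import argparse


def str_to_bool(value):
    """Hypothetical helper: map a 'true'/'false' flag value to a Python bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % value)


def parse(argv=None):
    """Parse a subset of the runner's options; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="One-vs-Rest example runner (sketch)")
    parser.add_argument("--input", required=True,
                        help="input path to labeled examples")
    parser.add_argument("--fracTest", type=float, default=0.2,
                        help="fraction of data to hold out for testing. default: 0.2")
    parser.add_argument("--maxIter", type=int, default=100,
                        help="maximum number of iterations. default: 100")
    parser.add_argument("--tol", type=float, default=1e-6,
                        help="convergence tolerance. default: 1e-6")
    # type=str_to_bool turns "--fitIntercept=false" into False, not the string "false".
    parser.add_argument("--fitIntercept", type=str_to_bool, default=True,
                        help="fit an intercept term. default: true")
    parser.add_argument("--regParam", type=float, default=None,
                        help="regularization parameter. default: None")
    params = parser.parse_args(argv)
    # parser.error() prints usage and exits instead of raising a bare AssertionError.
    if not 0 <= params.fracTest < 1:
        parser.error("fracTest should be in [0, 1)")
    return params


if __name__ == "__main__":
    # Same flags as the manual test command above.
    params = parse(["--input=data/mllib/sample_multiclass_classification_data.txt",
                    "--fracTest=0.33", "--maxIter=2", "--fitIntercept=false",
                    "--regParam=0.001", "--tol=0.012"])
    print(params.fitIntercept, params.maxIter)  # False 2
```

Delegating validation to `argparse` (`required=True`, `parser.error`) gives users a usage message. The `assert` statements in the draft produce a traceback and are skipped entirely when Python runs with `-O`.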


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark ovr_pe

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12920
    
----
commit 6059e85e5438fd0d9c2f00b5c483f83bde397df1
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-04-30T04:40:58Z

    create

commit c5a1400655a78443be502aa3b9cfefaa62d2b2ac
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-05-03T10:32:13Z

    finish pr

commit 474e2523e6089b83a8ff225bcddb4b683207a6e6
Author: Zheng RuiFeng <ru...@foxmail.com>
Date:   2016-05-05T05:01:20Z

    use argparse

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217707107
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58100/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE] Add python example for ...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62191270
  
    --- Diff: examples/src/main/python/ml/one_vs_rest_example.py ---
    @@ -0,0 +1,125 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +import argparse
    +
    +from pyspark import SparkContext
    +
    +# $example on$
    +from pyspark.ml.classification import LogisticRegression, OneVsRest
    +from pyspark.mllib.evaluation import MulticlassMetrics
    +from pyspark.sql import SQLContext
    +# $example off$
    +
    +"""
    +An example runner for Multiclass to Binary Reduction with One Vs Rest.
    +The example uses Logistic Regression as the base classifier. All parameters that
    +can be specified on the base classifier can be passed in to the runner options.
    +Run with:
    +
    +  bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py
    +"""
    +
    +
    +def parse():
    +    parser = argparse.ArgumentParser()
    +    parser.add_argument("--input",
    +                        help="input path to labeled examples. This path must be specified")
    +    parser.add_argument("--fracTest", type=float, default=0.2,
    +                        help="fraction of data to hold out for testing.  If given option testInput,"
    +                             " this option is ignored. default: 0.2")
    +    parser.add_argument("--testInput",
    +                        help="input path to test dataset. If given, option fracTest is ignored")
    +    parser.add_argument("--maxIter", type=int, default=100,
    +                        help="maximum number of iterations for Logistic Regression. default: 100")
    +    parser.add_argument("--tol", type=float, default=1e-6,
    +                        help="the convergence tolerance of iterations for Logistic Regression."
    +                             " default: 1e-6")
    +    parser.add_argument("--fitIntercept", default="true",
    +                        help="fit intercept for Logistic Regression. default: true")
    +    parser.add_argument("--regParam", type=float,
    +                        help="the regularization parameter for Logistic Regression. default: None")
    +    parser.add_argument("--elasticNetParam", type=float,
    +                        help="the ElasticNet mixing parameter for Logistic Regression. default:"
    +                             " None")
    +    params = parser.parse_args()
    +
    +    assert params.input is not None, "input is required"
    +    assert 0 <= params.fracTest < 1, "fracTest value incorrect; should be in [0,1)."
    +    assert params.fitIntercept in ("true", "false")
    +    params.fitIntercept = params.fitIntercept == "true"
    +
    +    return params
    +
    +if __name__ == "__main__":
    +
    +    params = parse()
    +
    +    sc = SparkContext(appName="PythonOneVsRestExample")
    --- End diff --
    
    Use `SparkSession` instead of `SQLContext`. See #12809 for details.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077510
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57841/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217078369
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217622843
  
    @zhengruifeng made a few comments; most importantly, it's better to use the `ml` evaluators throughout, as these are DataFrame API examples.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE] Add python example for ...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217146684
  
    @zhengruifeng I think the `OneVsRest` examples are a little bloated. Could we turn them into simpler versions (essentially just the `run` part of the example), similar to, say, `GradientBoostedTreeClassifierExample`? The same applies to Java.
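
Without the argument parsing, the `run` part of a One-vs-Rest example reduces to one idea. You fit one binary classifier per class (that class versus all the others), then predict the class whose classifier returns the highest score. The dependency-free sketch below illustrates only that reduction. It uses a simple perceptron as a stand-in base classifier, whereas the Spark example uses `LogisticRegression` through `pyspark.ml.classification.OneVsRest`. All function names are hypothetical.

```python
def train_perceptron(features, labels, max_iter=1000):
    """Binary perceptron (labels +1/-1) used as a stand-in base classifier.

    Returns (weights, bias). Stops early once an epoch makes no mistakes.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(max_iter):
        mistakes = 0
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # wrong side of (or on) the boundary: update
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def raw_score(model, x):
    """Raw linear score w.x + b of x under a (weights, bias) model."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_one_vs_rest(features, labels, max_iter=1000):
    """One binary model per class: class k relabelled +1, every other class -1."""
    models = {}
    for k in sorted(set(labels)):
        binary = [1 if y == k else -1 for y in labels]
        models[k] = train_perceptron(features, binary, max_iter)
    return models


def predict_one_vs_rest(models, x):
    """Predict the class whose binary model gives x the highest raw score."""
    return max(models, key=lambda k: raw_score(models[k], x))


if __name__ == "__main__":
    # Three toy clusters, each linearly separable from the other two.
    X = [[5, 0], [6, 1], [4, -1],
         [0, 5], [1, 6], [-1, 4],
         [-5, -5], [-6, -4], [-4, -6]]
    y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = fit_one_vs_rest(X, y)
    print([predict_one_vs_rest(models, x) for x in X])
```

The prediction takes the argmax over raw scores rather than thresholded +1/-1 outputs. This lets the reduction settle on a single class when several binary models claim the same point, or when none of them does.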


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217082197
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57843/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE] Add python example for ...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217146073
  
    @HyukjinKwon I get your concern, but these examples are directly included in the HTML documentation for Spark via `include_example`, so they are both examples and docs. For changes like this that touch the HTML docs, I think including `[DOC]` is important.
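
The `# $example on$` and `# $example off$` comments in the diff quoted above appear to delimit the part of each file that `include_example` pulls into the HTML. The sketch below is a hypothetical Python illustration of that marker convention, not Spark's actual `include_example` implementation. It assumes markers may appear several times and that the marker lines themselves are dropped.

```python
ON_MARKER = "$example on$"
OFF_MARKER = "$example off$"


def extract_example(source):
    """Hypothetical helper: keep only the lines between example on/off markers.

    Several on/off sections are concatenated in order, the marker lines
    themselves are dropped, and unbalanced markers raise ValueError.
    """
    selected = []
    inside = False
    for line in source.splitlines():
        if ON_MARKER in line:
            if inside:
                raise ValueError("nested '$example on$' marker")
            inside = True
        elif OFF_MARKER in line:
            if not inside:
                raise ValueError("'$example off$' without a preceding '$example on$'")
            inside = False
        elif inside:
            selected.append(line)
    if inside:
        raise ValueError("'$example on$' section is never closed")
    return "\n".join(selected)


if __name__ == "__main__":
    # Imports block modelled on the diff: only the marked import survives.
    source = "\n".join([
        "from pyspark import SparkContext",
        "",
        "# $example on$",
        "from pyspark.ml.classification import LogisticRegression, OneVsRest",
        "# $example off$",
    ])
    print(extract_example(source))
    # from pyspark.ml.classification import LogisticRegression, OneVsRest
```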


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218274940
  
    @zhengruifeng made a few comments. Pending those, I think this is ready.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077506
  
    **[Test build #57841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57841/consoleFull)** for PR 12920 at commit [`474e252`](https://github.com/apache/spark/commit/474e2523e6089b83a8ff225bcddb4b683207a6e6).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217082196
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217333854
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57948/
    Test FAILed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217338429
  
    **[Test build #57952 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57952/consoleFull)** for PR 12920 at commit [`06cab7f`](https://github.com/apache/spark/commit/06cab7f493c994d3ba34ffc91dc0fdc1bb861313).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217105494
  
    I am not sure whether PRs that fix examples should have the `[DOC]` component in the title. I saw the `[EXAMPLE]` component used by @dongjoon-hyun. This is pretty minor, but setting good examples in PRs will help all other contributors.
    
    @dongjoon-hyun Do you mind if I ask for your thoughts, please?


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218384621
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217692278
  
    Just use accuracy, I think, similar to the decision tree example.
    
    On Sun, 8 May 2016 at 04:21, Ruifeng Zheng <no...@github.com> wrote:
    
    > @MLnick <https://github.com/MLnick> MulticlassClassificationEvaluator does
    > not support confusionMatrix now. So I will just remove the computation of
    > confusionMatrix.
    >
    > —
    > You are receiving this because you were mentioned.
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/12920#issuecomment-217683598>
    >
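
Accuracy is the fraction of test rows whose prediction equals the label. A confusion matrix counts (true label, predicted label) pairs. The standard-library sketch below shows both computations on hypothetical `(prediction, label)` pairs and prints the test error in the same "Test Error" style. In PySpark, the accuracy would instead come from `MulticlassClassificationEvaluator(metricName="accuracy")`, assuming a Spark version whose evaluator supports that metric name. The helper names are hypothetical.

```python
from collections import Counter


def accuracy(pairs):
    """Fraction of (prediction, label) pairs whose prediction equals the label."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no predictions to evaluate")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return float(correct) / len(pairs)


def confusion_matrix(pairs, labels):
    """Counts with rows = true label and columns = predicted label, ordered by `labels`."""
    counts = Counter((label, prediction) for prediction, label in pairs)
    return [[counts[(true, pred)] for pred in labels] for true in labels]


if __name__ == "__main__":
    # Hypothetical (prediction, label) pairs, e.g. collected from a test split.
    pairs = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
    acc = accuracy(pairs)
    print("Test Error = %g" % (1.0 - acc))  # Test Error = 0.25
    print(confusion_matrix(pairs, [0.0, 1.0, 2.0]))  # [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
```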



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217337677
  
    **[Test build #57952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57952/consoleFull)** for PR 12920 at commit [`06cab7f`](https://github.com/apache/spark/commit/06cab7f493c994d3ba34ffc91dc0fdc1bb861313).


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217078370
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57842/
    Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218382743
  
    **[Test build #58344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58344/consoleFull)** for PR 12920 at commit [`9049002`](https://github.com/apache/spark/commit/9049002d48bee41a389782c082a7fd4752a6d3fe).


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217706791
  
    **[Test build #58100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58100/consoleFull)** for PR 12920 at commit [`16c2e74`](https://github.com/apache/spark/commit/16c2e742ccd01e05a9853c09c872f68ae64d07f4).


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218387973
  
    LGTM, merged to master and branch-2.0. Thanks!


[GitHub] spark pull request: [SPARK-15141][EXAMPLE] Add python example for ...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217153849
  
    @MLnick Agreed. I will remove the args-parsing blocks in the three example files.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217333851
  
    **[Test build #57948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57948/consoleFull)** for PR 12920 at commit [`a8d2681`](https://github.com/apache/spark/commit/a8d26817e4aade52e22a2b3fb1d5ead846439d1e).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077816
  
    **[Test build #57842 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57842/consoleFull)** for PR 12920 at commit [`44875ed`](https://github.com/apache/spark/commit/44875ed4a9e2e3594f1697be10f3be380a612b0e).


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218341872
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62741049
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,69 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.mllib.linalg.Matrix;
    --- End diff --
    
    This import is no longer required, I believe.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217685059
  
    **[Test build #58082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58082/consoleFull)** for PR 12920 at commit [`af95019`](https://github.com/apache/spark/commit/af95019e1a5d3f8365dbbedafb788e0a8f3e91a1).



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12920



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217686109
  
    **[Test build #58082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58082/consoleFull)** for PR 12920 at commit [`af95019`](https://github.com/apache/spark/commit/af95019e1a5d3f8365dbbedafb788e0a8f3e91a1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218340225
  
    @MLnick Thanks. Updated.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414405
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,102 +17,66 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
     import org.apache.spark.mllib.evaluation.MulticlassMetrics;
     import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * The example uses Logistic Regression as the base classifier.
      * Run with
      * <pre>
    - * bin/run-example ml.JavaOneVsRestExample [options]
    + * bin/run-example ml.JavaOneVsRestExample
      * </pre>
      */
     public class JavaOneVsRestExample {
    -
    -  private static class Params {
    -    String input;
    -    String testInput = null;
    -    Integer maxIter = 100;
    -    double tol = 1E-6;
    -    boolean fitIntercept = true;
    -    Double regParam = null;
    -    Double elasticNetParam = null;
    -    double fracTest = 0.2;
    -  }
    -
       public static void main(String[] args) {
    -    // parse the arguments
    -    Params params = parse(args);
         SparkSession spark = SparkSession
           .builder()
           .appName("JavaOneVsRestExample")
           .getOrCreate();
     
         // $example on$
    -    // configure the base classifier
    -    LogisticRegression classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept);
    +    // load data file.
    +    Dataset<Row> inputData = spark.read().format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt");
     
    -    if (params.regParam != null) {
    -      classifier.setRegParam(params.regParam);
    -    }
    -    if (params.elasticNetParam != null) {
    -      classifier.setElasticNetParam(params.elasticNetParam);
    -    }
    +    // generate the train/test split.
    +    Dataset<Row>[] tmp = inputData.randomSplit(new double[]{0.8, 0.2});
    +    Dataset<Row> train = tmp[0];
    +    Dataset<Row> test = tmp[1];
     
    -    // instantiate the One Vs Rest Classifier
    -    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
    +    // configure the base classifier.
    +    LogisticRegression classifier = new LogisticRegression()
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true);
     
    -    String input = params.input;
    -    Dataset<Row> inputData = spark.read().format("libsvm").load(input);
    -    Dataset<Row> train;
    -    Dataset<Row> test;
    -
    -    // compute the train/ test split: if testInput is not provided use part of input
    -    String testInput = params.testInput;
    -    if (testInput != null) {
    -      train = inputData;
    -      // compute the number of features in the training set.
    -      int numFeatures = inputData.first().<Vector>getAs(1).size();
    -      test = spark.read().format("libsvm").option("numFeatures",
    -        String.valueOf(numFeatures)).load(testInput);
    -    } else {
    -      double f = params.fracTest;
    -      Dataset<Row>[] tmp = inputData.randomSplit(new double[]{1 - f, f}, 12345);
    -      train = tmp[0];
    -      test = tmp[1];
    -    }
    +    // instantiate the One Vs Rest Classifier.
    +    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
     
    -    // train the multiclass model
    +    // train the multiclass model.
         OneVsRestModel ovrModel = ovr.fit(train.cache());
     
    -    // score the model on test data
    -    Dataset<Row> predictions = ovrModel.transform(test.cache())
    +    // score the model on test data.
    +    Dataset<Row> predictions = ovrModel.transform(test)
           .select("prediction", "label");
     
    -    // obtain metrics
    +    // obtain metrics.
    --- End diff --
    
    Same here: use the ML evaluator.
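The diff above wraps `LogisticRegression` in `OneVsRest`, reducing a multiclass problem to one binary problem per class. Below is a minimal plain-Python sketch of that reduction, written independently of Spark for illustration only. Each class k gets a binary model trained on "k vs. everything else", and a point is assigned to the class whose model reports the largest margin. The helper names, the tiny gradient-descent logistic regression, its learning rate and epoch count, and the toy data are all hypothetical. This is not Spark's implementation, although, as far as we know, Spark's `OneVsRestModel` also predicts the class whose underlying model is most confident.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def margin(model, x):
    """Linear score w.x + b for a (weights, intercept) pair."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_binary_logreg(xs, ys, lr=0.5, epochs=2000):
    """Hypothetical toy base classifier: binary logistic regression
    with an intercept, fitted by full-batch gradient descent."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(margin((w, b), x)) - y  # d(log-loss)/d(margin)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_one_vs_rest(xs, labels, num_classes):
    """Train one binary model per class: class k (label 1) vs. the rest (label 0)."""
    return [
        train_binary_logreg(xs, [1.0 if label == k else 0.0 for label in labels])
        for k in range(num_classes)
    ]


def predict_one_vs_rest(models, x):
    """Return the index of the per-class model with the largest margin for x."""
    scores = [margin(m, x) for m in models]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: three well-separated 2-D clusters labelled 0, 1 and 2.
xs = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [0.0, 5.0], [0.1, 5.2]]
labels = [0, 0, 1, 1, 2, 2]
models = fit_one_vs_rest(xs, labels, num_classes=3)
print([predict_one_vs_rest(models, x) for x in xs])
```

The per-class fits are independent of each other, so in principle they can run in parallel. The cost of the reduction is K binary fits for K classes.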



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217338463
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077444
  
    **[Test build #57841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57841/consoleFull)** for PR 12920 at commit [`474e252`](https://github.com/apache/spark/commit/474e2523e6089b83a8ff225bcddb4b683207a6e6).



[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217081358
  
    **[Test build #57843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57843/consoleFull)** for PR 12920 at commit [`63d90ed`](https://github.com/apache/spark/commit/63d90edee26c6af19380d89408ad80090be23952).



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62429495
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,69 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * The example uses Logistic Regression as the base classifier.
      * Run with
      * <pre>
    - * bin/run-example ml.JavaOneVsRestExample [options]
    + * bin/run-example ml.JavaOneVsRestExample
      * </pre>
      */
     public class JavaOneVsRestExample {
    -
    -  private static class Params {
    -    String input;
    -    String testInput = null;
    -    Integer maxIter = 100;
    -    double tol = 1E-6;
    -    boolean fitIntercept = true;
    -    Double regParam = null;
    -    Double elasticNetParam = null;
    -    double fracTest = 0.2;
    -  }
    -
       public static void main(String[] args) {
    -    // parse the arguments
    -    Params params = parse(args);
         SparkSession spark = SparkSession
           .builder()
           .appName("JavaOneVsRestExample")
           .getOrCreate();
     
         // $example on$
    -    // configure the base classifier
    -    LogisticRegression classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept);
    +    // load data file.
    +    Dataset<Row> inputData = spark.read().format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt");
     
    -    if (params.regParam != null) {
    -      classifier.setRegParam(params.regParam);
    -    }
    -    if (params.elasticNetParam != null) {
    -      classifier.setElasticNetParam(params.elasticNetParam);
    -    }
    +    // generate the train/test split.
    +    Dataset<Row>[] tmp = inputData.randomSplit(new double[]{0.8, 0.2});
    +    Dataset<Row> train = tmp[0];
    +    Dataset<Row> test = tmp[1];
     
    -    // instantiate the One Vs Rest Classifier
    -    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
    -
    -    String input = params.input;
    -    Dataset<Row> inputData = spark.read().format("libsvm").load(input);
    -    Dataset<Row> train;
    -    Dataset<Row> test;
    +    // configure the base classifier.
    +    LogisticRegression classifier = new LogisticRegression()
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true);
     
    -    // compute the train/ test split: if testInput is not provided use part of input
    -    String testInput = params.testInput;
    -    if (testInput != null) {
    -      train = inputData;
    -      // compute the number of features in the training set.
    -      int numFeatures = inputData.first().<Vector>getAs(1).size();
    -      test = spark.read().format("libsvm").option("numFeatures",
    -        String.valueOf(numFeatures)).load(testInput);
    -    } else {
    -      double f = params.fracTest;
    -      Dataset<Row>[] tmp = inputData.randomSplit(new double[]{1 - f, f}, 12345);
    -      train = tmp[0];
    -      test = tmp[1];
    -    }
    +    // instantiate the One Vs Rest Classifier.
    +    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
     
    -    // train the multiclass model
    +    // train the multiclass model.
         OneVsRestModel ovrModel = ovr.fit(train.cache());
     
    -    // score the model on test data
    -    Dataset<Row> predictions = ovrModel.transform(test.cache())
    +    // score the model on test data.
    +    Dataset<Row> predictions = ovrModel.transform(test)
           .select("prediction", "label");
     
    -    // obtain metrics
    -    MulticlassMetrics metrics = new MulticlassMetrics(predictions);
    -    StructField predictionColSchema = predictions.schema().apply("prediction");
    -    Integer numClasses = (Integer) MetadataUtils.getNumClasses(predictionColSchema).get();
    -
    -    // compute the false positive rate per label
    -    StringBuilder results = new StringBuilder();
    -    results.append("label\tfpr\n");
    -    for (int label = 0; label < numClasses; label++) {
    -      results.append(label);
    -      results.append("\t");
    -      results.append(metrics.falsePositiveRate((double) label));
    -      results.append("\n");
    -    }
    +    // obtain evaluator.
    +    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
    +            .setMetricName("precision");
     
    -    Matrix confusionMatrix = metrics.confusionMatrix();
    -    // output the Confusion Matrix
    -    System.out.println("Confusion Matrix");
    -    System.out.println(confusionMatrix);
    -    System.out.println();
    -    System.out.println(results);
    +    // compute the classification error on test data.
    +    double precision = evaluator.evaluate(predictions);
    +    System.out.print("Test Error : " + (1 - precision));
    --- End diff --
    
    Thanks!



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218341875
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58306/



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62801050
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,68 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    -import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
    - * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * An example of Multiclass to Binary Reduction with One Vs Rest,
    + * using Logistic Regression as the base classifier.
      * Run with
      * <pre>
    - * bin/run-example ml.JavaOneVsRestExample [options]
    + * bin/run-example ml.JavaOneVsRestExample
      * </pre>
      */
     public class JavaOneVsRestExample {
    -
    -  private static class Params {
    -    String input;
    -    String testInput = null;
    -    Integer maxIter = 100;
    -    double tol = 1E-6;
    -    boolean fitIntercept = true;
    -    Double regParam = null;
    -    Double elasticNetParam = null;
    -    double fracTest = 0.2;
    -  }
    -
       public static void main(String[] args) {
    -    // parse the arguments
    -    Params params = parse(args);
         SparkSession spark = SparkSession
           .builder()
           .appName("JavaOneVsRestExample")
           .getOrCreate();
     
         // $example on$
    -    // configure the base classifier
    -    LogisticRegression classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept);
    +    // load data file.
    +    Dataset<Row> inputData = spark.read().format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt");
     
    -    if (params.regParam != null) {
    -      classifier.setRegParam(params.regParam);
    -    }
    -    if (params.elasticNetParam != null) {
    -      classifier.setElasticNetParam(params.elasticNetParam);
    -    }
    +    // generate the train/test split.
    +    Dataset<Row>[] tmp = inputData.randomSplit(new double[]{0.8, 0.2});
    +    Dataset<Row> train = tmp[0];
    +    Dataset<Row> test = tmp[1];
     
    -    // instantiate the One Vs Rest Classifier
    -    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
    -
    -    String input = params.input;
    -    Dataset<Row> inputData = spark.read().format("libsvm").load(input);
    -    Dataset<Row> train;
    -    Dataset<Row> test;
    +    // configure the base classifier.
    +    LogisticRegression classifier = new LogisticRegression()
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true);
     
    -    // compute the train/ test split: if testInput is not provided use part of input
    -    String testInput = params.testInput;
    -    if (testInput != null) {
    -      train = inputData;
    -      // compute the number of features in the training set.
    -      int numFeatures = inputData.first().<Vector>getAs(1).size();
    -      test = spark.read().format("libsvm").option("numFeatures",
    -        String.valueOf(numFeatures)).load(testInput);
    -    } else {
    -      double f = params.fracTest;
    -      Dataset<Row>[] tmp = inputData.randomSplit(new double[]{1 - f, f}, 12345);
    -      train = tmp[0];
    -      test = tmp[1];
    -    }
    +    // instantiate the One Vs Rest Classifier.
    +    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
     
    -    // train the multiclass model
    +    // train the multiclass model.
    --- End diff --
    
    Right. I will fix it.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217333308
  
    **[Test build #57948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57948/consoleFull)** for PR 12920 at commit [`a8d2681`](https://github.com/apache/spark/commit/a8d26817e4aade52e22a2b3fb1d5ead846439d1e).



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414401
  
    --- Diff: examples/src/main/python/ml/one_vs_rest_example.py ---
    @@ -0,0 +1,78 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark.ml.classification import LogisticRegression, OneVsRest
    +from pyspark.mllib.evaluation import MulticlassMetrics
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example runner for Multiclass to Binary Reduction with One Vs Rest.
    +The example uses Logistic Regression as the base classifier.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py
    +"""
    +
    +
    +if __name__ == "__main__":
    +    spark = SparkSession \
    +        .builder \
    +        .appName("OneHotEncoderExample") \
    +        .getOrCreate()
    +
    +    # $example on$
    +    # load data file.
    +    inputData = spark.read.format("libsvm") \
    +        .load("data/mllib/sample_multiclass_classification_data.txt")
    +
    +    # generate the train/test split.
    +    (train, test) = inputData.randomSplit([0.8, 0.2])
    +
    +    # instantiate the base classifier.
    +    lrParams = {'maxIter': 10, 'tol': 1E-6, 'fitIntercept': True}
    +    lr = LogisticRegression(**lrParams)
    +
    +    # instantiate the One Vs Rest Classifier.
    +    ovr = OneVsRest(classifier=lr)
    +
    +    # train the multiclass model.
    +    ovrModel = ovr.fit(train)
    +
    +    # score the model on test data.
    +    predictions = ovrModel.transform(test)
    +
    +    # obtain metrics.
    +    predictionAndLabels = predictions.rdd.map(lambda r: (r.prediction, r.label))
    +    metrics = MulticlassMetrics(predictionAndLabels)
    --- End diff --
    
    Same here: use the ML evaluator.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217338464
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57952/



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414391
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/OneVsRestExample.scala ---
    @@ -18,171 +18,71 @@
     // scalastyle:off println
     package org.apache.spark.examples.ml
     
    -import java.util.concurrent.TimeUnit.{NANOSECONDS => NANO}
    -
    -import scopt.OptionParser
    -
     // $example on$
    -import org.apache.spark.examples.mllib.AbstractParams
     import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
    -import org.apache.spark.ml.util.MetadataUtils
     import org.apache.spark.mllib.evaluation.MulticlassMetrics
    -import org.apache.spark.mllib.linalg.Vector
     import org.apache.spark.sql.DataFrame
     // $example off$
     import org.apache.spark.sql.SparkSession
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * The example uses Logistic Regression as the base classifier.
      * Run with
      * {{{
    - * ./bin/run-example ml.OneVsRestExample [options]
    + * ./bin/run-example ml.OneVsRestExample
      * }}}
    - * For local mode, run
    - * {{{
    - * ./bin/spark-submit --class org.apache.spark.examples.ml.OneVsRestExample --driver-memory 1g
    - *   [examples JAR path] [options]
    - * }}}
    - * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
      */
    -object OneVsRestExample {
    -
    -  case class Params private[ml] (
    -      input: String = null,
    -      testInput: Option[String] = None,
    -      maxIter: Int = 100,
    -      tol: Double = 1E-6,
    -      fitIntercept: Boolean = true,
    -      regParam: Option[Double] = None,
    -      elasticNetParam: Option[Double] = None,
    -      fracTest: Double = 0.2) extends AbstractParams[Params]
     
    +object OneVsRestExample {
       def main(args: Array[String]) {
    -    val defaultParams = Params()
    -
    -    val parser = new OptionParser[Params]("OneVsRest Example") {
    -      head("OneVsRest Example: multiclass to binary reduction using OneVsRest")
    -      opt[String]("input")
    -        .text("input path to labeled examples. This path must be specified")
    -        .required()
    -        .action((x, c) => c.copy(input = x))
    -      opt[Double]("fracTest")
    -        .text(s"fraction of data to hold out for testing.  If given option testInput, " +
    -        s"this option is ignored. default: ${defaultParams.fracTest}")
    -        .action((x, c) => c.copy(fracTest = x))
    -      opt[String]("testInput")
    -        .text("input path to test dataset.  If given, option fracTest is ignored")
    -        .action((x, c) => c.copy(testInput = Some(x)))
    -      opt[Int]("maxIter")
    -        .text(s"maximum number of iterations for Logistic Regression." +
    -          s" default: ${defaultParams.maxIter}")
    -        .action((x, c) => c.copy(maxIter = x))
    -      opt[Double]("tol")
    -        .text(s"the convergence tolerance of iterations for Logistic Regression." +
    -          s" default: ${defaultParams.tol}")
    -        .action((x, c) => c.copy(tol = x))
    -      opt[Boolean]("fitIntercept")
    -        .text(s"fit intercept for Logistic Regression." +
    -        s" default: ${defaultParams.fitIntercept}")
    -        .action((x, c) => c.copy(fitIntercept = x))
    -      opt[Double]("regParam")
    -        .text(s"the regularization parameter for Logistic Regression.")
    -        .action((x, c) => c.copy(regParam = Some(x)))
    -      opt[Double]("elasticNetParam")
    -        .text(s"the ElasticNet mixing parameter for Logistic Regression.")
    -        .action((x, c) => c.copy(elasticNetParam = Some(x)))
    -      checkConfig { params =>
    -        if (params.fracTest < 0 || params.fracTest >= 1) {
    -          failure(s"fracTest ${params.fracTest} value incorrect; should be in [0,1).")
    -        } else {
    -          success
    -        }
    -      }
    -    }
    -    parser.parse(args, defaultParams).map { params =>
    -      run(params)
    -    }.getOrElse {
    -      sys.exit(1)
    -    }
    -  }
    -
    -  private def run(params: Params) {
         val spark = SparkSession
           .builder
    -      .appName(s"OneVsRestExample with $params")
    +      .appName(s"OneVsRestExample")
           .getOrCreate()
     
    +    import spark.implicits._
    +
         // $example on$
    -    val inputData = spark.read.format("libsvm").load(params.input)
    -    // compute the train/test split: if testInput is not provided use part of input.
    -    val data = params.testInput match {
    -      case Some(t) =>
    -        // compute the number of features in the training set.
    -        val numFeatures = inputData.first().getAs[Vector](1).size
    -        val testData = spark.read.option("numFeatures", numFeatures.toString)
    -          .format("libsvm").load(t)
    -        Array[DataFrame](inputData, testData)
    -      case None =>
    -        val f = params.fracTest
    -        inputData.randomSplit(Array(1 - f, f), seed = 12345)
    -    }
    -    val Array(train, test) = data.map(_.cache())
    +    // load data file.
    +    val inputData: DataFrame = spark.read.format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt")
    +
    +    // generate the train/test split.
    +    val Array(train, test) = inputData.randomSplit(Array(0.8, 0.2))
     
         // instantiate the base classifier
         val classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept)
    -
    -    // Set regParam, elasticNetParam if specified in params
    -    params.regParam.foreach(classifier.setRegParam)
    -    params.elasticNetParam.foreach(classifier.setElasticNetParam)
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true)
     
         // instantiate the One Vs Rest Classifier.
    -
         val ovr = new OneVsRest()
         ovr.setClassifier(classifier)
     
         // train the multiclass model.
    -    val (trainingDuration, ovrModel) = time(ovr.fit(train))
    +    val ovrModel = ovr.fit(train)
     
         // score the model on test data.
    -    val (predictionDuration, predictions) = time(ovrModel.transform(test))
    +    val predictions = ovrModel.transform(test)
     
    -    // evaluate the model
    -    val predictionsAndLabels = predictions.select("prediction", "label")
    -      .rdd.map(row => (row.getDouble(0), row.getDouble(1)))
    -
    -    val metrics = new MulticlassMetrics(predictionsAndLabels)
    +    // obtain metrics.
    +    val metrics = new MulticlassMetrics(predictions.as[(Double, Double)].rdd)
    --- End diff --
    
    I think we should prefer `ml.evaluation.MulticlassClassificationEvaluator` here, since this is a DataFrame API example. You may have to change the metric used; see `DecisionTreeClassificationExample` for an example.
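The simplest metric to reason about when switching evaluators is accuracy: the fraction of scored rows whose `prediction` equals the `label`, with a test error reported as `1 - accuracy`. The sketch below is pure Python (no Spark). It assumes the scored DataFrame has been reduced to hypothetical `(prediction, label)` pairs and shows only the arithmetic. It is not the evaluator's implementation, and which `metricName` values a given Spark release accepts should be checked against its docs.

```python
from typing import Iterable, Tuple


def accuracy(prediction_and_labels: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (prediction, label) pairs where the prediction is correct."""
    pairs = list(prediction_and_labels)
    if not pairs:
        raise ValueError("need at least one (prediction, label) pair")
    correct = sum(1 for prediction, label in pairs if prediction == label)
    return correct / len(pairs)


# Hypothetical scored rows, shaped like predictions.select("prediction", "label").
rows = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (2.0, 2.0)]
acc = accuracy(rows)
print("Test Error = %g" % (1.0 - acc))
```

Three of the four hypothetical rows are correct, so this prints a test error of 0.25.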


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077404
  
    There seems to be no similar API for directly getting `numClass`, as in Scala:
    `val numClasses = MetadataUtils.getNumClasses(predictionColSchema).get`
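If the label column carries no class-count metadata, a common fallback is to assume that labels are 0-based class indices and take `max(label) + 1`. That is an assumption about the data, not something the metadata guarantees. In PySpark the maximum could be fetched with something like `inputData.agg({"label": "max"}).first()[0]`. The helper below (`num_classes_from_labels`, a hypothetical name) sketches the fallback in plain Python:

```python
def num_classes_from_labels(labels):
    """Infer the class count assuming 0-based integer label indices (max + 1)."""
    label_list = [float(label) for label in labels]
    if not label_list:
        raise ValueError("cannot infer the number of classes from no labels")
    max_label = max(label_list)
    # Labels such as 1.5 or -1.0 break the 0-based index assumption.
    if max_label < 0 or not max_label.is_integer():
        raise ValueError("labels must be non-negative integer-valued indices")
    return int(max_label) + 1


print(num_classes_from_labels([0.0, 2.0, 1.0, 2.0]))
```

If a class index never appears in the data (for example, only labels 0 and 2), this still counts it, which is usually what a one-vs-rest reduction wants.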




[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217686125
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58082/
    Test PASSed.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218341785
  
    **[Test build #58306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58306/consoleFull)** for PR 12920 at commit [`77ff733`](https://github.com/apache/spark/commit/77ff733f298fe5042070dfb5cbcf33519f9a6a18).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414441
  
    --- Diff: examples/src/main/python/ml/one_vs_rest_example.py ---
    @@ -0,0 +1,78 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark.ml.classification import LogisticRegression, OneVsRest
    +from pyspark.mllib.evaluation import MulticlassMetrics
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example runner for Multiclass to Binary Reduction with One Vs Rest.
    +The example uses Logistic Regression as the base classifier.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py
    +"""
    +
    +
    +if __name__ == "__main__":
    +    spark = SparkSession \
    +        .builder \
    +        .appName("OneHotEncoderExample") \
    --- End diff --
    
    Incorrect app name here



[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217077508
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217707105
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218384544
  
    **[Test build #58344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58344/consoleFull)** for PR 12920 at commit [`9049002`](https://github.com/apache/spark/commit/9049002d48bee41a389782c082a7fd4752a6d3fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218340667
  
    **[Test build #58306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58306/consoleFull)** for PR 12920 at commit [`77ff733`](https://github.com/apache/spark/commit/77ff733f298fe5042070dfb5cbcf33519f9a6a18).



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414456
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/OneVsRestExample.scala ---
    @@ -18,171 +18,71 @@
     // scalastyle:off println
     package org.apache.spark.examples.ml
     
    -import java.util.concurrent.TimeUnit.{NANOSECONDS => NANO}
    -
    -import scopt.OptionParser
    -
     // $example on$
    -import org.apache.spark.examples.mllib.AbstractParams
     import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
    -import org.apache.spark.ml.util.MetadataUtils
     import org.apache.spark.mllib.evaluation.MulticlassMetrics
    -import org.apache.spark.mllib.linalg.Vector
     import org.apache.spark.sql.DataFrame
     // $example off$
     import org.apache.spark.sql.SparkSession
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * The example uses Logistic Regression as the base classifier.
      * Run with
      * {{{
    - * ./bin/run-example ml.OneVsRestExample [options]
    + * ./bin/run-example ml.OneVsRestExample
      * }}}
    - * For local mode, run
    - * {{{
    - * ./bin/spark-submit --class org.apache.spark.examples.ml.OneVsRestExample --driver-memory 1g
    - *   [examples JAR path] [options]
    - * }}}
    - * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
      */
    -object OneVsRestExample {
    -
    -  case class Params private[ml] (
    -      input: String = null,
    -      testInput: Option[String] = None,
    -      maxIter: Int = 100,
    -      tol: Double = 1E-6,
    -      fitIntercept: Boolean = true,
    -      regParam: Option[Double] = None,
    -      elasticNetParam: Option[Double] = None,
    -      fracTest: Double = 0.2) extends AbstractParams[Params]
     
    +object OneVsRestExample {
       def main(args: Array[String]) {
    -    val defaultParams = Params()
    -
    -    val parser = new OptionParser[Params]("OneVsRest Example") {
    -      head("OneVsRest Example: multiclass to binary reduction using OneVsRest")
    -      opt[String]("input")
    -        .text("input path to labeled examples. This path must be specified")
    -        .required()
    -        .action((x, c) => c.copy(input = x))
    -      opt[Double]("fracTest")
    -        .text(s"fraction of data to hold out for testing.  If given option testInput, " +
    -        s"this option is ignored. default: ${defaultParams.fracTest}")
    -        .action((x, c) => c.copy(fracTest = x))
    -      opt[String]("testInput")
    -        .text("input path to test dataset.  If given, option fracTest is ignored")
    -        .action((x, c) => c.copy(testInput = Some(x)))
    -      opt[Int]("maxIter")
    -        .text(s"maximum number of iterations for Logistic Regression." +
    -          s" default: ${defaultParams.maxIter}")
    -        .action((x, c) => c.copy(maxIter = x))
    -      opt[Double]("tol")
    -        .text(s"the convergence tolerance of iterations for Logistic Regression." +
    -          s" default: ${defaultParams.tol}")
    -        .action((x, c) => c.copy(tol = x))
    -      opt[Boolean]("fitIntercept")
    -        .text(s"fit intercept for Logistic Regression." +
    -        s" default: ${defaultParams.fitIntercept}")
    -        .action((x, c) => c.copy(fitIntercept = x))
    -      opt[Double]("regParam")
    -        .text(s"the regularization parameter for Logistic Regression.")
    -        .action((x, c) => c.copy(regParam = Some(x)))
    -      opt[Double]("elasticNetParam")
    -        .text(s"the ElasticNet mixing parameter for Logistic Regression.")
    -        .action((x, c) => c.copy(elasticNetParam = Some(x)))
    -      checkConfig { params =>
    -        if (params.fracTest < 0 || params.fracTest >= 1) {
    -          failure(s"fracTest ${params.fracTest} value incorrect; should be in [0,1).")
    -        } else {
    -          success
    -        }
    -      }
    -    }
    -    parser.parse(args, defaultParams).map { params =>
    -      run(params)
    -    }.getOrElse {
    -      sys.exit(1)
    -    }
    -  }
    -
    -  private def run(params: Params) {
         val spark = SparkSession
           .builder
    -      .appName(s"OneVsRestExample with $params")
    +      .appName(s"OneVsRestExample")
           .getOrCreate()
     
    +    import spark.implicits._
    +
         // $example on$
    -    val inputData = spark.read.format("libsvm").load(params.input)
    -    // compute the train/test split: if testInput is not provided use part of input.
    -    val data = params.testInput match {
    -      case Some(t) =>
    -        // compute the number of features in the training set.
    -        val numFeatures = inputData.first().getAs[Vector](1).size
    -        val testData = spark.read.option("numFeatures", numFeatures.toString)
    -          .format("libsvm").load(t)
    -        Array[DataFrame](inputData, testData)
    -      case None =>
    -        val f = params.fracTest
    -        inputData.randomSplit(Array(1 - f, f), seed = 12345)
    -    }
    -    val Array(train, test) = data.map(_.cache())
    +    // load data file.
    +    val inputData: DataFrame = spark.read.format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt")
    +
    +    // generate the train/test split.
    +    val Array(train, test) = inputData.randomSplit(Array(0.8, 0.2))
     
         // instantiate the base classifier
         val classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept)
    -
    -    // Set regParam, elasticNetParam if specified in params
    -    params.regParam.foreach(classifier.setRegParam)
    -    params.elasticNetParam.foreach(classifier.setElasticNetParam)
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true)
     
         // instantiate the One Vs Rest Classifier.
    -
         val ovr = new OneVsRest()
         ovr.setClassifier(classifier)
    --- End diff --
    
    Minor, but perhaps use `val ovr = new OneVsRest().setClassifier(...)` to match the Java example.
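The one-liner works because Spark ML setters return the estimator itself (typed `this.type` in Scala), so calls can be chained. The pure-Python sketch below uses a hypothetical `ChainableEstimator` class, not a Spark class, to show the pattern:

```python
class ChainableEstimator:
    """Hypothetical stand-in for an estimator whose setters support chaining."""

    def __init__(self):
        self.classifier = None
        self.max_iter = None

    def setClassifier(self, classifier):
        self.classifier = classifier
        return self  # returning the instance is what makes chaining possible

    def setMaxIter(self, value):
        self.max_iter = value
        return self


# One expression instead of "construct, then call the setter on a separate line".
ovr = ChainableEstimator().setClassifier("base-classifier").setMaxIter(10)
print(ovr.classifier, ovr.max_iter)
```

A setter that returned `None` would break the chain, which is why this style is only safe when the API documents that setters return the instance.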



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62282420
  
    --- Diff: examples/src/main/python/ml/one_vs_rest_example.py ---
    @@ -0,0 +1,125 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +import argparse
    +
    +from pyspark import SparkContext
    +
    +# $example on$
    +from pyspark.ml.classification import LogisticRegression, OneVsRest
    +from pyspark.mllib.evaluation import MulticlassMetrics
    +from pyspark.sql import SQLContext
    +# $example off$
    +
    +"""
    +An example runner for Multiclass to Binary Reduction with One Vs Rest.
    +The example uses Logistic Regression as the base classifier. All parameters that
    +can be specified on the base classifier can be passed in to the runner options.
    +Run with:
    +
    +  bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py
    +"""
    +
    +
    +def parse():
    +    parser = argparse.ArgumentParser()
    +    parser.add_argument("--input",
    +                        help="input path to labeled examples. This path must be specified")
    +    parser.add_argument("--fracTest", type=float, default=0.2,
    +                        help="fraction of data to hold out for testing.  If given option testInput,"
    +                             " this option is ignored. default: 0.2")
    +    parser.add_argument("--testInput",
    +                        help="iinput path to test dataset. If given, option fracTest is ignored")
    +    parser.add_argument("--maxIter", type=int, default=100,
    +                        help="maximum number of iterations for Logistic Regression. default: 100")
    +    parser.add_argument("--tol", type=float, default=1e-6,
    +                        help="the convergence tolerance of iterations for Logistic Regression."
    +                             " default: 1e-6")
    +    parser.add_argument("--fitIntercept", default="true",
    +                        help="fit intercept for Logistic Regression. default: true")
    +    parser.add_argument("--regParam", type=float,
    +                        help="the regularization parameter for Logistic Regression. default: None")
    +    parser.add_argument("--elasticNetParam", type=float,
    +                        help="the ElasticNet mixing parameter for Logistic Regression. default:"
    +                             " None")
    +    params = parser.parse_args()
    +
    +    assert params.input is not None, "input is required"
    +    assert 0 <= params.fracTest < 1, "fracTest value incorrect; should be in [0,1)."
    +    assert params.fitIntercept in ("true", "false")
    +    params.fitIntercept = params.fitIntercept == "true"
    +
    +    return params
    +
    +if __name__ == "__main__":
    +
    +    params = parse()
    +
    +    sc = SparkContext(appName="PythonOneVsRestExample")
    --- End diff --
    
    Thanks!
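In the `parse()` helper quoted above, validation is done with `assert` after parsing, and `--fitIntercept` is compared as a string. The sketch below assumes the same option names and shows argparse handling this itself: `required=True` for `--input`, a converter passed as `type=` for the boolean, and `parser.error` for the range check, since `assert` statements are skipped under `python -O`. `str2bool` is a hypothetical helper, and only three of the original options are shown.

```python
import argparse


def str2bool(value):
    """Convert 'true'/'false' (any case) to a bool for use as an argparse type."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected true or false, got %r" % value)


def parse(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True,
                        help="input path to labeled examples")
    parser.add_argument("--fracTest", type=float, default=0.2,
                        help="fraction of data to hold out for testing. default: 0.2")
    parser.add_argument("--fitIntercept", type=str2bool, default=True,
                        help="fit intercept for Logistic Regression. default: true")
    params = parser.parse_args(argv)
    if not 0 <= params.fracTest < 1:
        # Prints usage plus the message and exits with status 2.
        parser.error("fracTest value incorrect; should be in [0,1).")
    return params


print(parse(["--input", "data.txt", "--fitIntercept=false"]))
```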



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62414422
  
    --- Diff: examples/src/main/python/ml/one_vs_rest_example.py ---
    @@ -0,0 +1,78 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +# $example on$
    +from pyspark.ml.classification import LogisticRegression, OneVsRest
    +from pyspark.mllib.evaluation import MulticlassMetrics
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example runner for Multiclass to Binary Reduction with One Vs Rest.
    +The example uses Logistic Regression as the base classifier.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py
    +"""
    +
    +
    +if __name__ == "__main__":
    +    spark = SparkSession \
    +        .builder \
    +        .appName("OneHotEncoderExample") \
    +        .getOrCreate()
    +
    +    # $example on$
    +    # load data file.
    +    inputData = spark.read.format("libsvm") \
    +        .load("data/mllib/sample_multiclass_classification_data.txt")
    +
    +    # generate the train/test split.
    +    (train, test) = inputData.randomSplit([0.8, 0.2])
    +
    +    # instantiate the base classifier.
    +    lrParams = {'maxIter': 10, 'tol': 1E-6, 'fitIntercept': True}
    --- End diff --
    
    This is a little strange; why not simply `LogisticRegression(maxIter=10, tol=...)`?
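The rest of the diff is not shown here, so this assumes `lrParams` is unpacked into the constructor (for example `LogisticRegression(**lrParams)`). Under that assumption the two styles pass exactly the same parameters, and the keyword form reads more directly. PySpark estimator constructors are keyword-only, so keyword arguments are the natural form. `make_classifier` is a hypothetical stand-in so the comparison runs without Spark:

```python
def make_classifier(maxIter=100, tol=1e-6, fitIntercept=True):
    """Hypothetical stand-in for an estimator constructor taking keyword params."""
    return {"maxIter": maxIter, "tol": tol, "fitIntercept": fitIntercept}


# Style questioned in the review: build a dict, then unpack it.
lrParams = {"maxIter": 10, "tol": 1e-6, "fitIntercept": True}
via_dict = make_classifier(**lrParams)

# Suggested style: pass the keyword arguments directly.
direct = make_classifier(maxIter=10, tol=1e-6, fitIntercept=True)

print(via_dict == direct)
```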



[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62800683
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,68 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    -import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
    - * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * An example of Multiclass to Binary Reduction with One Vs Rest,
    + * using Logistic Regression as the base classifier.
      * Run with
      * <pre>
    - * bin/run-example ml.JavaOneVsRestExample [options]
    + * bin/run-example ml.JavaOneVsRestExample
      * </pre>
      */
     public class JavaOneVsRestExample {
    -
    -  private static class Params {
    -    String input;
    -    String testInput = null;
    -    Integer maxIter = 100;
    -    double tol = 1E-6;
    -    boolean fitIntercept = true;
    -    Double regParam = null;
    -    Double elasticNetParam = null;
    -    double fracTest = 0.2;
    -  }
    -
       public static void main(String[] args) {
    -    // parse the arguments
    -    Params params = parse(args);
         SparkSession spark = SparkSession
           .builder()
           .appName("JavaOneVsRestExample")
           .getOrCreate();
     
         // $example on$
    -    // configure the base classifier
    -    LogisticRegression classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept);
    +    // load data file.
    +    Dataset<Row> inputData = spark.read().format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt");
     
    -    if (params.regParam != null) {
    -      classifier.setRegParam(params.regParam);
    -    }
    -    if (params.elasticNetParam != null) {
    -      classifier.setElasticNetParam(params.elasticNetParam);
    -    }
    +    // generate the train/test split.
    +    Dataset<Row>[] tmp = inputData.randomSplit(new double[]{0.8, 0.2});
    +    Dataset<Row> train = tmp[0];
    +    Dataset<Row> test = tmp[1];
     
    -    // instantiate the One Vs Rest Classifier
    -    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
    -
    -    String input = params.input;
    -    Dataset<Row> inputData = spark.read().format("libsvm").load(input);
    -    Dataset<Row> train;
    -    Dataset<Row> test;
    +    // configure the base classifier.
    +    LogisticRegression classifier = new LogisticRegression()
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true);
     
    -    // compute the train/ test split: if testInput is not provided use part of input
    -    String testInput = params.testInput;
    -    if (testInput != null) {
    -      train = inputData;
    -      // compute the number of features in the training set.
    -      int numFeatures = inputData.first().<Vector>getAs(1).size();
    -      test = spark.read().format("libsvm").option("numFeatures",
    -        String.valueOf(numFeatures)).load(testInput);
    -    } else {
    -      double f = params.fracTest;
    -      Dataset<Row>[] tmp = inputData.randomSplit(new double[]{1 - f, f}, 12345);
    -      train = tmp[0];
    -      test = tmp[1];
    -    }
    +    // instantiate the One Vs Rest Classifier.
    +    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
     
    -    // train the multiclass model
    +    // train the multiclass model.
    --- End diff --
    
    Sorry, one last thing: here we use `train.cache()` but we don't do that in the other examples. In fact, from a quick look, we don't seem to do that in any other example. So perhaps remove that and just do `ovr.fit(train);`
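The diff above replaces the configurable `fracTest` split (seeded with `12345`) with a fixed `randomSplit(new double[]{0.8, 0.2})` call that passes no seed, so the split may differ between runs. As a rough illustration only, the plain-Java sketch below shows one way a weighted random split can work: normalize the weights, then assign each row to a split by comparing a uniform draw against the cumulative weights. The class and method names are hypothetical and this is not Spark's implementation; the explicit `seed` parameter shows why the removed code's fixed seed made the split reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a weighted random split, illustrating the idea behind
// randomSplit(new double[]{0.8, 0.2}). Not Spark's implementation.
class WeightedSplitSketch {

  static <T> List<List<T>> randomSplit(List<T> rows, double[] weights, long seed) {
    double total = 0.0;
    for (double w : weights) {
      total += w;
    }
    // Cumulative upper bounds of the normalized weights, e.g. {0.8, 0.2} -> {0.8, 1.0}.
    double[] bounds = new double[weights.length];
    double acc = 0.0;
    for (int i = 0; i < weights.length; i++) {
      acc += weights[i] / total;
      bounds[i] = acc;
    }
    List<List<T>> splits = new ArrayList<>();
    for (int i = 0; i < weights.length; i++) {
      splits.add(new ArrayList<>());
    }
    // A fixed seed makes the assignment reproducible across runs.
    Random rng = new Random(seed);
    for (T row : rows) {
      double u = rng.nextDouble();
      int idx = 0;
      // The last split catches any floating-point shortfall in the final bound.
      while (idx < bounds.length - 1 && u >= bounds[idx]) {
        idx++;
      }
      splits.get(idx).add(row);
    }
    return splits;
  }

  public static void main(String[] args) {
    List<Integer> rows = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      rows.add(i);
    }
    List<List<Integer>> parts = randomSplit(rows, new double[]{0.8, 0.2}, 12345L);
    System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size());
  }
}
```

Each row lands in exactly one split, and the split sizes are only approximately proportional to the weights.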


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217683598
  
    @MLnick `MulticlassClassificationEvaluator` does not support `confusionMatrix` yet, so I will just remove the computation of `confusionMatrix`.
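Since the evaluator does not expose a confusion matrix, one option is to compute it directly from the `(prediction, label)` pairs. The hypothetical plain-Java sketch below assumes labels and predictions are integer class indices `0..numClasses-1` stored as doubles. It uses the same layout as `MulticlassMetrics.confusionMatrix` (rows are true labels, columns are predictions) and also derives the per-label false positive rate that the removed code printed.

```java
// Hypothetical plain-Java sketch: confusion matrix and per-label false
// positive rate from (prediction, label) pairs. Assumes class indices 0..n-1.
class ConfusionMatrixSketch {

  // Rows are true labels, columns are predicted labels.
  static long[][] confusionMatrix(double[] predictions, double[] labels, int numClasses) {
    long[][] m = new long[numClasses][numClasses];
    for (int i = 0; i < labels.length; i++) {
      m[(int) labels[i]][(int) predictions[i]]++;
    }
    return m;
  }

  // FPR(label) = rows predicted as `label` whose true class differs,
  // divided by all rows whose true class differs from `label`.
  static double falsePositiveRate(long[][] m, int label) {
    long falsePositives = 0;
    long negatives = 0;
    for (int trueClass = 0; trueClass < m.length; trueClass++) {
      if (trueClass == label) {
        continue;
      }
      for (int pred = 0; pred < m.length; pred++) {
        negatives += m[trueClass][pred];
      }
      falsePositives += m[trueClass][label];
    }
    return negatives == 0 ? 0.0 : (double) falsePositives / negatives;
  }

  public static void main(String[] args) {
    double[] predictions = {0, 1, 2, 2, 1, 0};
    double[] labels = {0, 1, 1, 2, 1, 2};
    long[][] m = confusionMatrix(predictions, labels, 3);
    for (long[] row : m) {
      System.out.println(java.util.Arrays.toString(row));
    }
    System.out.println("label\tfpr");
    for (int label = 0; label < 3; label++) {
      System.out.println(label + "\t" + falsePositiveRate(m, label));
    }
  }
}
```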


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-218384623
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58344/


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62428631
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,69 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    - * The example uses Logistic Regression as the base classifier. All parameters that
    - * can be specified on the base classifier can be passed in to the runner options.
    + * The example uses Logistic Regression as the base classifier.
      * Run with
      * <pre>
    - * bin/run-example ml.JavaOneVsRestExample [options]
    + * bin/run-example ml.JavaOneVsRestExample
      * </pre>
      */
     public class JavaOneVsRestExample {
    -
    -  private static class Params {
    -    String input;
    -    String testInput = null;
    -    Integer maxIter = 100;
    -    double tol = 1E-6;
    -    boolean fitIntercept = true;
    -    Double regParam = null;
    -    Double elasticNetParam = null;
    -    double fracTest = 0.2;
    -  }
    -
       public static void main(String[] args) {
    -    // parse the arguments
    -    Params params = parse(args);
         SparkSession spark = SparkSession
           .builder()
           .appName("JavaOneVsRestExample")
           .getOrCreate();
     
         // $example on$
    -    // configure the base classifier
    -    LogisticRegression classifier = new LogisticRegression()
    -      .setMaxIter(params.maxIter)
    -      .setTol(params.tol)
    -      .setFitIntercept(params.fitIntercept);
    +    // load data file.
    +    Dataset<Row> inputData = spark.read().format("libsvm")
    +      .load("data/mllib/sample_multiclass_classification_data.txt");
     
    -    if (params.regParam != null) {
    -      classifier.setRegParam(params.regParam);
    -    }
    -    if (params.elasticNetParam != null) {
    -      classifier.setElasticNetParam(params.elasticNetParam);
    -    }
    +    // generate the train/test split.
    +    Dataset<Row>[] tmp = inputData.randomSplit(new double[]{0.8, 0.2});
    +    Dataset<Row> train = tmp[0];
    +    Dataset<Row> test = tmp[1];
     
    -    // instantiate the One Vs Rest Classifier
    -    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
    -
    -    String input = params.input;
    -    Dataset<Row> inputData = spark.read().format("libsvm").load(input);
    -    Dataset<Row> train;
    -    Dataset<Row> test;
    +    // configure the base classifier.
    +    LogisticRegression classifier = new LogisticRegression()
    +      .setMaxIter(10)
    +      .setTol(1E-6)
    +      .setFitIntercept(true);
     
    -    // compute the train/ test split: if testInput is not provided use part of input
    -    String testInput = params.testInput;
    -    if (testInput != null) {
    -      train = inputData;
    -      // compute the number of features in the training set.
    -      int numFeatures = inputData.first().<Vector>getAs(1).size();
    -      test = spark.read().format("libsvm").option("numFeatures",
    -        String.valueOf(numFeatures)).load(testInput);
    -    } else {
    -      double f = params.fracTest;
    -      Dataset<Row>[] tmp = inputData.randomSplit(new double[]{1 - f, f}, 12345);
    -      train = tmp[0];
    -      test = tmp[1];
    -    }
    +    // instantiate the One Vs Rest Classifier.
    +    OneVsRest ovr = new OneVsRest().setClassifier(classifier);
     
    -    // train the multiclass model
    +    // train the multiclass model.
         OneVsRestModel ovrModel = ovr.fit(train.cache());
     
    -    // score the model on test data
    -    Dataset<Row> predictions = ovrModel.transform(test.cache())
    +    // score the model on test data.
    +    Dataset<Row> predictions = ovrModel.transform(test)
           .select("prediction", "label");
     
    -    // obtain metrics
    -    MulticlassMetrics metrics = new MulticlassMetrics(predictions);
    -    StructField predictionColSchema = predictions.schema().apply("prediction");
    -    Integer numClasses = (Integer) MetadataUtils.getNumClasses(predictionColSchema).get();
    -
    -    // compute the false positive rate per label
    -    StringBuilder results = new StringBuilder();
    -    results.append("label\tfpr\n");
    -    for (int label = 0; label < numClasses; label++) {
    -      results.append(label);
    -      results.append("\t");
    -      results.append(metrics.falsePositiveRate((double) label));
    -      results.append("\n");
    -    }
    +    // obtain evaluator.
    +    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
    +            .setMetricName("precision");
     
    -    Matrix confusionMatrix = metrics.confusionMatrix();
    -    // output the Confusion Matrix
    -    System.out.println("Confusion Matrix");
    -    System.out.println(confusionMatrix);
    -    System.out.println();
    -    System.out.println(results);
    +    // compute the classification error on test data.
    +    double precision = evaluator.evaluate(predictions);
    +    System.out.print("Test Error : " + (1 - precision));
    --- End diff --
    
    Nit: println
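For reference, the diff scores the model with `setMetricName("precision")` and reports `1 - precision` as the test error. Assuming that metric is the overall (micro-averaged) precision, it equals accuracy for single-label multiclass data, because every row contributes exactly one prediction. The hypothetical sketch below computes the same quantity directly from `(prediction, label)` pairs and prints it with `println`, as the nit suggests.

```java
// Hypothetical sketch: test error as 1 - accuracy over (prediction, label) pairs.
// For single-label multiclass data, overall (micro) precision equals accuracy.
class TestErrorSketch {

  static double accuracy(double[] predictions, double[] labels) {
    if (predictions.length != labels.length || labels.length == 0) {
      throw new IllegalArgumentException("need equally sized, non-empty arrays");
    }
    int correct = 0;
    for (int i = 0; i < labels.length; i++) {
      if (predictions[i] == labels[i]) {
        correct++;
      }
    }
    return (double) correct / labels.length;
  }

  public static void main(String[] args) {
    double[] predictions = {0, 1, 2, 2, 1};
    double[] labels = {0, 1, 1, 2, 1};
    double precision = accuracy(predictions, labels);
    // println terminates the line, unlike print.
    System.out.println("Test Error = " + (1 - precision));
  }
}
```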


[GitHub] spark pull request: [SPARK-15141][EXAMPLE] Add python example for ...

Posted by yanboliang <gi...@git.apache.org>.

Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217158354
  
    +1 @MLnick 
    There are two types of example code under ```examples/```:
    * Example applications (the ones with command-line parsing). These serve as template code for users to build their own standalone applications. We should create those only for important algorithms.
    * Tutorial code. Those are for the user guide, usually without options to play with. We should keep one example per algorithm.
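As a hypothetical illustration of the first kind (an example application that reads its options from the command line), the sketch below parses `--key=value` flags into a params object whose fields and defaults mirror the `Params` class removed in the Java diff. The flag syntax follows the `spark-submit` command in the PR description; the removed Java code used Apache Commons CLI instead, and the class name `ParamsSketch` is made up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "example application" style option parsing.
// Field defaults mirror the removed Params class; only the JDK is used.
class ParamsSketch {
  String input;
  String testInput = null;
  int maxIter = 100;
  double tol = 1E-6;
  boolean fitIntercept = true;
  Double regParam = null;
  Double elasticNetParam = null;
  double fracTest = 0.2;

  static ParamsSketch parse(String[] args) {
    // Collect --key=value pairs; unknown keys are silently ignored.
    Map<String, String> opts = new HashMap<>();
    for (String arg : args) {
      int eq = arg.indexOf('=');
      if (!arg.startsWith("--") || eq < 0) {
        throw new IllegalArgumentException("expected --key=value, got: " + arg);
      }
      opts.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    if (!opts.containsKey("input")) {
      throw new IllegalArgumentException("--input is required");
    }
    ParamsSketch p = new ParamsSketch();
    p.input = opts.get("input");
    p.testInput = opts.getOrDefault("testInput", p.testInput);
    if (opts.containsKey("maxIter")) p.maxIter = Integer.parseInt(opts.get("maxIter"));
    if (opts.containsKey("tol")) p.tol = Double.parseDouble(opts.get("tol"));
    if (opts.containsKey("fitIntercept")) p.fitIntercept = Boolean.parseBoolean(opts.get("fitIntercept"));
    if (opts.containsKey("regParam")) p.regParam = Double.parseDouble(opts.get("regParam"));
    if (opts.containsKey("elasticNetParam")) p.elasticNetParam = Double.parseDouble(opts.get("elasticNetParam"));
    if (opts.containsKey("fracTest")) p.fracTest = Double.parseDouble(opts.get("fracTest"));
    return p;
  }

  public static void main(String[] args) {
    ParamsSketch p = parse(new String[]{
        "--input=data/mllib/sample_multiclass_classification_data.txt",
        "--fracTest=0.33", "--maxIter=2", "--fitIntercept=false"});
    System.out.println(p.maxIter + " " + p.fracTest + " " + p.fitIntercept + " " + p.tol);
  }
}
```

Tutorial code, the second kind, skips all of this and hard-codes its values, which is what the updated Java example in this PR does.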


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217707084
  
    **[Test build #58100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58100/consoleFull)** for PR 12920 at commit [`16c2e74`](https://github.com/apache/spark/commit/16c2e742ccd01e05a9853c09c872f68ae64d07f4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217686124
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217333852
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Add python example...

Posted by zhengruifeng <gi...@git.apache.org>.

Github user zhengruifeng commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217347142
  
    @MLnick Argument parsing has been removed from those examples.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217082153
  
    **[Test build #57843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57843/consoleFull)** for PR 12920 at commit [`63d90ed`](https://github.com/apache/spark/commit/63d90edee26c6af19380d89408ad80090be23952).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][DOC] Add python example for OneV...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12920#issuecomment-217078338
  
    **[Test build #57842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57842/consoleFull)** for PR 12920 at commit [`44875ed`](https://github.com/apache/spark/commit/44875ed4a9e2e3594f1697be10f3be380a612b0e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15141][EXAMPLE][DOC] Update OneVsRest E...

Posted by MLnick <gi...@git.apache.org>.

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12920#discussion_r62741242
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaOneVsRestExample.java ---
    @@ -17,222 +17,69 @@
     
     package org.apache.spark.examples.ml;
     
    -import org.apache.commons.cli.*;
    -
     // $example on$
     import org.apache.spark.ml.classification.LogisticRegression;
     import org.apache.spark.ml.classification.OneVsRest;
     import org.apache.spark.ml.classification.OneVsRestModel;
    -import org.apache.spark.ml.util.MetadataUtils;
    -import org.apache.spark.mllib.evaluation.MulticlassMetrics;
    +import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
     import org.apache.spark.mllib.linalg.Matrix;
    -import org.apache.spark.mllib.linalg.Vector;
     import org.apache.spark.sql.Dataset;
     import org.apache.spark.sql.Row;
    -import org.apache.spark.sql.SparkSession;
    -import org.apache.spark.sql.types.StructField;
     // $example off$
    +import org.apache.spark.sql.SparkSession;
    +
     
     /**
      * An example runner for Multiclass to Binary Reduction with One Vs Rest.
    --- End diff --
    
    you can remove "runner" from this doc string (in this example and the others)
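The doc string describes multiclass-to-binary reduction with One Vs Rest. The hypothetical plain-Java sketch below shows the idea in two steps. First, relabel the data once per class (that class becomes 1.0, every other class 0.0) so that a binary classifier such as logistic regression can be trained on it. Second, predict the class whose binary model returns the highest confidence. Training is left out, and the toy scoring functions stand in for trained models; this is not Spark's `OneVsRest` code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical plain-Java sketch of the one-vs-rest reduction idea.
class OneVsRestSketch {

  // Step 1: binary labels for the k-th sub-problem (class k vs. the rest).
  static double[] relabel(double[] labels, int k) {
    double[] binary = new double[labels.length];
    for (int i = 0; i < labels.length; i++) {
      binary[i] = labels[i] == k ? 1.0 : 0.0;
    }
    return binary;
  }

  // Step 2: score the sample with every binary model and return the index of
  // the most confident one (-1 if there are no models).
  static int predict(List<ToDoubleFunction<double[]>> models, double[] features) {
    int best = -1;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int k = 0; k < models.size(); k++) {
      double score = models.get(k).applyAsDouble(features);
      if (score > bestScore) {
        bestScore = score;
        best = k;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(relabel(new double[]{0, 2, 1, 2}, 2)));
    // Toy "models": model k simply reports feature k as its confidence.
    List<ToDoubleFunction<double[]>> models = Arrays.<ToDoubleFunction<double[]>>asList(
        f -> f[0], f -> f[1], f -> f[2]);
    System.out.println(predict(models, new double[]{0.1, 0.7, 0.2}));
  }
}
```

With `numClasses` classes this trains `numClasses` binary models, which is why the base classifier's settings (here `setMaxIter`, `setTol`, `setFitIntercept`) apply to every sub-problem.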
