Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/04/28 14:43:14 UTC

[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/12754

    [SPARK-14979] [ML] [PySpark] Add examples for GeneralizedLinearRegression

    ## What changes were proposed in this pull request?
    Add Scala/Java/Python examples for ```GeneralizedLinearRegression```.
    
    ## How was this patch tested?
    These are examples and were tested offline.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-14979

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12754
    
----
commit a04646750cfed32a40d2745bd0f3fffe1ccdec00
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-28T12:40:08Z

    Add examples for GeneralizedLinearRegression

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219203482
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-215414472
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57250/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62604571
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val training = spark.read.format("libsvm")
    --- End diff --
    
    Agree, updated.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217307940
  
    One thing I'm curious about, for the user guide especially, is whether we'll add examples for classification and other families. It's a bit weird, since Generalized Linear _Regression_ can also be used for classification.
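
To illustrate the point about classification: with the binomial family and logit link, a generalized linear model predicts a mean in (0, 1) that can be read as a class probability and thresholded. The sketch below is plain Java and does not use the Spark API; the class name `BinomialGlmSketch`, its methods, and the coefficient values are hypothetical, chosen only for illustration and not taken from this pull request.

```java
/** Sketch only: scores rows with a binomial-family GLM (logit link) and thresholds the mean. */
final class BinomialGlmSketch {

  // Inverse of the logit link: maps the linear predictor eta to a mean in (0, 1).
  static double inverseLogit(double eta) {
    return 1.0 / (1.0 + Math.exp(-eta));
  }

  // Linear predictor: eta = intercept + sum_i coefficients[i] * features[i].
  static double linearPredictor(double[] coefficients, double intercept, double[] features) {
    double eta = intercept;
    for (int i = 0; i < coefficients.length; i++) {
      eta += coefficients[i] * features[i];
    }
    return eta;
  }

  // Predicted class: 1 when the predicted mean (read as a probability) reaches the threshold.
  static int predictClass(
      double[] coefficients, double intercept, double[] features, double threshold) {
    double mean = inverseLogit(linearPredictor(coefficients, intercept, features));
    return mean >= threshold ? 1 : 0;
  }

  public static void main(String[] args) {
    // Hypothetical values standing in for a fitted model; nothing here is estimated from data.
    double[] coefficients = {2.0, -1.0};
    double intercept = 0.5;
    double[][] rows = {{1.0, 0.5}, {-1.0, 2.0}};

    for (double[] row : rows) {
      double mean = inverseLogit(linearPredictor(coefficients, intercept, row));
      int label = predictClass(coefficients, intercept, row, 0.5);
      System.out.println(String.format("mean=%.4f class=%d", mean, label));
    }
  }
}
```

In practice a fitted model would supply the coefficients and intercept. The 0.5 threshold is a common default, not something this thread prescribes.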




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219203197
  
    **[Test build #58602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62604290
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val training = spark.read.format("libsvm")
    +      .load("data/mllib/sample_linear_regression_data.txt")
    +
    +    val glr = new GeneralizedLinearRegression()
    +      .setFamily("gaussian")
    +      .setLink("identity")
    +      .setMaxIter(10)
    +      .setRegParam(0.3)
    +
    +    // Fit the model
    +    val model = glr.fit(training)
    +
    +    // Print the coefficients and intercept for generalized linear regression model
    +    println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
    +
    +    // Summarize the model over the training set and print out some metrics
    +    val summary = model.summary
    --- End diff --
    
    We should keep the example succinct, so I don't think it is necessary to illustrate a complete ML pipeline in the algorithm example.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219070171
  
    LGTM pending one minor comment. Thanks!




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62720345
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaGeneralizedLinearRegressionExample.java ---
    @@ -0,0 +1,73 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression;
    +import org.apache.spark.ml.regression.GeneralizedLinearRegressionModel;
    +import org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.SparkSession;
    --- End diff --
    
    This import should not be included in the example if we don't also include the creation of the spark session. 




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-215414469
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219073834
  
    **[Test build #58577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58577/consoleFull)** for PR 12754 at commit [`6f88efd`](https://github.com/apache/spark/commit/6f88efd631fd47441287de0056720cb808ed77e0).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62371005
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val training = spark.read.format("libsvm")
    --- End diff --
    
    If we don't add a test summary, can you call this `dataset` or something similar? `training` doesn't make sense, IMO, if there isn't a test set also.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62270439
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +import org.apache.spark.{SparkConf, SparkContext}
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SQLContext
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val conf = new SparkConf().setAppName("GeneralizedLinearRegressionExample")
    +    val sc = new SparkContext(conf)
    +    val sqlContext = new SQLContext(sc)
    --- End diff --
    
    We'll want to use SparkSession here and elsewhere now.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12754




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218044954
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r63219051
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaGeneralizedLinearRegressionExample.java ---
    @@ -0,0 +1,81 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression;
    +import org.apache.spark.ml.regression.GeneralizedLinearRegressionModel;
    +import org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +// $example off$
    +import org.apache.spark.sql.SparkSession;
    +
    +/**
    + * An example demonstrating generalized linear regression.
    + * Run with
    + * <pre>
    + * bin/run-example ml.JavaGeneralizedLinearRegressionExample
    + * </pre>
    + */
    +
    +public class JavaGeneralizedLinearRegressionExample {
    +
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder().appName("JavaGeneralizedLinearRegressionExample").getOrCreate();
    --- End diff --
    
    Minor, but the rest of the examples seem to put each builder call on its own line:
    
    ```
    SparkSession spark = SparkSession
        .builder()
        .appName("JavaGeneralizedLinearRegressionExample")
        .getOrCreate();
    ```




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218043919
  
    **[Test build #58198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58198/consoleFull)** for PR 12754 at commit [`faa9892`](https://github.com/apache/spark/commit/faa9892f7b6fcc245456d3b83ca4f8d03a220b64).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217400888
  
    @sethah Thanks for your comments. Please feel free to start a separate PR for the user guide. I'm ambivalent about adding examples for classification and other families. I'd assume new users would prefer to use ```Linear/LogisticRegression``` directly, while more expert users would go for ```GeneralizedLinearRegression```. Those expert users should already understand what a GLM can do.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-215440275
  
    cc @mengxr @jkbradley 




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218998430
  
    **[Test build #58560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58560/consoleFull)** for PR 12754 at commit [`6e0200f`](https://github.com/apache/spark/commit/6e0200f7646a9f924afd4b22c998fe5c844d64b3).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217304479
  
    Do we plan to add a section to the user guide for 2.0? If we want to do it separately from this PR, I can work on it.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219000529
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217400143
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57976/
    Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219203483
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58602/




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219105609
  
    @yanboliang just a few very minor comments; pending those, LGTM.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218238870
  
    I just found a couple of minor things. I'd like to add the Python model summaries to the examples if we can get them added via [#12961](https://github.com/apache/spark/pull/12961). @yanboliang What do you think?




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219076192
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-215414343
  
    **[Test build #57250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57250/consoleFull)** for PR 12754 at commit [`0b0d969`](https://github.com/apache/spark/commit/0b0d9698a534d3a8b7b2ba42678275bc1c83401a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r63218910
  
    --- Diff: examples/src/main/python/ml/generalized_linear_regression_example.py ---
    @@ -0,0 +1,66 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +from __future__ import print_function
    +
    +from pyspark.sql import SparkSession
    +# $example on$
    +from pyspark.ml.regression import GeneralizedLinearRegression
    +# $example off$
    +
    +"""
    +An example demonstrating generalized linear regression.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/generalized_linear_regression_example.py
    +"""
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("GeneralizedLinearRegressionExample")\
    +        .getOrCreate()
    +
    +    # $example on$
    +    # Load training data
    +    dataset = spark.read.format("libsvm")\
    +        .load("data/mllib/sample_linear_regression_data.txt")
    +
    +    glr = GeneralizedLinearRegression(family="gaussian", link="identity", maxIter=10, regParam=0.3)
    +
    +    # Fit the model
    +    model = glr.fit(dataset)
    +
    +    # Print the coefficients and intercept for generalized linear regression model
    +    print("Coefficients: " + str(model.coefficients))
    +    print("Intercept: " + str(model.intercept))
    +
    +    # Summarize the model over the training set and print out some metrics
    +    summary = model.summary
    +    print("Coefficient Standard Errors: " + str(summary.coefficientStandardErrors))
    +    print("T Values: " + str(summary.tValues))
    +    print("P Values: " + str(summary.pValues))
    +    print("Dispersion: " + str(summary.dispersion))
    +    print("Null Deviance: " + str(summary.nullDeviance))
    +    print("Residual Degree Of Freedom Null: " + str(summary.residualDegreeOfFreedomNull))
    +    print("Deviance: " + str(summary.deviance))
    +    print("Residual Degree Of Freedom: " + str(summary.residualDegreeOfFreedom))
    +    print("AIC: " + str(summary.aic))
    +    print("DevianceResiduals: ")
    --- End diff --
    
    `Deviance Residuals`




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219203467
  
    **[Test build #58602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62689150
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    --- End diff --
    
    In other examples, we place a comment at the top of the file explaining how to run the example. For instance:
    
    ```scala
    
    /**
     * An example demonstrating a bisecting k-means clustering.
     * Run with
     * {{{
     * bin/run-example ml.BisectingKMeansExample
     * }}}
     */
    ```
    
    We seem to be very inconsistent about this. It would be nice to choose a convention and stick to it. Thoughts? @MLnick @zhengruifeng




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62370897
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val training = spark.read.format("libsvm")
    +      .load("data/mllib/sample_linear_regression_data.txt")
    +
    +    val glr = new GeneralizedLinearRegression()
    +      .setFamily("gaussian")
    +      .setLink("identity")
    +      .setMaxIter(10)
    +      .setRegParam(0.3)
    +
    +    // Fit the model
    +    val model = glr.fit(training)
    +
    +    // Print the coefficients and intercept for generalized linear regression model
    +    println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
    +
    +    // Summarize the model over the training set and print out some metrics
    +    val summary = model.summary
    --- End diff --
    
    I think it would be nice to show off the testing summary (`val testSummary = model.evaluate(testData)`), but I know we try to keep the examples rather brief. 
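
For reference, the held-out evaluation suggested above could look roughly like the following PySpark sketch. It is not part of the PR: the 80/20 split, the seed, and the helper names `holdout_metrics` and `fit_and_evaluate` are assumptions made for illustration, and it relies on `GeneralizedLinearRegressionModel.evaluate` being available in the Python API.

```python
def holdout_metrics(model, test_data):
    """Return a few summary metrics computed on held-out data.

    `model` is expected to be a fitted GeneralizedLinearRegressionModel
    (hypothetical helper, not part of the PR).
    """
    # evaluate() builds a summary on the supplied dataset
    # instead of the training set used by model.summary.
    test_summary = model.evaluate(test_data)
    return {
        "deviance": test_summary.deviance,
        "dispersion": test_summary.dispersion,
        "aic": test_summary.aic,
    }


def fit_and_evaluate(spark, path="data/mllib/sample_linear_regression_data.txt"):
    """Fit on a training split and report metrics on the remaining rows."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.regression import GeneralizedLinearRegression

    dataset = spark.read.format("libsvm").load(path)
    # Assumed split ratio and seed; the PR's example fits on the full dataset.
    training, test = dataset.randomSplit([0.8, 0.2], seed=42)

    glr = GeneralizedLinearRegression(
        family="gaussian", link="identity", maxIter=10, regParam=0.3)
    model = glr.fit(training)
    return holdout_metrics(model, test)
```

Calling `fit_and_evaluate(spark)` with an active `SparkSession` would return the three metrics as a plain dictionary, which keeps the printing code separate from the model-fitting code.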




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218748702
  
    I will update this PR to include the Python model summaries after #12961 gets merged.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219000439
  
    **[Test build #58560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58560/consoleFull)** for PR 12754 at commit [`6e0200f`](https://github.com/apache/spark/commit/6e0200f7646a9f924afd4b22c998fe5c844d64b3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-215412571
  
    **[Test build #57250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57250/consoleFull)** for PR 12754 at commit [`0b0d969`](https://github.com/apache/spark/commit/0b0d9698a534d3a8b7b2ba42678275bc1c83401a).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219076194
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58577/




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r63198312
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,77 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +/**
    + * An example demonstrating generalized linear regression.
    + * Run with
    + * {{{
    + * bin/run-example ml.GeneralizedLinearRegressionExample
    + * }}}
    + */
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val dataset = spark.read.format("libsvm")
    +      .load("data/mllib/sample_linear_regression_data.txt")
    +
    +    val glr = new GeneralizedLinearRegression()
    +      .setFamily("gaussian")
    +      .setLink("identity")
    +      .setMaxIter(10)
    +      .setRegParam(0.3)
    +
    +    // Fit the model
    +    val model = glr.fit(dataset)
    +
    +    // Print the coefficients and intercept for generalized linear regression model
    +    println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
    --- End diff --
    
    Let's separate these onto two lines, both here and in the Java example.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219076084
  
    **[Test build #58577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58577/consoleFull)** for PR 12754 at commit [`6f88efd`](https://github.com/apache/spark/commit/6f88efd631fd47441287de0056720cb808ed77e0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217400066
  
    **[Test build #57976 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57976/consoleFull)** for PR 12754 at commit [`d21c7ba`](https://github.com/apache/spark/commit/d21c7bae2c4e902c8913fb77d0b6a6b651fa4a7c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219369540
  
    Merged to master/branch-2.0. Thanks!




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217521715
  
    @yanboliang On a related note, if we can get [#12961](https://github.com/apache/spark/pull/12961) merged, then we can add the model summaries to the Python example here.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218970486
  
    #12961 is merged.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218044958
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58198/




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r62691393
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,69 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object GeneralizedLinearRegressionExample {
    --- End diff --
    
    I like the comment being there - we should try to have that consistently in
    all the examples
    On Tue, 10 May 2016 at 17:07, Seth Hendrickson <no...@github.com>
    wrote:
    
    > In
    > examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala
    > <https://github.com/apache/spark/pull/12754#discussion_r62689150>:
    >
    > In other examples, we place a comment at the top of the file explaining
    > how to run the example. For instance:
    >
    > /**
    >  * An example demonstrating a bisecting k-means clustering.
    >  * Run with
    >  * {{{
    >  * bin/run-example ml.BisectingKMeansExample
    >  * }}}
    >  */
    >
    > We seem to be very inconsistent about this. It would be nice to choose a
    > convention and stick to it. Thoughts? @MLnick <https://github.com/MLnick>
    > @zhengruifeng <https://github.com/zhengruifeng>





[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217398707
  
    **[Test build #57976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57976/consoleFull)** for PR 12754 at commit [`d21c7ba`](https://github.com/apache/spark/commit/d21c7bae2c4e902c8913fb77d0b6a6b651fa4a7c).




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-217400141
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-218044896
  
    **[Test build #58198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58198/consoleFull)** for PR 12754 at commit [`faa9892`](https://github.com/apache/spark/commit/faa9892f7b6fcc245456d3b83ca4f8d03a220b64).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12754#discussion_r63218860
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GeneralizedLinearRegressionExample.scala ---
    @@ -0,0 +1,78 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +// scalastyle:off println
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.regression.GeneralizedLinearRegression
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +/**
    + * An example demonstrating generalized linear regression.
    + * Run with
    + * {{{
    + * bin/run-example ml.GeneralizedLinearRegressionExample
    + * }}}
    + */
    +
    +object GeneralizedLinearRegressionExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession
    +      .builder
    +      .appName("GeneralizedLinearRegressionExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    // Load training data
    +    val dataset = spark.read.format("libsvm")
    +      .load("data/mllib/sample_linear_regression_data.txt")
    +
    +    val glr = new GeneralizedLinearRegression()
    +      .setFamily("gaussian")
    +      .setLink("identity")
    +      .setMaxIter(10)
    +      .setRegParam(0.3)
    +
    +    // Fit the model
    +    val model = glr.fit(dataset)
    +
    +    // Print the coefficients and intercept for generalized linear regression model
    +    println(s"Coefficients: ${model.coefficients}")
    +    println(s"Intercept: ${model.intercept}")
    +
    +    // Summarize the model over the training set and print out some metrics
    +    val summary = model.summary
    +    println(s"Coefficient Standard Errors: ${summary.coefficientStandardErrors.mkString(",")}")
    +    println(s"T Values: ${summary.tValues.mkString(",")}")
    +    println(s"P Values: ${summary.pValues.mkString(",")}")
    +    println(s"Dispersion: ${summary.dispersion}")
    +    println(s"Null Deviance: ${summary.nullDeviance}")
    +    println(s"Residual Degree Of Freedom Null: ${summary.residualDegreeOfFreedomNull}")
    +    println(s"Deviance: ${summary.deviance}")
    +    println(s"Residual Degree Of Freedom: ${summary.residualDegreeOfFreedom}")
    +    println(s"AIC: ${summary.aic}")
    +    println("DevianceResiduals: ")
    --- End diff --
    
    This label should read `Deviance Residuals` (with a space), matching the other labels.




[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12754#issuecomment-219000532
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58560/

