Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2017/03/16 22:15:40 UTC

[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/17324

    [SPARK-19969] [ML] Imputer doc and example

    ## What changes were proposed in this pull request?
    
    Add docs and examples for spark.ml.feature.Imputer. Currently Scala and Java examples are included. A Python example will be added after https://github.com/apache/spark/pull/17316.
    
    ## How was this patch tested?
    
    Local doc generation and example execution.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark imputerdoc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17324
    
----
commit f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e
Author: Yuhao Yang <yu...@intel.com>
Date:   2017-03-16T22:05:56Z

    imputer doc and example

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107300524
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +DoubleType or FloatType. Currently Imputer does not support categorical features and possibly
    +creates incorrect values for a categorical feature. All Null values in the input column are
    +treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +By default, Imputer will replace all the `Double.NaN` (missing value) with the mean (strategy) from
    +other values in the corresponding columns. In our example, the surrogates for `a` and `b` are 3.0
    --- End diff --
    
    In this example




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108158040
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.ml.feature.Imputer;
    +import org.apache.spark.ml.feature.ImputerModel;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +import static org.apache.spark.sql.types.DataTypes.*;
    +
    +/**
    + * An example demonstrating Imputer.
    + * Run with:
    + *   bin/run-example ml.JavaImputerExample
    + */
    +public class JavaImputerExample {
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaImputerExample")
    +      .getOrCreate();
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.0, Double.NaN),
    +      RowFactory.create(2.0, Double.NaN),
    +      RowFactory.create(Double.NaN, 3.0),
    +      RowFactory.create(4.0, 4.0),
    +      RowFactory.create(5.0, 5.0)
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      createStructField("a", DoubleType, false),
    +      createStructField("b", DoubleType, false)
    +    });
    +    Dataset<Row> df = spark.createDataFrame(data, schema);
    +
    +    Imputer imputerModel = new Imputer()
    +      .setStrategy("mean")
    --- End diff --
    
    Since we're using defaults we can remove the `setStrategy` call in all examples.
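
To illustrate why the explicit call is redundant, here is a minimal plain-Python sketch (not Spark code). The helper `impute_column` and its `strategy` parameter are hypothetical names that only mimic a setter whose default is "mean"; the exact median from the standard library is an assumption and may differ from what Spark computes.

```python
import math
import statistics


def impute_column(values, strategy="mean"):
    """Replace NaN entries with the mean or median of the non-NaN entries.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    present = [v for v in values if not math.isnan(v)]
    if strategy == "mean":
        surrogate = statistics.mean(present)
    elif strategy == "median":
        surrogate = statistics.median(present)
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [surrogate if math.isnan(v) else v for v in values]


column_a = [1.0, 2.0, float("nan"), 4.0, 5.0]

# Omitting the argument and passing the default explicitly give the same result.
implicit = impute_column(column_a)
explicit = impute_column(column_a, strategy="mean")
```

Because both calls return the same list, an example that relies on the default can drop the explicit setting without changing its output.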




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75271/testReport)** for PR 17324 at commit [`7df70b7`](https://github.com/apache/spark/commit/7df70b79374fa2615562a8f3205738ed60c9bb42).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75058 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75058/testReport)** for PR 17324 at commit [`4bbe2f7`](https://github.com/apache/spark/commit/4bbe2f7336c5c0b2373a811de66b2c1204fb1683).




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74689/
    Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75383/testReport)** for PR 17324 at commit [`48a1361`](https://github.com/apache/spark/commit/48a136133fe83b5e4c2408e4391c15fdefead901).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Will take a look this week




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75031/testReport)** for PR 17324 at commit [`4bbe2f7`](https://github.com/apache/spark/commit/4bbe2f7336c5c0b2373a811de66b2c1204fb1683).




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75197/
    Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75197/testReport)** for PR 17324 at commit [`a2e24c0`](https://github.com/apache/spark/commit/a2e24c0b1bd1e640a44e6da2d97c58fd1cbd0ddd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108155956
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    --- End diff --
    
    "value" -> "values"




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75271/testReport)** for PR 17324 at commit [`7df70b7`](https://github.com/apache/spark/commit/7df70b79374fa2615562a8f3205738ed60c9bb42).




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108158366
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("imputer example")\
    +        .getOrCreate()
    +
    +    # $example on$
    +    dataFrame = spark.createDataFrame([
    --- End diff --
    
    `dataFrame` -> `df` to be consistent with other examples




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Jenkins retest this please




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75383/
    Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75059/
    Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75396/testReport)** for PR 17324 at commit [`e17f997`](https://github.com/apache/spark/commit/e17f997518782014b3c3dc1c33d69aecfcb0d38c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108158972
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    --- End diff --
    
    Prefer filename `imputer_example.py` to be consistent with other Python examples for ML




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107299906
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
    @@ -75,6 +75,8 @@ private[feature] trait ImputerParams extends Params with HasInputCols {
     
       /** Validates and transforms the input schema. */
       protected def validateAndTransformSchema(schema: StructType): StructType = {
    +    require(get(inputCols).isDefined, "Input cols must be defined first.")
    --- End diff --
    
    As I mentioned in #17316, is this really required? An unset param here will in any case throw an exception during `transformSchema` (or `fit`, or `transform`) with "no default value found".
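
The point about the redundant check can be sketched in plain Python. Everything here is hypothetical: `ParamStore`, `get_or_default`, and `validate_schema` are illustrative stand-ins, not Spark's `Params` API, and the error text only paraphrases the message quoted above.

```python
class ParamStore:
    """Hypothetical stand-in for a params container; not Spark's Params API."""

    def __init__(self, defaults=None):
        self._defaults = dict(defaults or {})
        self._values = {}

    def set(self, name, value):
        self._values[name] = value
        return self

    def get_or_default(self, name):
        # Explicitly set values win; otherwise fall back to a default if one exists.
        if name in self._values:
            return self._values[name]
        if name in self._defaults:
            return self._defaults[name]
        raise KeyError(f"no default value found for param {name!r}")


def validate_schema(params, schema_columns):
    # No explicit "is inputCols defined?" guard: the lookup below already fails
    # with a descriptive error when the param was never set.
    input_cols = params.get_or_default("inputCols")
    missing = [c for c in input_cols if c not in schema_columns]
    if missing:
        raise ValueError(f"columns not in schema: {missing}")
    return input_cols
```

In this sketch, calling `validate_schema` before `inputCols` is set raises from the lookup itself, so a separate precondition would only duplicate that failure.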




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108157863
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    +
    +**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +In this example, Imputer will replace all occurrences of Double.NaN (the default for the missing value)
    +with the mean (the default imputation strategy) from the other values in the corresponding columns.
    +In this example, the surrogate values for columns `a` and `b` are 3.0 and 4.0 respectively. After
    +transformation, the missing values in the output columns will be replaced by the surrogate value for
    +that column.
    +
    +~~~
    +      a     |      b     | out_a | out_b   
    +------------|------------|-------|-------
    +     1.0    | Double.NaN |  1.0  |  4.0 
    +     2.0    | Double.NaN |  2.0  |  4.0 
    + Double.NaN |     3.0    |  3.0  |  3.0 
    +     4.0    |     4.0    |  4.0  |  4.0
    +     5.0    |     5.0    |  5.0  |  5.0 
    +~~~
    +
    +<div class="codetabs">
    +<div data-lang="scala" markdown="1">
    +
    +Refer to the [Imputer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Imputer)
    +for more details on the API.
    +
    +{% include_example scala/org/apache/spark/examples/ml/ImputerExample.scala %}
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +
    +Refer to the [Imputer Java docs](api/java/org/apache/spark/ml/feature/Imputer.html)
    +for more details on the API.
    +
    +{% include_example java/org/apache/spark/examples/ml/JavaImputerExample.java %}
    +</div>
    +</div>
    --- End diff --
    
    Need to `include_example` for the Python example here.
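
For reference, the surrogate values and output columns in the table quoted above can be checked with a short plain-Python sketch. It does not use Spark; `is_missing` and `mean_surrogate` are hypothetical helpers, and treating `None` like NaN is an assumption that mirrors the note about `null` values in the doc text.

```python
import math


def is_missing(value):
    # Both None (null) and NaN count as missing in this sketch.
    return value is None or math.isnan(value)


def mean_surrogate(column):
    # Mean of the values that are present; missing entries are skipped.
    present = [v for v in column if not is_missing(v)]
    return sum(present) / len(present)


nan = float("nan")
rows = [(1.0, nan), (2.0, nan), (nan, 3.0), (4.0, 4.0), (5.0, 5.0)]
col_a = [r[0] for r in rows]
col_b = [r[1] for r in rows]

surrogate_a = mean_surrogate(col_a)  # (1 + 2 + 4 + 5) / 4
surrogate_b = mean_surrogate(col_b)  # (3 + 4 + 5) / 3

# Fill each missing entry with the surrogate of its own column.
out_a = [surrogate_a if is_missing(v) else v for v in col_a]
out_b = [surrogate_b if is_missing(v) else v for v in col_b]
```

The computed surrogates match the 3.0 and 4.0 stated in the doc text, and `out_a` and `out_b` match the `out_a` and `out_b` columns of the second table.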




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108157473
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    +
    +**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +In this example, Imputer will replace all occurrences of Double.NaN (the default for the missing value)
    +with the mean (the default imputation strategy) from the other values in the corresponding columns.
    +In this example, the surrogate values for columns `a` and `b` are 3.0 and 4.0 respectively. After
    +transformation, the missing values in the output columns will be replaced by the surrogate value for
    --- End diff --
    
    "surrogate value for the relevant column."
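
As a rough illustration of what "replaced by the surrogate value for the relevant column" means, the sketch below fills each column of the example table with its own surrogate. It is plain Python rather than Spark code: `fill_missing` is a hypothetical helper, and the surrogates 3.0 and 4.0 are the values quoted in the doc text above.

```python
import math

# Columns from the example table in the diff; NaN marks a missing value.
a = [1.0, 2.0, math.nan, 4.0, 5.0]
b = [math.nan, math.nan, 3.0, 4.0, 5.0]


def fill_missing(column, surrogate):
    """Replace every NaN in `column` with `surrogate` (hypothetical helper)."""
    return [surrogate if math.isnan(x) else x for x in column]


# Each column is filled with its own surrogate, never another column's.
out_a = fill_missing(a, 3.0)
out_b = fill_missing(b, 4.0)

print(out_a)
print(out_b)
```

The input lists are left untouched, which loosely mirrors how the transformer writes to separate output columns.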




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Jenkins retest this please




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74690/
    Test PASSed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108158326
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("imputer example")\
    --- End diff --
    
    Let's use "PythonImputerExample" to be consistent with the app names used in other examples.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108157034
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    +
    +**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +In this example, Imputer will replace all occurrences of Double.NaN (the default for the missing value)
    --- End diff --
    
    backticks around `Double.NaN`




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107299706
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ImputerExample.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.feature.Imputer
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    +object ImputerExample {
    +
    +  def main(args: Array[String]): Unit = {
    +    val spark = SparkSession.builder
    +      .appName("ImputerExample")
    +      .getOrCreate()
    +
    +    // $example on$
    +    val df = spark.createDataFrame( Seq(
    --- End diff --
    
    Nit: Space in `( Seq(` should be removed




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107300024
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    --- End diff --
    
    Maybe something like "The `Imputer` transformer completes missing values in ..."




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    @hhbyyh #17316 is merged.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #74689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74689/testReport)** for PR 17324 at commit [`f2e7a69`](https://github.com/apache/spark/commit/f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaImputerExample `




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Generally looks fine - made a few small comments.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108158841
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    --- End diff --
    
    While I see that not all Python examples have it, let's add the comment here too:
    
    ```python
    """
    An example demonstrating Imputer.
    Run with:
      bin/spark-submit examples/src/main/python/ml/imputer_example.py
    """
    ```




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75058 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75058/testReport)** for PR 17324 at commit [`4bbe2f7`](https://github.com/apache/spark/commit/4bbe2f7336c5c0b2373a811de66b2c1204fb1683).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Sure
    On Mon, 27 Mar 2017 at 19:53, Yuhao Yang <no...@github.com> wrote:
    
    > *@hhbyyh* commented on this pull request.
    > ------------------------------
    >
    > In examples/src/main/python/ml/imputer.py
    > <https://github.com/apache/spark/pull/17324#discussion_r108235749>:
    >
    > > +# Unless required by applicable law or agreed to in writing, software
    > +# distributed under the License is distributed on an "AS IS" BASIS,
    > +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    > +# See the License for the specific language governing permissions and
    > +# limitations under the License.
    > +#
    > +
    > +# $example on$
    > +from pyspark.ml.feature import Imputer
    > +# $example off$
    > +from pyspark.sql import SparkSession
    > +
    > +if __name__ == "__main__":
    > +    spark = SparkSession\
    > +        .builder\
    > +        .appName("imputer example")\
    >
    > Sure. For consistency, how about just keeping it as "ImputerExample"?
    >





[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108234190
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.ml.feature.Imputer;
    +import org.apache.spark.ml.feature.ImputerModel;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +import static org.apache.spark.sql.types.DataTypes.*;
    +
    +/**
    + * An example demonstrating Imputer.
    + * Run with:
    + *   bin/run-example ml.JavaImputerExample
    + */
    +public class JavaImputerExample {
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaImputerExample")
    +      .getOrCreate();
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.0, Double.NaN),
    +      RowFactory.create(2.0, Double.NaN),
    +      RowFactory.create(Double.NaN, 3.0),
    +      RowFactory.create(4.0, 4.0),
    +      RowFactory.create(5.0, 5.0)
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      createStructField("a", DoubleType, false),
    +      createStructField("b", DoubleType, false)
    +    });
    +    Dataset<Row> df = spark.createDataFrame(data, schema);
    +
    +    Imputer imputerModel = new Imputer()
    +      .setStrategy("mean")
    --- End diff --
    
    For the example code, can we keep it focused on introducing the primary API or important parameters?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75059/testReport)** for PR 17324 at commit [`8755dde`](https://github.com/apache/spark/commit/8755dde3997e12adc22c4282cadbd1ee59e6d99d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #74690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74690/testReport)** for PR 17324 at commit [`ac0683b`](https://github.com/apache/spark/commit/ac0683b6799e9d9090da9e2244b609c59717466b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108356925
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("imputer example")\
    --- End diff --
    
    ok




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75031/
    Test FAILed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107300477
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +DoubleType or FloatType. Currently Imputer does not support categorical features and possibly
    +creates incorrect values for a categorical feature. All Null values in the input column are
    +treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +By default, Imputer will replace all the `Double.NaN` (missing value) with the mean (strategy) from
    --- End diff --
    
    Perhaps "In this example, Imputer will replace all occurrences of `Double.NaN` (the default for the missing value) with the mean (the default imputation strategy) from the other values in the corresponding columns".




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108157169
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    +
    +**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +In this example, Imputer will replace all occurrences of Double.NaN (the default for the missing value)
    +with the mean (the default imputation strategy) from the other values in the corresponding columns.
    --- End diff --
    
    "... computed from the other values ..."
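
To make "computed from the other values" concrete, here is a minimal plain-Python sketch of the arithmetic behind the surrogates in the example table. It is not the Spark implementation; `surrogate_mean` is a hypothetical helper that averages only the non-missing entries of a column.

```python
import math

# Columns from the example table in the diff; NaN marks a missing value.
a = [1.0, 2.0, math.nan, 4.0, 5.0]
b = [math.nan, math.nan, 3.0, 4.0, 5.0]


def surrogate_mean(column):
    """Mean of the non-missing entries of `column` (hypothetical helper)."""
    present = [x for x in column if not math.isnan(x)]
    return sum(present) / len(present)


# a: (1 + 2 + 4 + 5) / 4, b: (3 + 4 + 5) / 3
surrogates = {"a": surrogate_mean(a), "b": surrogate_mean(b)}
print(surrogates)
```

The missing entries are excluded from both the sum and the count, which is why the results agree with the 3.0 and 4.0 stated in the doc text.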




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75197/testReport)** for PR 17324 at commit [`a2e24c0`](https://github.com/apache/spark/commit/a2e24c0b1bd1e640a44e6da2d97c58fd1cbd0ddd).


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108156614
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    --- End diff --
    
    "... creates incorrect values for columns containing categorical features."
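
To illustrate why this warning matters: if a categorical column is stored as numeric indices, the mean of those indices is generally not a valid category. The sketch below is plain Python with made-up indices, not Spark code.

```python
import math

nan = float("nan")
# Hypothetical category indices: 0.0 = "red", 1.0 = "green", 2.0 = "blue"
colour_index = [0.0, 2.0, nan, 2.0, 1.0]

present = [v for v in colour_index if not math.isnan(v)]
mean = sum(present) / len(present)  # 5.0 / 4 = 1.25

# Filling the gap with the mean produces an index that names no category.
imputed = [mean if math.isnan(v) else v for v in colour_index]
is_valid_category = mean in {0.0, 1.0, 2.0}
print(imputed, is_valid_category)
```

The imputed value 1.25 falls between "green" and "blue", which is the kind of incorrect value the doc sentence warns about.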


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108358270
  
    --- Diff: examples/src/main/python/ml/imputer_example.py ---
    @@ -0,0 +1,51 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example demonstrating Imputer.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/imputer_example.py
    +"""
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("ImputerExample")\
    +        .getOrCreate()
    +
    +    # $example on$
    +    df = spark.createDataFrame([
    +        (1.0, float("nan")),
    +        (2.0, float("nan")),
    +        (float("nan"), 3.0),
    +        (4.0, 4.0),
    +        (5.0, 5.0)
    +    ], ["a", "b"])
    +
    +    imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
    +    imputerModel = imputer.fit(df)
    +
    +    imputedData = imputerModel.transform(df)
    --- End diff --
    
    In the other examples we just do `model.transform(df).show()` so let's be consistent.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Updated with python example.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #74690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74690/testReport)** for PR 17324 at commit [`ac0683b`](https://github.com/apache/spark/commit/ac0683b6799e9d9090da9e2244b609c59717466b).


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17324


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #74689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74689/testReport)** for PR 17324 at commit [`f2e7a69`](https://github.com/apache/spark/commit/f2e7a69badc9d4e0352fcfe09e8d18cdfe007d9e).


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107299583
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ImputerExample.scala ---
    @@ -0,0 +1,52 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml
    +
    +// $example on$
    +import org.apache.spark.ml.feature.Imputer
    +// $example off$
    +import org.apache.spark.sql.SparkSession
    +
    --- End diff --
    
    Most examples have a small doc string that includes a "Run with:" part - see e.g. the recent `MinHashLSHExample`


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108357089
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.ml.feature.Imputer;
    +import org.apache.spark.ml.feature.ImputerModel;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +import static org.apache.spark.sql.types.DataTypes.*;
    +
    +/**
    + * An example demonstrating Imputer.
    + * Run with:
    + *   bin/run-example ml.JavaImputerExample
    + */
    +public class JavaImputerExample {
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaImputerExample")
    +      .getOrCreate();
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.0, Double.NaN),
    +      RowFactory.create(2.0, Double.NaN),
    +      RowFactory.create(Double.NaN, 3.0),
    +      RowFactory.create(4.0, 4.0),
    +      RowFactory.create(5.0, 5.0)
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      createStructField("a", DoubleType, false),
    +      createStructField("b", DoubleType, false)
    +    });
    +    Dataset<Row> df = spark.createDataFrame(data, schema);
    +
    +    Imputer imputerModel = new Imputer()
    +      .setStrategy("mean")
    --- End diff --
    
    It's not a big deal - still I think it's not necessary to illustrate `setStrategy("mean")` as we already mention in the user guide what the defaults are.
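
For readers unfamiliar with the two strategies named in the user guide, here is a plain-Python sketch of how "mean" and "median" can give different fill values. The helper `impute` and the sample column are hypothetical, and Spark itself may compute the median differently (for example approximately); this only shows the idea.

```python
import math
import statistics


def impute(column, strategy="mean"):
    """Fill NaN entries using the mean or median of the other values (illustrative only)."""
    present = [v for v in column if not math.isnan(v)]
    if strategy == "mean":
        fill = sum(present) / len(present)
    elif strategy == "median":
        fill = statistics.median(present)
    else:
        raise ValueError("strategy must be 'mean' or 'median'")
    return [fill if math.isnan(v) else v for v in column]


nan = float("nan")
skewed = [1.0, 2.0, nan, 4.0, 50.0]

by_mean = impute(skewed)              # fill value 57.0 / 4 = 14.25
by_median = impute(skewed, "median")  # fill value (2.0 + 4.0) / 2 = 3.0
print(by_mean[2], by_median[2])
```

On a skewed column like this one the median is far less affected by the outlier, which is the usual reason to switch away from the default.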


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108880024
  
    --- Diff: examples/src/main/python/ml/imputer_example.py ---
    @@ -0,0 +1,50 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example demonstrating Imputer.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/imputer_example.py
    +"""
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("ImputerExample")\
    +        .getOrCreate()
    +
    +    # $example on$
    +    df = spark.createDataFrame([
    +        (1.0, float("nan")),
    +        (2.0, float("nan")),
    +        (float("nan"), 3.0),
    +        (4.0, 4.0),
    +        (5.0, 5.0)
    +    ], ["a", "b"])
    +
    +    imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
    +    model = imputer.fit(df)
    +
    +    model.transform(df).select("a", "b", "out_a", "out_b").show()
    --- End diff --
    
    In my previous comment I wasn't totally clear, sorry! I meant let's _only_ have the `transform(df).show()`, so we can remove the `select` here as it's unnecessary.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108235749
  
    --- Diff: examples/src/main/python/ml/imputer.py ---
    @@ -0,0 +1,46 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("imputer example")\
    --- End diff --
    
    Sure. For consistency, how about just keeping it as "ImputerExample"?


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108358002
  
    --- Diff: examples/src/main/python/ml/imputer_example.py ---
    @@ -0,0 +1,51 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# $example on$
    +from pyspark.ml.feature import Imputer
    +# $example off$
    +from pyspark.sql import SparkSession
    +
    +"""
    +An example demonstrating Imputer.
    +Run with:
    +  bin/spark-submit examples/src/main/python/ml/imputer_example.py
    +"""
    +
    +if __name__ == "__main__":
    +    spark = SparkSession\
    +        .builder\
    +        .appName("ImputerExample")\
    +        .getOrCreate()
    +
    +    # $example on$
    +    df = spark.createDataFrame([
    +        (1.0, float("nan")),
    +        (2.0, float("nan")),
    +        (float("nan"), 3.0),
    +        (4.0, 4.0),
    +        (5.0, 5.0)
    +    ], ["a", "b"])
    +
    +    imputer = Imputer(inputCols=["a", "b"], outputCols=["out_a", "out_b"])
    +    imputerModel = imputer.fit(df)
    --- End diff --
    
    just `model`


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108156947
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
    +creates incorrect values for a categorical feature.
    +
    +**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    --- End diff --
    
    columns


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    The test was interrupted and needs a retest.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75271/
    Test PASSed.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75059/testReport)** for PR 17324 at commit [`8755dde`](https://github.com/apache/spark/commit/8755dde3997e12adc22c4282cadbd1ee59e6d99d).


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75346/testReport)** for PR 17324 at commit [`48a1361`](https://github.com/apache/spark/commit/48a136133fe83b5e4c2408e4391c15fdefead901).


[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108597395
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.ml.feature.Imputer;
    +import org.apache.spark.ml.feature.ImputerModel;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +import static org.apache.spark.sql.types.DataTypes.*;
    +
    +/**
    + * An example demonstrating Imputer.
    + * Run with:
    + *   bin/run-example ml.JavaImputerExample
    + */
    +public class JavaImputerExample {
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaImputerExample")
    +      .getOrCreate();
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.0, Double.NaN),
    +      RowFactory.create(2.0, Double.NaN),
    +      RowFactory.create(Double.NaN, 3.0),
    +      RowFactory.create(4.0, 4.0),
    +      RowFactory.create(5.0, 5.0)
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      createStructField("a", DoubleType, false),
    +      createStructField("b", DoubleType, false)
    +    });
    +    Dataset<Row> df = spark.createDataFrame(data, schema);
    +
    +    Imputer imputerModel = new Imputer()
    --- End diff --
    
    Thanks for finding this.


[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75058/
    Test PASSed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108155877
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,64 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +The `Imputer` transformer completes missing values in the dataset, either using the mean or the 
    --- End diff --
    
    "values in the dataset" -> "values in a dataset"




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107300180
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +DoubleType or FloatType. Currently Imputer does not support categorical features and possibly
    +creates incorrect values for a categorical feature. All Null values in the input column are
    --- End diff --
    
    Perhaps on a new line: 
    
    *Note* all null values in the input column ...
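To make the note concrete, here is a minimal plain-Java sketch of the "null is also missing" rule. It does not use Spark and is not the Imputer implementation; the class `MissingValueSketch` and its helpers are hypothetical names introduced only for illustration, and it assumes `NaN` is the configured missing-value marker, as in the example data.

```java
import java.util.Arrays;
import java.util.List;

public class MissingValueSketch {
  // Hypothetical helper: a value counts as missing if it is null or NaN.
  static boolean isMissing(Double v) {
    return v == null || v.isNaN();
  }

  // Mean of the present values only; NaN if every value is missing.
  static double surrogateMean(List<Double> column) {
    return column.stream()
      .filter(v -> !isMissing(v))
      .mapToDouble(Double::doubleValue)
      .average()
      .orElse(Double.NaN);
  }

  public static void main(String[] args) {
    // One null and one NaN: both are skipped when computing the surrogate.
    List<Double> column = Arrays.asList(1.0, null, Double.NaN, 3.0);
    System.out.println(surrogateMean(column));
  }
}
```

Both the null and the `NaN` entry are excluded from the mean, so both would be filled with the same surrogate.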




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75346/
    Test FAILed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107299250
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +DoubleType or FloatType. Currently Imputer does not support categorical features and possibly
    +creates incorrect values for a categorical feature. All Null values in the input column are
    +treated as missing, and so are also imputed.
    +
    +**Examples**
    +
    +Suppose that we have a DataFrame with the column `a` and `b`:
    +
    +~~~
    +      a     |      b      
    +------------|-----------
    +     1.0    | Double.NaN
    +     2.0    | Double.NaN
    + Double.NaN |     3.0   
    +     4.0    |     4.0   
    +     5.0    |     5.0   
    +~~~
    +
    +By default, Imputer will replace all the `Double.NaN` (missing value) with the mean (strategy) from
    +other values in the corresponding columns. In our example, the surrogates for `a` and `b` are 3.0
    +and 4.0 respectively. After transformation, the output columns will not contain missing value anymore.
    --- End diff --
    
    Perhaps "After transformation, the missing values in the output columns will be replaced by the surrogate value computed for that column"?
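For reference, the surrogate values quoted in the doc text (3.0 for `a`, 4.0 for `b`) can be checked with a small plain-Java sketch of the mean strategy. This is only an illustration of the arithmetic and does not call Spark; `MeanImputeSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

public class MeanImputeSketch {
  // Mean of the non-NaN entries; NaN marks a missing value, as in the example table.
  static double surrogateMean(double[] column) {
    return Arrays.stream(column)
      .filter(v -> !Double.isNaN(v))
      .average()
      .orElse(Double.NaN);
  }

  // Replace each missing entry with the surrogate computed for that column.
  static double[] impute(double[] column) {
    double surrogate = surrogateMean(column);
    return Arrays.stream(column)
      .map(v -> Double.isNaN(v) ? surrogate : v)
      .toArray();
  }

  public static void main(String[] args) {
    double[] a = {1.0, 2.0, Double.NaN, 4.0, 5.0};
    double[] b = {Double.NaN, Double.NaN, 3.0, 4.0, 5.0};
    System.out.println(Arrays.toString(impute(a))); // [1.0, 2.0, 3.0, 4.0, 5.0]
    System.out.println(Arrays.toString(impute(b))); // [4.0, 4.0, 3.0, 4.0, 5.0]
  }
}
```

Column `a` has four present values summing to 12.0, and column `b` has three summing to 12.0, which gives the 3.0 and 4.0 stated in the doc.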




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r107300097
  
    --- Diff: docs/ml-features.md ---
    @@ -1284,6 +1284,61 @@ for more details on the API.
     
     </div>
     
    +
    +## Imputer
    +
    +Imputation transformer for completing missing values in the dataset, either using the mean or the 
    +median of the columns in which the missing value are located. The input columns should be of
    +DoubleType or FloatType. Currently Imputer does not support categorical features and possibly
    --- End diff --
    
    Backticks for `DoubleType` and `FloatType`




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75396/testReport)** for PR 17324 at commit [`e17f997`](https://github.com/apache/spark/commit/e17f997518782014b3c3dc1c33d69aecfcb0d38c).




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    **[Test build #75383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75383/testReport)** for PR 17324 at commit [`48a1361`](https://github.com/apache/spark/commit/48a136133fe83b5e4c2408e4391c15fdefead901).




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Viewed generated docs and ran examples locally.
    
    👍
    
    Merged to master. Thanks!




[GitHub] spark issue #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17324
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75396/
    Test PASSed.




[GitHub] spark pull request #17324: [SPARK-19969] [ML] Imputer doc and example

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17324#discussion_r108357950
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaImputerExample.java ---
    @@ -0,0 +1,72 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.ml.feature.Imputer;
    +import org.apache.spark.ml.feature.ImputerModel;
    +import org.apache.spark.sql.Dataset;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SparkSession;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +import static org.apache.spark.sql.types.DataTypes.*;
    +
    +/**
    + * An example demonstrating Imputer.
    + * Run with:
    + *   bin/run-example ml.JavaImputerExample
    + */
    +public class JavaImputerExample {
    +  public static void main(String[] args) {
    +    SparkSession spark = SparkSession
    +      .builder()
    +      .appName("JavaImputerExample")
    +      .getOrCreate();
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.0, Double.NaN),
    +      RowFactory.create(2.0, Double.NaN),
    +      RowFactory.create(Double.NaN, 3.0),
    +      RowFactory.create(4.0, 4.0),
    +      RowFactory.create(5.0, 5.0)
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      createStructField("a", DoubleType, false),
    +      createStructField("b", DoubleType, false)
    +    });
    +    Dataset<Row> df = spark.createDataFrame(data, schema);
    +
    +    Imputer imputerModel = new Imputer()
    --- End diff --
    
    Sorry just noticed this `imputerModel` here and `model` below. Let's call it `imputer` and `model`.

