You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by actuaryzhang <gi...@git.apache.org> on 2017/05/04 22:27:02 UTC

[GitHub] spark pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric...

GitHub user actuaryzhang opened a pull request:

    https://github.com/apache/spark/pull/17864

    [SPARK-20604][ML] Allow imputer to handle numeric types

    ## What changes were proposed in this pull request?
    
    Imputer currently requires input column to be Double or Float, but the logic should work on any numeric data types. Many practical problems have integer  data types, and it could get very tedious to manually cast them into Double before calling imputer. This transformer could be extended to handle all numeric types.  
    
    ## How was this patch tested?
    new test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/actuaryzhang/spark imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17864
    
----
commit e9ab39c2bdca76dae2b5cc40f90e4f5b2f9416c8
Author: Wayne Zhang <ac...@uber.com>
Date:   2017-05-04T22:18:46Z

    allow imputer to handle numeric types

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76511/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Any committer has a chance to take another look at this PR? Thanks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Shall we pay extra attention to the Int case? E.g. input column contains 
    Double.Nan, 
    1, 
    2. 
    
    The current implementation will return surrogate as 1.5. I'm not sure if it's the expectation for some users. 
    
    It's fine by me but just bring up the issue in case it's missed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17864: [SPARK-20604][ML] Allow imputer to handle numeric...

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17864#discussion_r118600408
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala ---
    @@ -94,12 +94,13 @@ private[feature] trait ImputerParams extends Params with HasInputCols {
      * :: Experimental ::
      * Imputation estimator for completing missing values, either using the mean or the median
      * of the columns in which the missing values are located. The input columns should be of
    - * DoubleType or FloatType. Currently Imputer does not support categorical features
    + * numeric type. Currently Imputer does not support categorical features
      * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
      *
      * Note that the mean/median value is computed after filtering out missing values.
      * All Null values in the input columns are treated as missing, and so are also imputed. For
      * computing median, DataFrameStatFunctions.approxQuantile is used with a relative error of 0.001.
    + * The output column is always of Double type regardless of the input column type.
    --- End diff --
    
    @MLnick Here is the note on always returning Double type. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Thanks for following up on this, Felix. 
    Still waiting for an agreement on this...
    Would like to have more direction on this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76468/testReport)** for PR 17864 at commit [`e9ab39c`](https://github.com/apache/spark/commit/e9ab39c2bdca76dae2b5cc40f90e4f5b2f9416c8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    @MLnick Thanks much for your comments. Yes, I think always returning Double is consistent with Python and R and also other transformers in ML. Plus, as @hhbyyh mentioned, this makes the implementation easier. Would you mind taking a look at the code and let me know if there is any suggestion for improvement? The doc is already updated to make it clear that it always returns Double regardless of the input type. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Originally the idea behind only supporting double was as @sethah posted above - there could be some issues with handling of int casting etc. As mentioned originally, we did consider "always cast to double". The only issue with it is the potential for surprising users who may expect the type of the input column to be maintained in the imputation.
    
    Having said that I would be broadly ok with just appending a double output column, provided we update the docstrings / guide to make things very clear.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    @hhbyyh @sethah @MLnick 
    Could you take a look at the new commit? Thanks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    So the other PR https://github.com/apache/spark/pull/11601 is really long. For reference, I am picking out the relevant discussions to this PR (also someone tell me if there's a better way to link to pr comments :)
    
    @MLnick "what do you think about handling different numeric types in input/output columns? If the input is say IntType, then strategy mode andmedian is ok but mean is somewhat problematic - or are we ok with rounding to and Int? The alternative is the Imputer always appends a Double output column.
    
    I propose we either (a) do the cast back to input type, but if the user selected "mean" and the input type is not Float or Double, log a warning; or (b) only support Float and Double type for this initial version of the Imputer."
    
    @jkbradley "Just catching up now... I like the idea of maintaining the input type. I'm imagining using an Imputer to fill in continuous features with the mean and categoricals with the mode. Later on, we could even check to see if a column is categorical (in the metadata) and throw an exception for mean.
    
    I'd prefer your option (b) to be safe."
    
    @sethah "For reference, I checked scikit-learn and the Imputer class returns floats regardless of inputs. I also checked R package "mlr" and it appears to do the same. One concern with a.) would be if the true median was something like 5.0, but approxQuantile returned 4.999999999. Then, we cast back to IntegerType and return 4. I wasn't able to produce this situation when I briefly experimented with it, and also the median is already approximate, so I'm not sure if this is really a problem."



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76513/testReport)** for PR 17864 at commit [`86c8a10`](https://github.com/apache/spark/commit/86c8a1061a366f96b3db8acd2a4d1ace3bb81ee3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    I imagine most Int features will need to be converted to Double for a Vector, thus returns Double regardless the input type makes sense, which also makes the implementation more straight forward.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    @sethah Thanks for summarizing the previous discussions. 
    What are you suggesting for this PR? I think it makes sense to log a warning when imputing integer types with mean. In addition, perhaps we can set "median" as the default strategy. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    what's next on this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76468/testReport)** for PR 17864 at commit [`e9ab39c`](https://github.com/apache/spark/commit/e9ab39c2bdca76dae2b5cc40f90e4f5b2f9416c8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Ping folks for comments/review. Many thanks. 
    @viirya @MLnick @jkbradley @hhbyyh @yanboliang @BenFradet 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76513/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    @hhbyyh Thanks for the suggestion. I have made a new commit that always casts the input to double and outputs the imputed column as double. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    @yanboliang @srowen @MLnick @jkbradley 
    
    The example below shows failure of Imputer on integer data. 
    ```
        val df = spark.createDataFrame( Seq(
          (0, 1.0, 1.0, 1.0),
          (1, 11.0, 11.0, 11.0),
          (2, 1.5, 1.5, 1.5),
          (3, Double.NaN, 4.5, 1.5)
        )).toDF("id", "value1", "expected_mean_value1", "expected_median_value1")
        val imputer = new Imputer()
          .setInputCols(Array("value1"))
          .setOutputCols(Array("out1"))
        imputer.fit(df.withColumn("value1", col("value1").cast(IntegerType)))
    
    java.lang.IllegalArgumentException: requirement failed: Column value1 must be of type equal to one of the following types: [DoubleType, FloatType] but was actually of type IntegerType.
    
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76513/testReport)** for PR 17864 at commit [`86c8a10`](https://github.com/apache/spark/commit/86c8a1061a366f96b3db8acd2a4d1ace3bb81ee3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76511/testReport)** for PR 17864 at commit [`6479965`](https://github.com/apache/spark/commit/6479965f5d49965dfa59fd65730668f1b8f3ddd5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by actuaryzhang <gi...@git.apache.org>.
Github user actuaryzhang commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    We can log a warning or issue an error if the input column is int and the imputation is by mean.
    Would like to know if that's OK with you? @hhbyyh @MLnick 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76468/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17864: [SPARK-20604][ML] Allow imputer to handle numeric types

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17864
  
    **[Test build #76511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76511/testReport)** for PR 17864 at commit [`6479965`](https://github.com/apache/spark/commit/6479965f5d49965dfa59fd65730668f1b8f3ddd5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org