Posted to commits@spark.apache.org by sr...@apache.org on 2017/10/30 07:25:22 UTC
spark git commit: Added more information to Imputer
Repository: spark
Updated Branches:
refs/heads/master 188b47e68 -> 6eda55f72
Added more information to Imputer
We often want to impute custom values other than 'NaN'. My addition helps people find this option without reading the API docs.
Author: tengpeng <te...@users.noreply.github.com>
Closes #19600 from tengpeng/patch-5.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6eda55f7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6eda55f7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6eda55f7
Branch: refs/heads/master
Commit: 6eda55f728a6f2e265ae12a7e01dae88e4172715
Parents: 188b47e
Author: tengpeng <te...@users.noreply.github.com>
Authored: Mon Oct 30 07:24:55 2017 +0000
Committer: Sean Owen <so...@cloudera.com>
Committed: Mon Oct 30 07:24:55 2017 +0000
----------------------------------------------------------------------
docs/ml-features.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/6eda55f7/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 86a0e09..7264313 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1373,7 +1373,9 @@ for more details on the API.
The `Imputer` transformer completes missing values in a dataset, either using the mean or the
median of the columns in which the missing values are located. The input columns should be of
`DoubleType` or `FloatType`. Currently `Imputer` does not support categorical features and possibly
-creates incorrect values for columns containing categorical features.
+creates incorrect values for columns containing categorical features. Imputer can impute custom values
+other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute
+all occurrences of (0).
**Note** all `null` values in the input columns are treated as missing, and so are also imputed.
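To make the behavior described in the diff concrete, here is a simplified, pure-Python model of what `Imputer` does to a single column. It is not Spark code: `impute_column` and its parameters are hypothetical names chosen for illustration. It uses an exact median (`statistics.median`), whereas Spark computes an approximate quantile, so median results can differ slightly. Two further details are assumptions based on a reading of the Spark 2.2 source and should be checked against the version you run: NaN is always excluded when the surrogate is computed, and NaN is left untouched when a non-NaN missing value is configured.

```python
import math
import statistics
from typing import List, Optional, Sequence


def impute_column(values: Sequence[Optional[float]],
                  missing_value: float = float("nan"),
                  strategy: str = "mean") -> List[Optional[float]]:
    """Hypothetical model: replace None and missing_value with the mean/median."""
    missing_is_nan = math.isnan(missing_value)

    def is_missing(x: Optional[float]) -> bool:
        # None plays the role of Spark's null and is always imputed.
        if x is None:
            return True
        # NaN never compares equal with ==, so match it explicitly.
        if missing_is_nan:
            return math.isnan(x)
        return x == missing_value

    # Assumption: the statistic ignores nulls, missing_value and NaN,
    # even when missing_value itself is not NaN.
    observed = [x for x in values if not is_missing(x) and not math.isnan(x)]
    if not observed:
        raise ValueError("surrogate cannot be computed: no observed values")

    if strategy == "mean":
        surrogate = sum(observed) / len(observed)
    elif strategy == "median":
        surrogate = statistics.median(observed)  # exact, not approximate
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Only nulls and occurrences of missing_value are replaced.
    return [surrogate if is_missing(x) else x for x in values]
```

For example, `impute_column([1.0, 0.0, 3.0, None], missing_value=0.0)` replaces both the `0.0` and the `None` with the mean of the remaining values, 2.0. In PySpark itself, the corresponding configuration would be along the lines of `Imputer(inputCols=["a"], outputCols=["a_imputed"]).setMissingValue(0.0)`, where the column names are placeholders.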