Posted to commits@spark.apache.org by ml...@apache.org on 2018/01/19 10:43:29 UTC

spark git commit: [SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter

Repository: spark
Updated Branches:
  refs/heads/master 9c4b99861 -> 60203fca6


[SPARK-23127][DOC] Update FeatureHasher guide for categoricalCols parameter

Update the user guide entry for `FeatureHasher` to describe the `categoricalCols` parameter, matching the Scala and Python API docs.

## How was this patch tested?

Doc only

Author: Nick Pentreath <ni...@za.ibm.com>

Closes #20293 from MLnick/SPARK-23127-catCol-userguide.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/60203fca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/60203fca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/60203fca

Branch: refs/heads/master
Commit: 60203fca6a605ad158184e1e0ce5187e144a3ea7
Parents: 9c4b998
Author: Nick Pentreath <ni...@za.ibm.com>
Authored: Fri Jan 19 12:43:23 2018 +0200
Committer: Nick Pentreath <ni...@za.ibm.com>
Committed: Fri Jan 19 12:43:23 2018 +0200

----------------------------------------------------------------------
 docs/ml-features.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/60203fca/docs/ml-features.md
----------------------------------------------------------------------
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 7264313..10183c3 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -222,9 +222,9 @@ The `FeatureHasher` transformer operates on multiple columns. Each column may co
 numeric or categorical features. Behavior and handling of column data types is as follows:
 
 - Numeric columns: For numeric features, the hash value of the column name is used to map the
-feature value to its index in the feature vector. Numeric features are never treated as
-categorical, even when they are integers. You must explicitly convert numeric columns containing
-categorical features to strings first.
+feature value to its index in the feature vector. By default, numeric features are not treated
+as categorical (even when they are integers). To treat them as categorical, specify the relevant
+columns using the `categoricalCols` parameter.
 - String columns: For categorical features, the hash value of the string "column_name=value"
 is used to map to the vector index, with an indicator value of `1.0`. Thus, categorical features
 are "one-hot" encoded (similarly to using [OneHotEncoder](ml-features.html#onehotencoder) with

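The indexing rule described in the updated guide text can be illustrated with a small, Spark-free sketch. This is a toy model, not Spark's implementation: `bucket` and `hash_row` are hypothetical helpers, MD5 stands in for the hash function Spark actually uses, and the number-to-string formatting of categorical numeric values is assumed to be Python's default. It only shows how listing a numeric column as categorical changes what gets hashed and what value is stored.

```python
import hashlib


def bucket(term, num_features):
    """Map a string to a vector index (stand-in hash, NOT Spark's hash function)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_features


def hash_row(row, num_features=1024, categorical_cols=()):
    """Hash one row (dict of column -> value) into a sparse {index: value} vector.

    Hypothetical helper mirroring the rules in the guide text:
    - numeric column: hash the column name, store the numeric value
    - string column, or numeric column named in categorical_cols:
      hash "column_name=value", store the indicator 1.0
    """
    vec = {}
    for col, value in row.items():
        if isinstance(value, str) or col in categorical_cols:
            idx = bucket(f"{col}={value}", num_features)
            weight = 1.0
        else:
            idx = bucket(col, num_features)
            weight = float(value)
        # Colliding features are summed into the same slot.
        vec[idx] = vec.get(idx, 0.0) + weight
    return vec


row = {"real": 2.2, "id": 7, "cat": "foo"}
default_vec = hash_row(row)                              # "id" contributes 7.0
categorical_vec = hash_row(row, categorical_cols={"id"})  # "id=7" contributes 1.0
```

In the default call the integer column `id` is hashed by name and contributes its value; in the second call it is hashed as the string `id=7` and contributes an indicator, which is the behavior the `categoricalCols` parameter is documented to enable.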
