Posted to commits@spark.apache.org by ho...@apache.org on 2019/02/21 16:37:06 UTC

[spark] branch master updated: [DOCS] MINOR Complement the document of stringOrderType for StringIndexer in PySpark

This is an automated email from the ASF dual-hosted git repository.

holden pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 91caf0b  [DOCS] MINOR Complement the document of stringOrderType for StringIndexer in PySpark
91caf0b is described below

commit 91caf0bfce4706a264fcfe222fa500354ce69ff1
Author: Liang-Chi Hsieh <vi...@gmail.com>
AuthorDate: Thu Feb 21 08:36:48 2019 -0800

    [DOCS] MINOR Complement the document of stringOrderType for StringIndexer in PySpark
    
    ## What changes were proposed in this pull request?
    
    We revised the behavior of the `stringOrderType` param of `StringIndexer` for labels with equal frequency under frequencyDesc/Asc. That change isn't reflected in PySpark's documentation, so this patch updates it.
    
    ## How was this patch tested?
    
    Documentation-only change.
    
    Closes #23849 from viirya/py-stringindexer-doc.
    
    Authored-by: Liang-Chi Hsieh <vi...@gmail.com>
    Signed-off-by: Holden Karau <ho...@pigscanfly.ca>
---
 python/pyspark/ml/feature.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 0d1e9bd..8583046 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2299,7 +2299,10 @@ class _StringIndexerParams(JavaParams, HasHandleInvalid, HasInputCol, HasOutputC
     stringOrderType = Param(Params._dummy(), "stringOrderType",
                             "How to order labels of string column. The first label after " +
                             "ordering is assigned an index of 0. Supported options: " +
-                            "frequencyDesc, frequencyAsc, alphabetDesc, alphabetAsc.",
+                            "frequencyDesc, frequencyAsc, alphabetDesc, alphabetAsc. " +
+                            "Default is frequencyDesc. In case of equal frequency when " +
+                            "under frequencyDesc/Asc, the strings are further sorted " +
+                            "alphabetically",
                             typeConverter=TypeConverters.toString)
 
     handleInvalid = Param(Params._dummy(), "handleInvalid", "how to handle invalid data (unseen " +

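The ordering rule described in the new param doc can be illustrated with a small pure-Python sketch. This is not Spark's implementation: `order_labels` is a hypothetical helper written only for illustration, and it assumes that ties under frequencyDesc/Asc are broken in ascending alphabetical order, which is one reading of "further sorted alphabetically".

```python
from collections import Counter


def order_labels(values, string_order_type="frequencyDesc"):
    """Hypothetical helper (not a Spark API): return the distinct labels
    in index order, so the label at position 0 would get index 0."""
    counts = Counter(values)
    if string_order_type == "frequencyDesc":
        # Most frequent first; equal counts fall back to ascending
        # alphabetical order (assumed tie-break direction).
        return sorted(counts, key=lambda label: (-counts[label], label))
    if string_order_type == "frequencyAsc":
        # Least frequent first; same assumed alphabetical tie-break.
        return sorted(counts, key=lambda label: (counts[label], label))
    if string_order_type == "alphabetDesc":
        return sorted(counts, reverse=True)
    if string_order_type == "alphabetAsc":
        return sorted(counts)
    raise ValueError("unsupported stringOrderType: %s" % string_order_type)


# "a" and "b" both occur twice, so their relative order is decided by the tie-break.
values = ["b", "a", "c", "b", "a", "d", "d", "d"]
desc_order = order_labels(values, "frequencyDesc")
asc_order = order_labels(values, "frequencyAsc")
print(desc_order)
print(asc_order)
```

Without the tie-break, the relative position of "a" and "b" would depend on how the counts happened to be collected, which is why documenting the rule matters for reproducible indices.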
