
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:20:15 UTC

[jira] [Updated] (SPARK-16149) API consistency discussion: CountVectorizer.{minDF -> minDocFreq, minTF -> minTermFreq}

     [ https://issues.apache.org/jira/browse/SPARK-16149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-16149:
---------------------------------
    Labels: bulk-closed  (was: )

> API consistency discussion: CountVectorizer.{minDF -> minDocFreq, minTF -> minTermFreq}
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-16149
>                 URL: https://issues.apache.org/jira/browse/SPARK-16149
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>              Labels: bulk-closed
>
> `CountVectorizer` uses `minDF` and `minTF`, while `IDF` uses `minDocFreq`. It would be nice to keep the naming consistent. This was discussed in https://github.com/apache/spark/pull/7388, and the decision was made based on sklearn compatibility. However, we didn't look broadly across MLlib APIs. Maybe we can live with this small inconsistency, but it would be nice to discuss the guideline: should parameter names be consistent with other libraries or with existing ones in MLlib?
> cc: [~josephkb] [~yuhaoyan]
