Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/04/05 01:30:25 UTC

[jira] [Comment Edited] (SPARK-13629) Add binary toggle Param to CountVectorizer

    [ https://issues.apache.org/jira/browse/SPARK-13629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15225254#comment-15225254 ] 

Joseph K. Bradley edited comment on SPARK-13629 at 4/4/16 11:29 PM:
--------------------------------------------------------------------

I just realized that we should have added the binary toggle Param to CountVectorizer (the Estimator) as well.  (We need all Estimators to contain the Model Params so that users can configure the whole Pipeline/Estimator before running fit.)
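The point about Estimators carrying the Model's Params can be illustrated with a minimal sketch in plain Python. This is not Spark code: the class names SketchCountVectorizer and SketchCountVectorizerModel are hypothetical, and the way the vocabulary is built is an assumption made only for illustration. It shows the flag being set on the Estimator before fit and copied to the fitted Model.

```python
from collections import Counter


class SketchCountVectorizerModel:
    """Hypothetical fitted model: holds the vocabulary and the binary flag."""

    def __init__(self, vocabulary, binary=False):
        self.vocabulary = vocabulary
        self.binary = binary

    def transform(self, tokens):
        counts = Counter(tokens)
        vector = [counts.get(term, 0) for term in self.vocabulary]
        # With binary set, every non-zero count is clamped to 1.
        return [min(c, 1) for c in vector] if self.binary else vector


class SketchCountVectorizer:
    """Hypothetical estimator: exposes the same binary flag as the model."""

    def __init__(self, binary=False):
        self.binary = binary

    def fit(self, documents):
        # Assumption for illustration: the vocabulary is every distinct token, sorted.
        vocabulary = sorted({token for doc in documents for token in doc})
        # The estimator hands its own setting to the model it produces.
        return SketchCountVectorizerModel(vocabulary, binary=self.binary)


docs = [["a", "b", "a"], ["b", "c"]]
model = SketchCountVectorizer(binary=True).fit(docs)
result = model.transform(["a", "a", "c"])
```

Because the flag lives on the estimator, the whole configuration is known before fit runs, which is the property the comment asks for.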


was (Author: josephkb):
I just realized that we should have added the binary toggle Param to CountVectorizer (the Estimator) as well.  (We need all Estimators to contain the Model Params so that users can configure the whole Pipeline/Estimator before running fit. I'll create a JIRA for that.)  I'll create and link a JIRA for this and HashingTF.

> Add binary toggle Param to CountVectorizer
> ------------------------------------------
>
>                 Key: SPARK-13629
>                 URL: https://issues.apache.org/jira/browse/SPARK-13629
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Joseph K. Bradley
>            Assignee: yuhao yang
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> It would be handy to add a binary toggle Param to CountVectorizer, like the binary option in scikit-learn's CountVectorizer: [http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html]
> If set, then all non-zero counts will be set to 1.
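The behavior described in the issue can be sketched in a few lines of plain Python. This is only an illustration of the "non-zero counts become 1" rule, not Spark's or scikit-learn's implementation; the function name count_vectorize and its signature are hypothetical.

```python
from collections import Counter


def count_vectorize(tokens, vocabulary, binary=False):
    """Return a term-count vector for tokens over a fixed vocabulary.

    When binary is True, every non-zero count is set to 1.
    """
    counts = Counter(tokens)
    vector = [counts.get(term, 0) for term in vocabulary]
    if binary:
        vector = [1 if c > 0 else 0 for c in vector]
    return vector


vocab = ["a", "b", "c"]
doc = ["a", "b", "a", "a"]
plain_counts = count_vectorize(doc, vocab)
binary_counts = count_vectorize(doc, vocab, binary=True)
```

Here plain_counts keeps the raw term frequencies, while binary_counts only records whether each vocabulary term is present.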


