Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/08/15 02:54:20 UTC

[jira] [Assigned] (SPARK-17055) add labelKFold to CrossValidator

     [ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-17055:
------------------------------------

    Assignee: Apache Spark

> add labelKFold to CrossValidator
> --------------------------------
>
>                 Key: SPARK-17055
>                 URL: https://issues.apache.org/jira/browse/SPARK-17055
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: Vincent
>            Assignee: Apache Spark
>            Priority: Minor
>
> The current CrossValidator supports only k-fold, which randomly divides the samples into k groups. But when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out all samples with certain labels from the training data and put them into the validation fold, i.e. we want to ensure that the same label never appears in both the training and validation sets.
> Mainstream packages such as scikit-learn already support this cross-validation method.
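
The splitting rule requested above can be illustrated with a minimal sketch in plain Python. It is not Spark's or scikit-learn's implementation; the function name label_k_fold and the greedy balancing strategy are assumptions made for illustration only. It assigns whole label groups to folds, so no label ends up on both sides of a split.

```python
from collections import defaultdict


def label_k_fold(labels, k):
    """Yield (train_indices, validation_indices) pairs, one per fold.

    Hypothetical helper: every sample sharing a label lands in the same
    fold, so a label never appears in both training and validation data.
    """
    # Group sample indices by their label (e.g. the subject they came from).
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    if k < 2 or k > len(by_label):
        raise ValueError("k must be between 2 and the number of distinct labels")

    # Greedy balancing (an assumption, not a requirement of the issue):
    # place the largest label groups first, each into the smallest fold so far.
    folds = [[] for _ in range(k)]
    for label in sorted(by_label, key=lambda l: -len(by_label[l])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_label[label])

    # Each fold serves once as the validation set; the rest form the training set.
    for f in range(k):
        validation = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, validation


if __name__ == "__main__":
    subjects = ["a", "a", "b", "c", "c", "d"]
    for train, validation in label_k_fold(subjects, 3):
        print("train:", train, "validation:", validation)
```

Unlike plain k-fold, the fold sizes here depend on how many samples each label has, so folds may be uneven when the label groups differ a lot in size.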



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
