Posted to issues@spark.apache.org by "Sebastian Souyris (JIRA)" <ji...@apache.org> on 2016/11/01 19:19:59 UTC

[jira] [Comment Edited] (SPARK-17055) add groupKFold to CrossValidator

    [ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626403#comment-15626403 ] 

Sebastian Souyris edited comment on SPARK-17055 at 11/1/16 7:19 PM:
--------------------------------------------------------------------

By the way, in case it is useful: H2O's Cross-Validation has a “fold_column” parameter, which is a similar idea to groupKFold.

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/cross-validation.html#


was (Author: ssouyris):
By the way, in case it is useful, the H2O Cross-Validation library has the “fold_column” parameter which is a similar idea to groupKFold. 

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/cross-validation.html#

> add groupKFold to CrossValidator
> --------------------------------
>
>                 Key: SPARK-17055
>                 URL: https://issues.apache.org/jira/browse/SPARK-17055
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Vincent
>            Priority: Minor
>
> The current CrossValidator supports only k-fold cross-validation, which randomly divides all samples into k groups. But when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out all samples sharing certain labels from the training data and place them in the validation fold, i.e. we want to ensure that the same label does not appear in both the training and test sets.
> Mainstream packages like scikit-learn already support such a cross-validation method. (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)
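To make the requested behavior concrete, here is a minimal, hypothetical sketch of group-aware k-fold splitting in plain Python (this is not Spark's or scikit-learn's API; the function name and signature are invented for illustration). The key invariant is that rows sharing a group label are always assigned to the same fold, so no group straddles the train/test boundary:

```python
# Hypothetical sketch of group-aware k-fold splitting.
# Invariant: all rows with the same group label land in one fold.
from collections import defaultdict

def group_k_fold(n_rows, groups, k):
    """Yield (train_indices, test_indices) pairs for k folds.

    Each distinct group label is assigned to exactly one fold, so no
    group appears in both the training and test sets of any split.
    """
    # Bucket row indices by group label.
    by_group = defaultdict(list)
    for i in range(n_rows):
        by_group[groups[i]].append(i)
    # Greedily assign groups (largest first) to the currently
    # smallest fold, to keep fold sizes roughly balanced.
    folds = [[] for _ in range(k)]
    for _, idxs in sorted(by_group.items(), key=lambda kv: -len(kv[1])):
        min(folds, key=len).extend(idxs)
    for test in folds:
        train = [i for f in folds if f is not test for i in f]
        yield sorted(train), sorted(test)

# Example: 6 rows from 3 subjects ("a", "b", "c"), 3 folds.
groups = ["a", "a", "b", "b", "c", "c"]
for train, test in group_k_fold(6, groups, 3):
    # No subject's rows are split across train and test.
    assert not {groups[i] for i in train} & {groups[i] for i in test}
```

In scikit-learn this corresponds to the (now deprecated) LabelKFold linked above; in H2O, the fold_column parameter achieves a similar effect by letting a column assign each row to a fold directly.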



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org