Posted to issues@spark.apache.org by "Vincent (JIRA)" <ji...@apache.org> on 2016/08/22 09:14:20 UTC

[jira] [Comment Edited] (SPARK-17055) add labelKFold to CrossValidator

    [ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430381#comment-15430381 ] 

Vincent edited comment on SPARK-17055 at 8/22/16 9:14 AM:
----------------------------------------------------------

Well, a better model will have better CV performance on validation data with unseen labels, so the model that is finally selected will be relatively better at predicting samples with unseen categories/labels in real-world use.


was (Author: vincexie):
Well, a better model will have better CV performance on data with unseen labels, so the model that is finally selected will be relatively better at predicting samples with unseen categories/labels in real-world use.

> add labelKFold to CrossValidator
> --------------------------------
>
>                 Key: SPARK-17055
>                 URL: https://issues.apache.org/jira/browse/SPARK-17055
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Vincent
>            Priority: Minor
>
> The current CrossValidator only supports k-fold, which randomly divides all the samples into k groups. But when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out samples with certain labels from the training data and put them into the validation fold, i.e. we want to ensure that the same label does not appear in both the testing and training sets.
> Mainstream packages like scikit-learn already support such a cross-validation method. (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)
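
For illustration only, the splitting rule described above can be sketched in plain Python. This is not the proposed Spark API and not scikit-learn's implementation; the function name label_k_fold and the greedy fold-balancing strategy are assumptions made for this sketch. It only shows the invariant being asked for: every label lands in exactly one validation fold, so no label is shared between a training set and its validation set.

```python
from collections import defaultdict


def label_k_fold(labels, k):
    """Yield (train_indices, validation_indices) pairs.

    Hypothetical helper: samples sharing a label are always kept in the
    same fold, so a label never appears on both sides of a split.
    """
    # Group sample indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    if k < 2:
        raise ValueError("k must be at least 2")
    if k > len(by_label):
        raise ValueError("k cannot exceed the number of distinct labels")

    # Greedy balancing (an assumption of this sketch): place the largest
    # label groups first, each into the fold that is currently smallest.
    folds = [[] for _ in range(k)]
    for label in sorted(by_label, key=lambda l: -len(by_label[l])):
        smallest = min(range(k), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_label[label])

    # Each fold serves once as the validation set; the rest is training data.
    for f in range(k):
        validation = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, validation


if __name__ == "__main__":
    # Labels here play the role of the "subjects" the data was gathered from.
    subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s3", "s4"]
    for train, validation in label_k_fold(subjects, k=2):
        print("train:", train, "validation:", validation)
```

With plain k-fold, two samples from the same subject can end up on opposite sides of a split; in the sketch above that cannot happen, because whole label groups are assigned to folds rather than individual samples.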


