Posted to issues@spark.apache.org by "Krishna Kalyan (JIRA)" <ji...@apache.org> on 2016/06/22 13:28:58 UTC

[jira] [Commented] (SPARK-15254) Improve ML pipeline Cross Validation Scaladoc & PyDoc

    [ https://issues.apache.org/jira/browse/SPARK-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344304#comment-15344304 ] 

Krishna Kalyan commented on SPARK-15254:
----------------------------------------

Can I take up this task if no one is working on it?

From what I understand, these are the locations that need more information:
`Scaladoc`
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidatorModel
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator

`PyDoc`
http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml (sections below)
- CrossValidator
- CrossValidatorModel

Can you please confirm, @holdenk / [~mlnick], so that I can start working on the pull request?

CrossValidator
CrossValidator begins by splitting the dataset into a set of folds which are used as separate training and test datasets; e.g., with k=3 folds, CrossValidator will generate 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing.
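
To make the fold arithmetic concrete, here is a minimal plain-Python sketch of the splitting idea described above. It is not Spark's implementation and does not use the Spark API; the function name `k_fold_pairs` and the round-robin assignment after a seeded shuffle are assumptions made for illustration only.

```python
import random


def k_fold_pairs(rows, k, seed=0):
    """Return k (training, test) pairs; every row lands in exactly one test set.

    Hypothetical helper for illustration, not part of Spark.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Deal rows round-robin so that fold sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the one held out for testing.
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        pairs.append((training, test))
    return pairs


pairs = k_fold_pairs(range(9), k=3)
for training, test in pairs:
    print(len(training), len(test))  # prints "6 3" three times
```

With 9 rows and k=3, each pair trains on 6 rows (2/3) and tests on 3 rows (1/3), matching the description.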

CrossValidatorModel
An important task in ML is model selection, or using data to find the best model or parameters for a given task. This is also called tuning. Pipelines facilitate model selection by making it easy to tune an entire Pipeline at once, rather than tuning each element in the Pipeline separately.
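
As a rough sketch of what tuning a whole pipeline at once means, the plain-Python example below searches a joint grid over a feature-stage parameter and a model-stage parameter, scoring each combination by its average error across folds. Everything here is hypothetical and chosen for illustration (the toy two-stage "pipeline", the names `fit`, `mse`, and `cross_validate`, and the synthetic data); it does not use or reproduce the Spark API.

```python
from itertools import product


def fit(train, power, reg):
    # Stage 1: feature transform x -> x**power.
    # Stage 2: one-dimensional ridge regression through the origin.
    num = sum((x ** power) * y for x, y in train)
    den = sum((x ** power) ** 2 for x, _ in train) + reg
    return num / den


def mse(weight, power, test):
    # Mean squared error of the fitted two-stage model on held-out rows.
    return sum((weight * x ** power - y) ** 2 for x, y in test) / len(test)


def cross_validate(data, grid, k=3):
    # Fixed round-robin folds keep the example deterministic.
    folds = [data[i::k] for i in range(k)]
    scores = {}
    for power, reg in grid:
        errors = []
        for i in range(k):
            train = [row for j, fold in enumerate(folds) if j != i for row in fold]
            errors.append(mse(fit(train, power, reg), power, folds[i]))
        # Average the metric over the k (training, test) pairs.
        scores[(power, reg)] = sum(errors) / k
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data following y = x**2, so power=2 with no regularization fits exactly.
data = [(float(x), float(x * x)) for x in range(1, 10)]
grid = list(product([1, 2], [0.0, 1.0]))  # (power, reg) combinations
best, scores = cross_validate(data, grid)
print(best)  # (2, 0.0)
```

The point of the joint grid is that both stages are evaluated together: the feature parameter and the model parameter are chosen as a pair rather than tuned one after the other.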




> Improve ML pipeline Cross Validation Scaladoc & PyDoc
> -----------------------------------------------------
>
>                 Key: SPARK-15254
>                 URL: https://issues.apache.org/jira/browse/SPARK-15254
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation, ML
>            Reporter: holdenk
>            Priority: Minor
>
> The ML pipeline Cross Validation Scaladoc & PyDoc is very sparse - we should fill this out with a more concrete description.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
