Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2017/11/05 22:51:00 UTC
[jira] [Assigned] (SPARK-18755) Add Randomized Grid Search to Spark ML
[ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-18755:
------------------------------------
Assignee: Apache Spark
> Add Randomized Grid Search to Spark ML
> --------------------------------------
>
> Key: SPARK-18755
> URL: https://issues.apache.org/jira/browse/SPARK-18755
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Reporter: yuhao yang
> Assignee: Apache Spark
>
> Randomized Grid Search implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and possible values.
> 2. Adding parameters that do not influence the performance does not decrease efficiency.
> Randomized grid search usually gives results similar to those of an exhaustive search, while its run time is drastically lower.
> For more background, please refer to:
> sklearn: http://scikit-learn.org/stable/modules/grid_search.html
> http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
> http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
> https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/
> As I see it, there are two ways to implement this in Spark:
> 1. Add searchRatio to ParamGridBuilder and conduct sampling directly during build. Only 1 new public function is required.
> 2. Add a trait RandomizedSearch and create new classes RandomizedCrossValidator and RandomizedTrainValidationSplit, which can be complicated since we need to deal with the models.
> I'd prefer option 1 as it's much simpler and more straightforward. We can support randomized grid search with minimal changes.
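
Option 1 above can be illustrated with a minimal sketch in plain Python. It is not Spark code and not the proposed patch: the function name build_param_grid and the search_ratio argument are hypothetical stand-ins for "add searchRatio to ParamGridBuilder and sample during build", and the parameter names in the example grid are only illustrative. The sketch builds the full Cartesian grid, then keeps a uniform random sample whose size is the given fraction of that grid.

```python
import itertools
import random


def build_param_grid(param_values, search_ratio=1.0, seed=None):
    """Build a parameter grid and keep only a random fraction of it.

    param_values: dict mapping a parameter name to its candidate values.
    search_ratio: hypothetical knob in (0, 1]; fraction of the full grid to keep.
    seed: optional seed so the sampled subset is reproducible.
    """
    if not 0.0 < search_ratio <= 1.0:
        raise ValueError("search_ratio must be in (0, 1]")

    # Full Cartesian product of all candidate values (the exhaustive grid).
    names = sorted(param_values)
    full_grid = [
        dict(zip(names, combo))
        for combo in itertools.product(*(param_values[n] for n in names))
    ]

    # Always keep at least one setting, even for a very small ratio.
    budget = max(1, round(search_ratio * len(full_grid)))

    # Sample without replacement so no setting is evaluated twice.
    rng = random.Random(seed)
    return rng.sample(full_grid, budget)


# Illustrative grid: 4 * 3 * 2 = 24 settings in the exhaustive case.
grid = {
    "regParam": [0.001, 0.01, 0.1, 1.0],
    "elasticNetParam": [0.0, 0.5, 1.0],
    "maxIter": [10, 50],
}

# With search_ratio=0.25 only 6 of the 24 settings are kept.
sampled = build_param_grid(grid, search_ratio=0.25, seed=42)
for setting in sampled:
    print(setting)
```

Because the sampling happens while the grid is built, the result is still an ordinary list of parameter settings, so under this assumption the downstream cross-validation code would not need to change. Note that this samples from a fixed grid; it does not draw values from continuous distributions as the scikit-learn approach referenced above can.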
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)