Posted to issues@spark.apache.org by "Ganesh Sivalingam (JIRA)" <ji...@apache.org> on 2017/11/10 10:23:00 UTC

[jira] [Comment Edited] (SPARK-18755) Add Randomized Grid Search to Spark ML

    [ https://issues.apache.org/jira/browse/SPARK-18755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247316#comment-16247316 ] 

Ganesh Sivalingam edited comment on SPARK-18755 at 11/10/17 10:22 AM:
----------------------------------------------------------------------

[~yuhaoyan] Whilst working at Parkopedia, I have developed a solution for this and would like to submit it to the community for review. 

It is not exactly one of the two options you gave, but it is closer to the first. My approach was to create a {{RandomParamGridBuilder}} class which generates the same param map data structure as {{ParamGridBuilder}}, but does so through random sampling of user-provided parameter distributions.

The build method takes an integer argument indicating how many iterations of random search to perform (and therefore how many parameter maps to produce). The method can be run again to generate a new set of random parameters.

This requires no changes to the existing code base other than adding this single class (excluding tests).
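Spark ML's builders are implemented in Scala, but the idea above can be illustrated with a short, Spark-free Python sketch. All names here ({{RandomParamGridBuilder}}, {{add_uniform}}, {{add_choice}}) are hypothetical stand-ins for the proposed API, not the actual implementation; param maps are modelled as plain dicts:

```python
import random


class RandomParamGridBuilder:
    """Sketch: build a list of param maps by sampling each parameter
    from a caller-supplied distribution, instead of enumerating an
    exhaustive grid as ParamGridBuilder does."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._samplers = {}  # param name -> zero-arg callable drawing one value

    def add_uniform(self, param, low, high):
        # Continuous parameter sampled uniformly from [low, high].
        self._samplers[param] = lambda: self._rng.uniform(low, high)
        return self

    def add_choice(self, param, values):
        # Discrete parameter sampled uniformly from a finite set.
        self._samplers[param] = lambda: self._rng.choice(values)
        return self

    def build(self, num_models):
        # One param map per search iteration; calling build() again
        # draws a fresh, independently sampled set of maps.
        return [{p: draw() for p, draw in self._samplers.items()}
                for _ in range(num_models)]


# The search budget (4) is chosen independently of how many
# parameters or candidate values there are.
grids = (RandomParamGridBuilder(seed=42)
         .add_uniform("regParam", 0.0, 1.0)
         .add_choice("maxDepth", [3, 5, 7])
         .build(4))
```

Each resulting dict plays the role of a {{ParamMap}} and could be fed to the existing {{CrossValidator}} / {{TrainValidationSplit}} machinery unchanged, which is what keeps the change to a single class.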


> Add Randomized Grid Search to Spark ML
> --------------------------------------
>
>                 Key: SPARK-18755
>                 URL: https://issues.apache.org/jira/browse/SPARK-18755
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: yuhao yang
>
> Randomized grid search implements a randomized search over parameters, where each setting is sampled from a distribution over possible parameter values. This has two main benefits over an exhaustive search:
> 1. A budget can be chosen independent of the number of parameters and possible values.
> 2. Adding parameters that do not influence the performance does not decrease efficiency.
> Randomized grid search usually gives results similar to those of an exhaustive search, while its run time is drastically lower.
> For more background, please refer to:
> sklearn: http://scikit-learn.org/stable/modules/grid_search.html
> http://blog.kaggle.com/2015/07/16/scikit-learn-video-8-efficiently-searching-for-optimal-tuning-parameters/
> http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
> https://www.r-bloggers.com/hyperparameter-optimization-in-h2o-grid-search-random-search-and-the-future/
> There are two ways to implement this in Spark, as I see it:
> 1. Add a searchRatio to ParamGridBuilder and conduct sampling directly during build. Only one new public function is required.
> 2. Add a trait RandomizedSearch and create new classes RandomizedCrossValidator and RandomizedTrainValidationSplit, which can be complicated since we need to deal with the models.
> I'd prefer option 1 as it's simpler and more straightforward. We can support randomized grid search with a minimal change.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org