Posted to reviews@spark.apache.org by gnsiva <gi...@git.apache.org> on 2017/11/10 10:17:57 UTC

[GitHub] spark pull request #19716: [SPARK-18755][WIP][ML] Random search implementati...

GitHub user gnsiva opened a pull request:

    https://github.com/apache/spark/pull/19716

    [SPARK-18755][WIP][ML] Random search implementation using RandomParamGridBuilder

    ## What changes were proposed in this pull request?
    
    Python's `sklearn` provides random search as a complement to grid search. Random search often tunes hyperparameters more efficiently than an exhaustive grid.
    
    I have developed `RandomParamGridBuilder`, an alternative to `ParamGridBuilder` that enables random search in Spark. It uses random sampling to build the same data structure that `ParamGridBuilder.build()` returns, so it is compatible with the existing `CrossValidator` class.
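
For context, the shared data structure is `Array[ParamMap]`. The following is a minimal sketch that uses only the existing Spark ML API, not the class proposed in this pull request. It shows an exhaustive grid next to a hand-rolled random equivalent; the object name `ParamMapShapes` and its two helper methods are hypothetical names chosen for illustration.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.ParamGridBuilder

import scala.util.Random

// Hypothetical helper object, for illustration only.
object ParamMapShapes {
  val lr = new LogisticRegression()

  // Exhaustive grid: 3 regParam values x 2 fitIntercept values = 6 ParamMaps.
  def exhaustiveGrid(): Array[ParamMap] =
    new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.001, 0.0001))
      .addGrid(lr.fitIntercept) // boolean param: adds both true and false
      .build()

  // Random alternative: n ParamMaps, each sampled independently.
  def randomMaps(n: Int, seed: Long): Array[ParamMap] = {
    val rng = new Random(seed)
    val regChoices = Array(0.01, 0.001, 0.0001)
    Array.fill(n) {
      ParamMap(
        lr.regParam -> regChoices(rng.nextInt(regChoices.length)),
        lr.fitIntercept -> rng.nextBoolean()
      )
    }
  }
}
```

Either array can be passed to `CrossValidator.setEstimatorParamMaps`, which is why a builder that returns `Array[ParamMap]` needs no changes to `CrossValidator`.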
    
    The main ways random sampling is used in the `sklearn` implementation, and how each is handled here, are as follows:
    
    - Sampling from a list of options, e.g. `[0.01, 0.001, 0.0001]`
      - This is handled using `addUniformChoice`
    - Sampling an integer/long/float/double between bounds
      - `RandomParamGridBuilder` has the `addUniformDistribution` method for this
    - Boolean sampling
      - `RandomParamGridBuilder.addUniformDistribution` supports this as well
    - Sampling from a more exotic distribution (e.g. beta or Cauchy)
      - Here the user can implement their own function that returns a value from the intended distribution each time it is called, and add it using `addDistribution`, allowing full flexibility.
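
The sampling modes listed above can be sketched in plain Scala with no Spark dependency. This is an illustrative stand-in and not the code in this pull request: the class `RandomGridSketch`, its string-keyed parameters, the method signatures, and the separate `addUniformBoolean` method are all assumptions made for the example. Only the method names `addUniformChoice`, `addUniformDistribution` and `addDistribution` come from the description.

```scala
import scala.util.Random

// Hypothetical stand-in, NOT the PR's RandomParamGridBuilder: parameters are
// keyed by plain strings and each one maps to a zero-argument sampler.
final class RandomGridSketch(seed: Long) {
  private val rng = new Random(seed)
  private var samplers = Vector.empty[(String, () => Any)]

  private def register(name: String, sampler: () => Any): this.type = {
    samplers = samplers :+ (name -> sampler)
    this
  }

  // Pick uniformly from a fixed list of options.
  def addUniformChoice[T](name: String, options: Seq[T]): this.type = {
    require(options.nonEmpty, "options must not be empty")
    register(name, () => options(rng.nextInt(options.length)))
  }

  // Draw a double uniformly from [lower, upper).
  def addUniformDistribution(name: String, lower: Double, upper: Double): this.type = {
    require(lower < upper, "lower must be less than upper")
    register(name, () => lower + rng.nextDouble() * (upper - lower))
  }

  // Draw true or false with equal probability (assumed to be a separate method here).
  def addUniformBoolean(name: String): this.type =
    register(name, () => rng.nextBoolean())

  // Delegate to any caller-supplied sampler, e.g. a beta or Cauchy draw.
  def addDistribution[T](name: String, sample: () => T): this.type =
    register(name, sample)

  // Build n independently sampled parameter maps.
  def build(n: Int): Seq[Map[String, Any]] =
    Seq.fill(n)(samplers.map { case (name, sample) => name -> sample() }.toMap)
}

object RandomGridSketchDemo {
  def sample(): Seq[Map[String, Any]] = {
    val cauchyRng = new Random(7L)
    // Absolute value of a standard Cauchy draw, via the inverse CDF tan(pi * (u - 0.5)).
    val halfCauchy = () => math.abs(math.tan(math.Pi * (cauchyRng.nextDouble() - 0.5)))
    new RandomGridSketch(seed = 42L)
      .addUniformChoice("regParam", Seq(0.01, 0.001, 0.0001))
      .addUniformDistribution("elasticNetParam", 0.0, 1.0)
      .addUniformBoolean("fitIntercept")
      .addDistribution("tol", halfCauchy)
      .build(10)
  }
}
```

Each call to `build(n)` returns `n` maps rather than the full Cartesian product, so the number of candidate models is set directly by the caller instead of growing with every added parameter.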
    
    ## How was this patch tested?
    
    Several unit tests have been created for `RandomParamGridBuilder` in `RandomParamGridBuilderSuite`.
    In `CrossValidatorSuite`, two tests were changed to run with param maps created by `RandomParamGridBuilder` as well as those from `ParamGridBuilder`. One additional test was also added there (an altered version of a `ParamGridBuilder` test).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gnsiva/spark RandomParamGridBuilder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19716.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19716
    
----
commit 12d194674218f5a6a076c2248faa81317e62ec82
Author: Ganesh N. Sivalingam <ga...@parkopedia.com>
Date:   2017-11-09T15:37:20Z

    Added RandomParamGridBuilder implementation

commit df47063052ae58dc4ca01398726ebbae7b727786
Author: Ganesh N. Sivalingam <ga...@parkopedia.com>
Date:   2017-11-09T15:37:50Z

    Added RandomParamGridBuilder unittests (which don't interact with CV)

commit ba58e83a0ad81ca1c98708e34daaae62b75dcc9f
Author: Ganesh N. Sivalingam <ga...@parkopedia.com>
Date:   2017-11-09T16:47:34Z

    Added 3 tests in the CrossValidatorSuite that use random search

commit f6fd40adcf4240c0d6675809ad4eddc72c23b85a
Author: Ganesh N. Sivalingam <ga...@parkopedia.com>
Date:   2017-11-09T16:55:39Z

    Simplified logistic regression test

commit cbbd0487525ddbbd7cdaaaa1e9e2fceae10f24a1
Author: Ganesh N. Sivalingam <ga...@parkopedia.com>
Date:   2017-11-09T17:02:29Z

    Style guide changes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19716: [SPARK-18755][WIP][ML] Random search implementation usin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19716
  
    Can one of the admins verify this patch?


---


