Posted to user@spark.apache.org by Adamantios Corais <ad...@gmail.com> on 2015/08/28 11:16:40 UTC

How to determine a good set of parameters for a ML grid search task?

I have a sparse dataset of size 775946 x 845372. I would like to perform a
grid search to tune the parameters of my LogisticRegressionWithSGD
model. I have noticed that building each model takes about 300 to
400 seconds, which means that trying all possible combinations of
parameters would take about 24 hours. More importantly, I am
not sure whether the following combinations make sense at all. So, how
should I choose these parameters more wisely, and in a way that reduces
the waiting time?

  import org.apache.spark.mllib.optimization.{L1Updater, SimpleUpdater, SquaredL2Updater}

  // Candidate values for each parameter of the grid search
  val numIterations = Seq(100, 500, 1000, 5000, 10000, 50000, 100000, 500000)
  val stepSizes = Seq(10, 50, 100, 500, 1000, 5000, 10000, 50000)
  val miniBatchFractions = Seq(1.0)
  val updaters = Seq(new SimpleUpdater, new SquaredL2Updater, new L1Updater)
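
For reference, the size of this grid and the total time it implies can be estimated with a small sketch. It assumes that every model takes 300 to 400 seconds regardless of its parameters, which is probably optimistic for the larger iteration counts. The object name GridCost and the helper hours are only illustrative, and plain strings stand in for the updater instances so that the snippet runs without Spark.

```scala
// Sketch: count the grid combinations and estimate the total run time,
// assuming a flat 300 to 400 seconds per model (an assumption, not a measurement
// for each individual combination).
object GridCost {
  val numIterations = Seq(100, 500, 1000, 5000, 10000, 50000, 100000, 500000)
  val stepSizes = Seq(10, 50, 100, 500, 1000, 5000, 10000, 50000)
  val miniBatchFractions = Seq(1.0)
  // Names stand in for the MLlib updater instances used in the real grid.
  val updaters = Seq("SimpleUpdater", "SquaredL2Updater", "L1Updater")

  // Full Cartesian product of all parameter values.
  val combinations: Int =
    numIterations.size * stepSizes.size * miniBatchFractions.size * updaters.size

  // Total hours if each model takes the given number of seconds.
  def hours(secondsPerModel: Int): Double =
    combinations * secondsPerModel / 3600.0

  def main(args: Array[String]): Unit = {
    println(s"combinations = $combinations")
    println(s"estimated hours: ${hours(300)} to ${hours(400)}")
  }
}
```

This gives 8 x 8 x 1 x 3 = 192 combinations, i.e. roughly 16 to 21 hours under that assumption, which is in the same range as the figure of about 24 hours mentioned above.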


Any advice is appreciated.


*// Adamantios*