Posted to user@spark.apache.org by Tomasz Dudek <me...@gmail.com> on 2018/01/19 11:59:08 UTC

[ML] Allow CrossValidation ParamGrid on SVMWithSGD

Hello,

Is there any way to use CrossValidator's ParamGrid with SVMWithSGD?

Usually, e.g. when using RandomForestClassifier, you can put a lot of
parameters into a grid to automate the search (when used with CrossValidator):

val algorithm = new RandomForestClassifier()
val paramGrid = { new ParamGridBuilder()
  .addGrid(algorithm.impurity, Array("gini", "entropy"))
  .addGrid(algorithm.maxDepth, Array(3, 5, 10))
  .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
  .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
  .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
  .build()
}
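
For a sense of scale, the grid above expands to the Cartesian product of all listed values. A minimal sketch, assuming spark-mllib is on the classpath and the snippet is pasted into spark-shell (building the grid itself does not need a running job):

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.tuning.ParamGridBuilder

val algorithm = new RandomForestClassifier()

// Same grid as in the message above.
val paramGrid = new ParamGridBuilder()
  .addGrid(algorithm.impurity, Array("gini", "entropy"))
  .addGrid(algorithm.maxDepth, Array(3, 5, 10))
  .addGrid(algorithm.numTrees, Array(2, 3, 5, 15, 50))
  .addGrid(algorithm.minInfoGain, Array(0.01, 0.001))
  .addGrid(algorithm.minInstancesPerNode, Array(10, 50, 500))
  .build()

// build() returns one ParamMap per combination: 2 * 3 * 5 * 2 * 3.
val numCombinations = paramGrid.length
```

That is 180 combinations, so a 3-fold CrossValidator would fit 540 models before refitting the best one on the full data.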

With SVMWithSGD, however, the parameters are inside GradientDescent. You can
explicitly tune the params, either by using SVMWithSGD's constructor or by
calling setters on the optimizer:

val algorithm = new SVMWithSGD()
algorithm.optimizer
  .setMiniBatchFraction(1.0) // a fraction of the data, must be in (0, 1]
  .setNumIterations(200)
  .setRegParam(0.01)

Both ways, however, prevent me from using ParamGridBuilder.

There are no such things as algorithm.optimizer.numIterations or
algorithm.optimizer.regParam, only setters (and ParamGridBuilder requires
Params, not setters).

I could of course create each SVM model manually, build one huge Pipeline
with each model saving its result to a different column, and then manually
decide which performed best. That requires a lot of coding, and so far
CrossValidator's ParamGrid has done that job for me instead.
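
The manual route can be shorter than one huge Pipeline if it stays in the RDD-based API. A rough sketch, not a drop-in replacement for CrossValidator: `manualGridSearch` is a hypothetical helper, and the fixed seed and the choice of mean area under ROC as the score are assumptions made for illustration.

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.rdd.RDD

// Hypothetical helper: tries every (numIterations, regParam) pair with k-fold
// cross-validation and returns the best pair together with its mean AUC.
def manualGridSearch(
    data: RDD[LabeledPoint],
    numIterationsGrid: Seq[Int],
    regParamGrid: Seq[Double],
    numFolds: Int = 3): ((Int, Double), Double) = {
  // Array of (training, validation) RDD pairs; the seed 42 is arbitrary.
  val folds = MLUtils.kFold(data, numFolds, 42)

  val scores = for {
    numIterations <- numIterationsGrid
    regParam <- regParamGrid
  } yield {
    val aucs = folds.map { case (training, validation) =>
      val algorithm = new SVMWithSGD()
      algorithm.optimizer
        .setNumIterations(numIterations)
        .setRegParam(regParam)
      // clearThreshold() makes predict() return raw scores instead of 0/1 labels.
      val model = algorithm.run(training).clearThreshold()
      val scoreAndLabels = validation.map(p => (model.predict(p.features), p.label))
      new BinaryClassificationMetrics(scoreAndLabels).areaUnderROC()
    }
    ((numIterations, regParam), aucs.sum / aucs.length)
  }

  scores.maxBy(_._2)
}
```

Calling `manualGridSearch(data, Seq(50, 100, 200), Seq(0.001, 0.01, 0.1))` would train 9 x 3 models sequentially on the driver's schedule, which is the bookkeeping CrossValidator otherwise hides.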

Am I missing something? Is it WIP or is there any hack to do that?

Yours,
Tomasz

Re: [ML] Allow CrossValidation ParamGrid on SVMWithSGD

Posted by Nick Pentreath <ni...@gmail.com>.
SVMWithSGD sits in the older "mllib" package and is not directly compatible
with the DataFrame API. I suppose one could write an ML API wrapper around
it.

However, there is LinearSVC in Spark 2.2.x:
http://spark.apache.org/docs/latest/ml-classification-regression.html#linear-support-vector-machine

I would say you should use that instead.
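
As an illustration of that suggestion, here is a minimal sketch of a grid search over LinearSVC, assuming Spark 2.2 or later. The grid values are arbitrary, and `training` is an assumed DataFrame with "label" and "features" columns that is not defined here. As far as I know LinearSVC does not use mini-batch SGD, so there is no counterpart to miniBatchFraction in the grid.

```scala
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val svm = new LinearSVC()

// maxIter and regParam are Params on LinearSVC, so ParamGridBuilder accepts them.
val paramGrid = new ParamGridBuilder()
  .addGrid(svm.maxIter, Array(50, 100, 200))
  .addGrid(svm.regParam, Array(0.001, 0.01, 0.1))
  .build()

val cv = new CrossValidator()
  .setEstimator(svm)
  .setEvaluator(new BinaryClassificationEvaluator()) // areaUnderROC by default
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// With the assumed `training` DataFrame in scope:
// val cvModel = cv.fit(training)
// val best = cvModel.bestModel
```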
