Posted to user@spark.apache.org by debasishg <gh...@gmail.com> on 2016/11/22 18:27:04 UTC
parallelizing model training ..
Hello -
I have a question about parallelizing model training in Spark ..
Suppose I have this code fragment for training a model with KMeans ..
labeledData.foreachRDD { rdd =>
  val normalizedData: RDD[Vector] = normalize(rdd)
  val trainedModel: KMeansModel = trainModel(normalizedData, noOfClusters)
  // .. compute WCSSE
}
Here labeledData is a DStream that I fetched from Kafka.
Is there any way I can use the above fragment to train multiple models in
parallel, with different values of noOfClusters ? e.g.
(1 to 100).foreach { i =>
  labeledData.foreachRDD { rdd =>
    val normalizedData: RDD[Vector] = normalize(rdd)
    val trainedModel: KMeansModel = trainModel(normalizedData, i)
    // .. compute WCSSE
  }
}
so that the training uses all available CPUs in parallel ..
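Something like the following is what I have in mind -- an untested sketch,
using a single foreachRDD instead of registering 100 separate DStream
outputs. It caches the normalized RDD once and submits the trainings as
concurrent Spark jobs via Scala Futures, so the scheduler can overlap them
on free executors. KMeans.train and computeCost are the spark.mllib API;
normalize is my own helper as above:

  import scala.concurrent.{Await, Future}
  import scala.concurrent.duration._
  import scala.concurrent.ExecutionContext.Implicits.global
  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vector
  import org.apache.spark.rdd.RDD

  labeledData.foreachRDD { rdd =>
    val normalizedData: RDD[Vector] = normalize(rdd)
    normalizedData.cache()  // reuse across all trainings instead of recomputing

    // One Spark job per k; the Futures make the driver submit them
    // concurrently rather than waiting for each training to finish.
    val futures = (1 to 100).map { k =>
      Future {
        val model = KMeans.train(normalizedData, k, 20)  // 20 = maxIterations
        (k, model.computeCost(normalizedData))           // WCSSE for this k
      }
    }

    // costs: Seq[(k, WCSSE)] -- pick the elbow from here
    val costs = Await.result(Future.sequence(futures), 10.minutes)
    normalizedData.unpersist()
  }

Would this actually run the jobs in parallel on the cluster, or do I also
need the FAIR scheduler instead of the default FIFO one?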
regards.
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/parallelizing-model-training-tp28118.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org