Posted to user@spark.apache.org by jglov <ja...@capsenrobotics.com> on 2016/10/20 17:11:49 UTC

Predict a single vector with the new spark.ml API to avoid groupByKey() after a flatMap()?

Is there a way to predict a single vector with the new spark.ml API? In my
case, I want to do this inside a map() so that I can avoid calling
groupByKey() after a flatMap():

*Current code (pyspark):*

# Given 'model', 'rdd', and a function 'split_element' that splits an
# element of the RDD into a list of elements (assuming each element has
# both a key and a value, so that groupByKey() can merge them later)

split_rdd = rdd.flatMap(split_element)
split_results = model.transform(split_rdd.toDF()).rdd
return split_results.groupByKey()

*Desired code:*

split_rdd = rdd.map(split_element)
split_results = split_rdd.map(
    lambda elem_list: [model.transformOne(elem) for elem in elem_list])
return split_results
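
To make the intent concrete, here is a minimal sketch in plain Python (no Spark) of why the two patterns should produce the same grouping. Everything in it is an assumption for illustration: 'split_element' is a hypothetical splitter, 'predict_one' is a hypothetical stand-in for a per-vector prediction and not an actual spark.ml call, and ordinary lists stand in for the RDD.

```python
from collections import defaultdict


def split_element(element):
    """Hypothetical splitter: (key, values) -> list of (key, value) pairs."""
    key, values = element
    return [(key, value) for value in values]


def predict_one(value):
    """Hypothetical stand-in for a single-vector prediction (not a Spark API)."""
    return value * 2


# A plain list standing in for the RDD.
data = [("a", [1, 2]), ("b", [3])]

# Pattern 1: flatten, predict on every piece, then regroup by key
# (this mirrors flatMap() + transform() + groupByKey()).
flat = [pair for element in data for pair in split_element(element)]
predicted = [(key, predict_one(value)) for key, value in flat]
grouped = defaultdict(list)
for key, prediction in predicted:
    grouped[key].append(prediction)
grouped_result = sorted(grouped.items())

# Pattern 2: keep each element's pieces together and predict in place
# (this mirrors a single map(), so no regrouping step is needed).
mapped_result = [
    (element[0], [predict_one(value) for _, value in split_element(element)])
    for element in data
]
```

In this sketch both 'grouped_result' and 'mapped_result' hold the same per-key lists of predictions; the second pattern simply never separates the pieces, which is what a per-vector predict call inside map() would allow.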



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Predict-a-single-vector-with-the-new-spark-ml-API-to-avoid-groupByKey-after-a-flatMap-tp27932.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
