Posted to user@spark.apache.org by hishamm <hi...@unige.ch> on 2015/10/01 16:20:16 UTC
Decision Tree Model
Hi,
I am using Spark 1.4.0 with Python and decision trees to perform machine
learning classification.
I test the model by creating the predictions and zipping them with the test
data, as follows:

predictions = tree_model.predict(test_data.map(lambda a: a.features))
labels = test_data.map(lambda a: a.label).zip(predictions)
correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                 float(test_data.count()))
I always get this error in the zipping phase:
Can not deserialize RDD with different number of items in pair: (3, 2)
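One zip-free alternative I am considering is to key both RDDs by position with
zipWithIndex() and join them, since a join does not require the two RDDs to
have the same number of items in each partition. Below is a small
self-contained sketch of that pattern on synthetic data (the LabeledPoint
values and the app name are made up, not from my real job); I have not yet
verified it against my actual test set:

from pyspark import SparkContext
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

sc = SparkContext(appName="DecisionTreeAccuracySketch")

# Tiny synthetic binary-classification set standing in for the real test_data.
data = sc.parallelize([LabeledPoint(0.0, [0.0, 1.0]),
                       LabeledPoint(1.0, [1.0, 0.0]),
                       LabeledPoint(0.0, [0.1, 0.9]),
                       LabeledPoint(1.0, [0.9, 0.2])])
tree_model = DecisionTree.trainClassifier(data, numClasses=2,
                                          categoricalFeaturesInfo={})

# Key labels and predictions by position, then join; unlike zip(), a join does
# not care how the items are spread across partitions.
indexed_labels = data.map(lambda a: a.label).zipWithIndex() \
                     .map(lambda x: (x[1], x[0]))
indexed_preds = tree_model.predict(data.map(lambda a: a.features)) \
                          .zipWithIndex().map(lambda x: (x[1], x[0]))
correct = 100 * (indexed_labels.join(indexed_preds)
                 .filter(lambda kv: kv[1][0] == kv[1][1]).count() /
                 float(data.count()))
print(correct)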
To avoid zipping altogether, I also tried a per-record approach:

labels = test_data.map(lambda a: (a.label, tree_model.predict(a.features)))
correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                 float(test_data.count()))
However, I always get this error:
in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."

Exception: It appears that you are attempting to reference SparkContext from a
broadcast variable, action, or transforamtion. SparkContext can only be used
on the driver, not in code that it run on workers. For more information, see
SPARK-5063.
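My current understanding, which may well be wrong, is that in PySpark the
DecisionTreeModel is only a thin wrapper that forwards predict() to the JVM
through the SparkContext, so any lambda that captures the model also tries to
ship the context to the workers. Here is a tiny sketch of the two call sites,
reusing tree_model and test_data from the snippets above (the second form is
commented out because it is the one that raises the exception):

# Driver-side bulk prediction: the SparkContext stays on the driver, so this runs.
predictions = tree_model.predict(test_data.map(lambda a: a.features))

# Per-record prediction inside map(): the lambda captures tree_model, whose
# predict() needs the SparkContext, and pickling that closure for the workers
# raises the SPARK-5063 error shown above.
# predictions = test_data.map(lambda a: (a.label, tree_model.predict(a.features)))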
Is the DecisionTreeModel part of the SparkContext?
I found that, using Scala, the second approach works with no problem.
So, how can I solve these two problems?
Thanks and Regards,
Hisham