Posted to user@spark.apache.org by hishamm <hi...@unige.ch> on 2015/10/01 16:20:16 UTC

Decision Tree Model

Hi,

I am using Spark 1.4.0 with Python and MLlib decision trees to perform machine
learning classification.

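For context, the model and the test set are built roughly along the lines of the
standard MLlib example below; the file path and the tree parameters are simplified
placeholders rather than my exact setup, but the variable names match the snippets
that follow:

from pyspark import SparkContext
from pyspark.mllib.tree import DecisionTree
from pyspark.mllib.util import MLUtils

sc = SparkContext(appName="DecisionTreeClassification")

# Load labelled points and split into training and test sets.
data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")
train_data, test_data = data.randomSplit([0.7, 0.3])

# Train a classification tree (parameters are illustrative).
tree_model = DecisionTree.trainClassifier(
    train_data, numClasses=2, categoricalFeaturesInfo={},
    impurity='gini', maxDepth=5, maxBins=32)
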
I evaluate it by computing the predictions and zipping them with the test data,
as follows:


predictions = tree_model.predict(test_data.map(lambda a: a.features))
labels = test_data.map(lambda a: a.label).zip(predictions)
correct = 100 * (labels.filter(lambda (v, p): v == p).count()
                 / float(test_data.count()))

I always get this error in the zipping phase:

Can not deserialize RDD with different number of items in pair: (3, 2)

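One workaround I am considering, but have not verified, is to pair labels and
predictions through an explicit index rather than relying on zip's requirement
that both RDDs have the same partitioning. Building on the snippet above:

indexed_labels = test_data.map(lambda a: a.label).zipWithIndex() \
                          .map(lambda (l, i): (i, l))
indexed_preds = predictions.zipWithIndex().map(lambda (p, i): (i, p))
# join() matches on the index key, so the two RDDs do not need identical
# partitioning, which is what plain zip() requires.
labels = indexed_labels.join(indexed_preds).values()
correct = 100 * (labels.filter(lambda (v, p): v == p).count()
                 / float(test_data.count()))

I have also seen advice to cache test_data before predicting, so that both
passes evaluate the same data, but I am not sure whether that addresses the
real cause here.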

To avoid zipping, I tried to do it in a different way, as follows:

labels = test_data.map(lambda a: (a.label, tree_model.predict(a.features)))
correct = 100 * (labels.filter(lambda (v, p): v == p).count()
                 / float(test_data.count()))

However, I always get this error:

in __getnewargs__(self)
    250         # This method is called when attempting to pickle SparkContext, which is always an error:
    251         raise Exception(
--> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
    253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
    254             "not in code that it run on workers. For more information, see SPARK-5063."

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transforamtion. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.


Is the DecisionTreeModel part of the SparkContext?
I found that in Scala the second approach works with no problem.


So, how can I solve these two problems?

Thanks and Regards,
Hisham
