Posted to user@spark.apache.org by rprabhu <rp...@ufl.edu> on 2014/11/12 16:20:36 UTC

Getting py4j.protocol.Py4JError: An error occurred while calling o39.predict. while doing batch prediction using decision trees

Hello,
I'm trying to run a classification task using mllib decision trees. After
successfully training the model, I was trying to test the model using some
sample rows when I hit this exception.

The code snippet that caused this error is:
model = DecisionTree.trainClassifier(parsedData, numClasses=2,
                                     categoricalFeaturesInfo={0: 3},
                                     impurity='gini', maxDepth=30,
                                     maxBins=100)

predictions = model.predict(parsedData.map(lambda x: x.features))

which is pretty much like the example given on the website. 

I'm giving all the details that I think will help here (some of them might
not be totally useful). Please let me know if you need additional details.

Programming Language: Python
Platform: Linux (Ubuntu 14.04)
Dataset: A part of the KDD 1999 dataset with 19 attributes and 450K rows.
MLlib version: the latest master (using master because of the issue
reported here:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-MLLIB-Decision-Tree-ArrayIndexOutOfBounds-Exception-td16907.html)

 

Stack Trace
-------------
Traceback (most recent call last):
  File "/home/rprabhu/Coding/github/SDNDDoS/classification/DecisionTree.py", line 49, in <module>
    predictions = model.predict(parsedData.map(lambda x: x.features))
  File "/home/rprabhu/Software/spark/python/pyspark/mllib/tree.py", line 42, in predict
    return self.call("predict", x.map(_convert_to_vector))
  File "/home/rprabhu/Software/spark/python/pyspark/mllib/common.py", line 140, in call
    return callJavaFunc(self._sc, getattr(self._java_model, name), *a)
  File "/home/rprabhu/Software/spark/python/pyspark/mllib/common.py", line 117, in callJavaFunc
    return _java2py(sc, func(*args))
  File "/home/rprabhu/Software/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/rprabhu/Software/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o39.predict. Trace:
py4j.Py4JException: Method predict([class org.apache.spark.api.java.JavaRDD]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
	at py4j.Gateway.invoke(Gateway.java:252)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)

Note: I am not hitting this issue when I try to predict with just one row.
predictions = model.predict(row)

Can anyone let me know what is going wrong here?

Thanks,
Rahul



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Getting-py4j-protocol-Py4JError-An-error-occurred-while-calling-o39-predict-while-doing-batch-predics-tp18730.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Getting py4j.protocol.Py4JError: An error occurred while calling o39.predict. while doing batch prediction using decision trees

Posted by rprabhu <rp...@ufl.edu>.
Hey

Thanks for responding so fast.
I ran the code with the fix and it works great.

Regards,
Rahul





Re: Getting py4j.protocol.Py4JError: An error occurred while calling o39.predict. while doing batch prediction using decision trees

Posted by Davies Liu <da...@databricks.com>.
This is a bug; it will be fixed by https://github.com/apache/spark/pull/3230
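
A possible interim workaround, going by the original poster's note that predicting a single row does not hit the error, is to bring the feature vectors to the driver and predict them one at a time. The sketch below is only an illustration under that assumption: `predict_rows` is a hypothetical helper (not part of pyspark), and `StubModel` is a stand-in so the snippet runs without Spark. Collecting to the driver is only reasonable when the rows fit in driver memory.

```python
def predict_rows(model, rows):
    """Call model.predict once per row on the driver and return a list.

    `rows` is a plain Python list of feature vectors, not an RDD.
    """
    return [model.predict(row) for row in rows]


class StubModel(object):
    """Hypothetical stand-in for a trained model, for demonstration only."""

    def predict(self, row):
        # Arbitrary rule so the example produces visible output.
        return 1.0 if sum(row) > 1.0 else 0.0


if __name__ == "__main__":
    sample_rows = [[0.0, 0.5], [1.0, 2.0]]
    print(predict_rows(StubModel(), sample_rows))  # prints [0.0, 1.0]

    # With the real model from the original post (assumes the features
    # fit in driver memory):
    #   rows = parsedData.map(lambda x: x.features).collect()
    #   predictions = predict_rows(model, rows)
```

This gives up distributed prediction, so it is a stopgap for testing on sample rows rather than a replacement for the RDD-based call once the fix is in.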