Posted to user@spark.apache.org by Dirk Nachbar <di...@gmail.com> on 2015/07/28 10:04:04 UTC
pyspark/py4j tree error
I am using PySpark and I want to test the sql function. I get this Java
tree error. Any ideas?
iwaggDF.registerTempTable('iwagg')
hierDF.registerTempTable('hier')
res3 = sqlc.sql('select name, sum(amount) as amount from iwagg a left '
                'join hier b on a.segm=b.segm group by name order by sum(amount) '
                'desc').collect()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 281, in collect
    port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/java_gateway.py", line 538, in __call__
    self.target_id, self.name)
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/protocol.py", line 300, in get_return_value
    format(target_id, '.', name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o75.javaToPython.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: sort, tree:
Sort [SUM(amount#326) DESC], true
 Exchange (RangePartitioning 200)
  Aggregate false, [name#6], [name#6,CombineSum(PartialSum#328) AS amount#326]
   Exchange (HashPartitioning 200)
    Aggregate true, [name#6], [name#6,SUM(CAST(amount#3, DoubleType)) AS PartialSum#328]
     Project [name#6,amount#3]
      HashOuterJoin [segm#1], [segm#5], LeftOuter, None
       Exchange (HashPartitioning 200)
        Project [amount#3,segm#1]
         PhysicalRDD [cust#0,segm#1,wsd#2,amount#3,trips#4], MapPartitionsRDD[7] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
       Exchange (HashPartitioning 200)
        PhysicalRDD [segm#5,name#6], MapPartitionsRDD[15] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
Dirk