You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Dirk Nachbar <di...@gmail.com> on 2015/07/28 10:04:04 UTC

pyspark/py4j tree error

I am using pyspark and I want to test the sql function. I get this Java
tree error. Any ideas.

iwaggDF.registerTempTable('iwagg')
hierDF.registerTempTable('hier')

res3=sqlc.sql('select name, sum(amount) as amount from iwagg a left
join hier b on a.segm=b.segm group by name order by sum(amount)
desc').collect()
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/dataframe.py",
line 281, in collect
    port = self._sc._jvm.PythonRDD.collectAndServe(self._jdf.javaToPython().rdd())
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/java_gateway.py",
line 538, in __call__
    self.target_id, self.name)
  File "/home/dirk/spark-1.4.1-bin-hadoop2.6/python/py4j/protocol.py",
line 300, in get_return_value
    format(target_id, '.', name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o75.javaToPython.
: org.apache.spark.sql.catalyst.errors.package$*TreeNodeException*: sort, tree:
Sort [SUM(amount#326) DESC], true
 Exchange (RangePartitioning 200)
  Aggregate false, [name#6], [name#6,CombineSum(PartialSum#328) AS amount#326]
   Exchange (HashPartitioning 200)
    Aggregate true, [name#6], [name#6,SUM(CAST(amount#3, DoubleType))
AS PartialSum#328]
     Project [name#6,amount#3]
      HashOuterJoin [segm#1], [segm#5], LeftOuter, None
       Exchange (HashPartitioning 200)
        Project [amount#3,segm#1]
         PhysicalRDD [cust#0,segm#1,wsd#2,amount#3,trips#4],
MapPartitionsRDD[7] at applySchemaToPythonRDD at
NativeMethodAccessorImpl.java:-2
       Exchange (HashPartitioning 200)
        PhysicalRDD [segm#5,name#6], MapPartitionsRDD[15] at
applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2

Dirk