You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Lian Jiang <ji...@gmail.com> on 2019/10/07 16:55:48 UTC
pandas_udf throws "Unsupported class file major version 56"
Hi,
from pyspark.sql.functions import pandas_udf, PandasUDFType
import pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
[(1, True, 1.0, 'aa'), (1, False, 2.0, 'aa'), (2, True, 3.0, 'aa'), (2,
True, 5.0, 'aa'), (2, True, 10.0, 'aa')],
("id", "is_deleted", "v", "a"))
@pandas_udf("id long, is_deleted boolean, v double, a string",
PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
v = pdf.v
return pdf.assign(v=v - v.mean())
df.groupby("id").apply(subtract_mean).show() ----> works fine.
@pandas_udf('double', PandasUDFType.SCALAR)
def pandas_plus_one(v):
return v + 1
df.withColumn('v2', pandas_plus_one(df.v)).show() --->* throw error*
Error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File
"/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/dataframe.py",
line 380, in show
print(self._jdf.showString(n, 20, vertical))
File
"/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/py4j/java_gateway.py",
line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File
"/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/utils.py",
line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: u'Unsupported class file major version 56'
Any idea? Appreciate any help!
https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version-55
does not help since I have already been using java 1.8.
I am using:
pyspark 2.4.4
pandas 0.23.4
numpy 1.14.5
pyarrow 0.10.0
Note that I got numpy, pyarrow, pandas versions from
https://stackoverflow.com/questions/51713705/python-pandas-udf-spark-error
which help to make UDAF subtract_mean work.
Re: pandas_udf throws "Unsupported class file major version 56"
Posted by Lian Jiang <ji...@gmail.com>.
I figured out. Thanks.
On Mon, Oct 7, 2019 at 9:55 AM Lian Jiang <ji...@gmail.com> wrote:
> Hi,
>
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> import pyspark
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
>
> df = spark.createDataFrame(
> [(1, True, 1.0, 'aa'), (1, False, 2.0, 'aa'), (2, True, 3.0, 'aa'),
> (2, True, 5.0, 'aa'), (2, True, 10.0, 'aa')],
> ("id", "is_deleted", "v", "a"))
>
>
> @pandas_udf("id long, is_deleted boolean, v double, a string",
> PandasUDFType.GROUPED_MAP)
> def subtract_mean(pdf):
> v = pdf.v
> return pdf.assign(v=v - v.mean())
>
> df.groupby("id").apply(subtract_mean).show() ----> works fine.
>
>
>
> @pandas_udf('double', PandasUDFType.SCALAR)
> def pandas_plus_one(v):
> return v + 1
>
> df.withColumn('v2', pandas_plus_one(df.v)).show() --->* throw error*
>
>
> Error:
> Traceback (most recent call last):
> File "<input>", line 1, in <module>
> File
> "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/dataframe.py",
> line 380, in show
> print(self._jdf.showString(n, 20, vertical))
> File
> "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/py4j/java_gateway.py",
> line 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
> File
> "/Users/lianj/repo/historical-data-storage-clean/lib/python2.7/site-packages/pyspark/sql/utils.py",
> line 79, in deco
> raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
> IllegalArgumentException: u'Unsupported class file major version 56'
>
>
> Any idea? Appreciate any help!
>
>
>
>
> https://stackoverflow.com/questions/53583199/pyspark-error-unsupported-class-file-major-version-55
> does not help since I have already been using java 1.8.
>
> I am using:
> pyspark 2.4.4
> pandas 0.23.4
> numpy 1.14.5
> pyarrow 0.10.0
> Note that I got numpy, pyarrow, pandas versions from
> https://stackoverflow.com/questions/51713705/python-pandas-udf-spark-error
> which help to make UDAF subtract_mean work.
>
>
>
>
>
>
>
>
>