Posted to issues@spark.apache.org by "Ben (JIRA)" <ji...@apache.org> on 2017/10/16 09:11:00 UTC

[jira] [Comment Edited] (SPARK-22284) Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

    [ https://issues.apache.org/jira/browse/SPARK-22284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205601#comment-16205601 ] 

Ben edited comment on SPARK-22284 at 10/16/17 9:10 AM:
-------------------------------------------------------

If I could upgrade to 2.2.0 then I, at least, would no longer need a backport, and I would be happy to report back whether it works in 2.2.0 if I were able to try it.
But the problem is exactly that I cannot simply upgrade, at least for now, so I wanted to ask whether there is any general recommendation or workaround I could try first, such as a configuration parameter or something else (along the lines of the sketch below).
I don't know what further details I can provide to help investigate; if there is anything, let me know.
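
To make concrete what I mean by a workaround I could try first, below is a rough sketch of the kind of thing I have in mind: materializing the intermediate result so the join runs against a freshly read DataFrame instead of the full accumulated plan. This assumes an existing SparkSession named spark and an illustrative scratch path, and I have not been able to verify that it avoids the error on 2.1.0.

{code}
# Rough sketch (unverified): break the lineage by writing the flattened
# DataFrame out and reading it back before the dropDuplicates/join step.
tmp_path = "/tmp/spark-22284-intermediate"  # illustrative path only

flattened = dataFrame.selectExpr(
    ['*',
     'nested.Value1 AS Value1',
     'nested.Value2 AS Value2',
     'nested.Value3 AS Value3'])

# Materialize to storage and re-read, so code generation for the join
# starts from a plain Parquet scan rather than the whole upstream plan.
flattened.write.mode("overwrite").parquet(tmp_path)
flattened = spark.read.parquet(tmp_path)

joined = flattened.dropDuplicates().join(dataFrameSmall, ['Value1', 'Value2', 'Value3'])
joined.count()
{code}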



> Code of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22284
>                 URL: https://issues.apache.org/jira/browse/SPARK-22284
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, PySpark, SQL
>    Affects Versions: 2.1.0
>            Reporter: Ben
>
> I am using PySpark 2.1.0 in a production environment, and I am trying to join two DataFrames, one of which is very large and has complex nested structures.
> Basically, I load both DataFrames and cache them.
> Then, in the large DataFrame, I extract 3 nested values and save them as direct columns.
> Finally, I join on these three columns with the smaller DataFrame.
> In short, the code looks like this:
> {code}
> dataFrame.read......cache()
> dataFrameSmall.read.......cache()
> dataFrame = dataFrame.selectExpr(['*','nested.Value1 AS Value1','nested.Value2 AS Value2','nested.Value3 AS Value3'])
> dataFrame = dataFrame.dropDuplicates().join(dataFrameSmall, ['Value1', 'Value2', 'Value3'])
> dataFrame.count()
> {code}
> And this is the error I get when it gets to the count():
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 11 in stage 7.0 failed 4 times, most recent failure: Lost task 11.3 in stage 7.0 (TID 11234, somehost.com, executor 10): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply_1$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB
> {code}
> I have seen many tickets with similar issues here, but no proper solution. Most of the fixes apply up to Spark 2.1.0, so I don't know whether running it on Spark 2.2.0 would fix it. In any case, I cannot change the Spark version since it is in production.
> I have also tried setting spark.sql.codegen.wholeStage=false, but I still get the same error.
> The job has worked well until now, also with large datasets, but apparently this batch got larger, and that is the only thing that changed. Is there any workaround for this (for example, a restructuring along the lines of the sketch below)?
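> For illustration, here is a rough sketch of the kind of restructuring I could imagine as a workaround (column names are the ones from the snippet above; whether this actually keeps the generated projection under the 64 KB limit is not something I have verified):
> {code}
> from pyspark.sql.functions import col
>
> # Rough sketch (unverified): instead of carrying every top-level column
> # ('*') plus the extracted nested fields through dropDuplicates() and the
> # join, project only the columns that are actually needed downstream.
> # A narrower, flatter projection means less generated code in
> # SpecificUnsafeProjection.
> slimmed = dataFrame.select(
>     col('nested.Value1').alias('Value1'),
>     col('nested.Value2').alias('Value2'),
>     col('nested.Value3').alias('Value3'),
>     # ... plus only the additional columns actually used later ...
> )
>
> result = slimmed.dropDuplicates().join(dataFrameSmall, ['Value1', 'Value2', 'Value3'])
> result.count()
> {code}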


