Posted to issues@spark.apache.org by "Marco Veluscek (JIRA)" <ji...@apache.org> on 2017/09/21 13:16:01 UTC

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

    [ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174727#comment-16174727 ] 

Marco Veluscek commented on SPARK-16845:
----------------------------------------

Hello, 
I have just encountered a similar issue when calling _except_ on two large dataframes.
My code fails with an exception on Spark 2.1.0. The same code works on Spark 2.2.0, but logs several exceptions.
Since I have to stay on 2.1.0 because of company policies, I would like to know whether there is a way to fix or work around this issue in 2.1.0.

Here are more details about the problem.
On my company cluster, I am working with Spark version 2.1.0.cloudera1 using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).

The two dataframes have about 1 million rows and 467 columns.
When I call _except_ ({{dataframe1.except(dataframe2)}}), I get the following exception:
{code:title=Exception_with_2.1.0}
scheduler.TaskSetManager: Lost task 10.0 in stage 80.0 (TID 4146, cdhworker05.itec.lab, executor 4): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "eval(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB
{code}

Then the logs show the generated code for the {{SpecificPredicate}} class, which is more than 5000 lines long. (The JVM limits the bytecode of a single method to 64 KB, and the generated {{eval}} method for such a wide predicate exceeds that limit.)

I wrote a small script to reproduce the error:
{code:title=testExcept.scala}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Builds a dataframe with `rows` rows and `cols` integer columns
// (named by appending an index to the `col` prefix), every cell set to 1.
def start(rows: Int, cols: Int, col: String, spark: SparkSession) = {

     val data = (1 to rows).map(_ => Seq.fill(cols)(1))

     val sch = StructType((1 to cols).map(i => StructField(s"$col$i", IntegerType, nullable = true)))

     val rdd = spark.sparkContext.parallelize(data.map(x => Row(x: _*)))
     spark.createDataFrame(rdd, sch)
}

val dataframe1 = start(1000, 500, "column", spark)
val dataframe2 = start(1000, 500, "column", spark)

// except() compiles a single generated predicate that compares all 500
// columns, which is what pushes the generated method past the 64 KB limit.
val res = dataframe1.except(dataframe2)

res.count()
{code}
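
One mitigation I am considering, though I have not validated it, is to avoid generating one comparison per column by hashing all columns into a single value and anti-joining on that. Note that {{hash}} is Murmur3-based, so collisions are possible and this is only an approximation of _except_ (it also does not deduplicate the way the set operation does); the {{_rowHash}} column name is just illustrative. A rough sketch, reusing the dataframes from the script above:
{code:title=hashWorkaroundSketch.scala}
import org.apache.spark.sql.functions.{col, hash}

// Add a single hash column computed over all columns of each dataframe,
// so the generated join predicate only has to compare one column.
val h1 = dataframe1.withColumn("_rowHash", hash(dataframe1.columns.map(col): _*))
val h2 = dataframe2.withColumn("_rowHash", hash(dataframe2.columns.map(col): _*))

// Keep rows of dataframe1 whose hash does not appear in dataframe2.
val approxExcept = h1.join(h2.select("_rowHash"), Seq("_rowHash"), "left_anti").drop("_rowHash")
{code}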

I have also tried with a local Spark installation, version 2.2.0, using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
With this Spark version the code does not fail, but it logs several exceptions, all reporting the same error:
{code:title=Exception_with_2.2.0}
17/09/21 12:42:26 ERROR CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "eval(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB
{code}
Then the same generated code is logged.

In addition, this line is also logged several times:
{code}
17/09/21 12:46:20 WARN SortMergeJoinExec: Codegen disabled for this expression: (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((...
{code}

Since I have to work with Spark 2.1.0, is there a way to work around this problem? Maybe by disabling code generation?
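
For example, I was thinking of something along these lines; I am not sure whether these settings actually affect the predicate compilation that fails here, so this is just a sketch:
{code:title=codegenSettingsSketch.scala}
// Disable whole-stage code generation, so Spark uses the non-fused
// execution path for the affected operators.
spark.conf.set("spark.sql.codegen.wholeStage", "false")

// spark.sql.codegen.fallback (true by default) lets Spark fall back to
// interpreted evaluation when generated code fails to compile.
spark.conf.set("spark.sql.codegen.fallback", "true")

val res = dataframe1.except(dataframe2)
res.count()
{code}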

Thank you for your help.


> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16845
>                 URL: https://issues.apache.org/jira/browse/SPARK-16845
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: hejie
>            Assignee: Liwei Lin
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
>         Attachments: error.txt.zip
>
>
> I have a wide table (400 columns); when I try fitting the training data on all columns, the fatal error occurs.
> 	... 46 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
> 	at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
> 	at org.codehaus.janino.CodeContext.write(CodeContext.java:854)


