Posted to issues@spark.apache.org by "Sen Fang (JIRA)" <ji...@apache.org> on 2015/06/18 18:01:00 UTC

[jira] [Created] (SPARK-8443) GenerateMutableProjection Exceeds JVM Code Size Limits

Sen Fang created SPARK-8443:
-------------------------------

             Summary: GenerateMutableProjection Exceeds JVM Code Size Limits
                 Key: SPARK-8443
                 URL: https://issues.apache.org/jira/browse/SPARK-8443
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.0
            Reporter: Sen Fang


GenerateMutableProjection puts all expression columns into a single apply function. When there are many columns, the compiled apply function exceeds the 64 KB limit on method code size, which is a hard JVM limit that cannot be changed.
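
To illustrate the shape of the problem, here is a minimal sketch of a generator that concatenates one snippet per column into a single method body. The class name, the snippetFor helper, and the emitted statement are all hypothetical and are not Spark's actual generated code. Source length is only a rough proxy for bytecode size, but both grow linearly with the number of columns.

```java
// Hypothetical sketch: not Spark's real code generator.
public class SingleApplySketch {
    // Emits one illustrative statement per column (assumed shape).
    public static String snippetFor(int i) {
        return "    mutableRow.update(" + i + ", evalExpr" + i + "(input));\n";
    }

    // Concatenates every column's snippet into ONE method body,
    // so the method grows linearly with the number of columns.
    public static String generateApply(int numColumns) {
        StringBuilder sb = new StringBuilder("public Object apply(Object input) {\n");
        for (int i = 0; i < numColumns; i++) {
            sb.append(snippetFor(i));
        }
        sb.append("    return mutableRow;\n}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int n : new int[] {16, 256, 2048}) {
            System.out.println(n + " columns -> "
                + generateApply(n).length() + " source chars");
        }
    }
}
```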

This came up when we were aggregating about 100 columns using the codegen and unsafe features.

I wrote a unit test that reproduces this issue:
https://github.com/saurfang/spark/blob/codegen_size_limit/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala

This test currently fails at 2048 expressions. Master seems to be more tolerant than branch-1.4 here because the code is more concise.

Although the code on master has changed since branch-1.4, I am able to reproduce the problem on master. For now I have hacked around this in branch-1.4 by wrapping each expression in a separate function and then calling those functions sequentially in apply. The proper fix is probably to check the length of projectCode and break it up as necessary. (This actually seems easier on master, since we are building the code as strings rather than with quasiquotes.)
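
As a rough sketch of the "break it up as necessary" idea, the code below greedily groups per-expression snippets into helper methods and emits an apply that calls them in order. Everything here is an assumption for illustration: the class name, the apply_N helper names, the mutableRow identifier, and the use of source length as a stand-in for bytecode size. It is not the implementation in Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting generated code; not Spark's implementation.
public class SplitProjectionSketch {
    // Groups snippets greedily so that each group's total source length stays
    // within maxChars. A single oversized snippet still gets its own group.
    public static List<List<String>> chunk(List<String> snippets, int maxChars) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int size = 0;
        for (String s : snippets) {
            if (!current.isEmpty() && size + s.length() > maxChars) {
                groups.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(s);
            size += s.length();
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    // Emits one helper method per group, plus an apply() that calls the
    // helpers sequentially, preserving the original expression order.
    public static String generate(List<String> snippets, int maxChars) {
        List<List<String>> groups = chunk(snippets, maxChars);
        StringBuilder helpers = new StringBuilder();
        StringBuilder apply = new StringBuilder("public Object apply(Object input) {\n");
        for (int g = 0; g < groups.size(); g++) {
            helpers.append("private void apply_").append(g).append("(Object input) {\n");
            for (String s : groups.get(g)) {
                helpers.append("    ").append(s).append("\n");
            }
            helpers.append("}\n\n");
            apply.append("    apply_").append(g).append("(input);\n");
        }
        apply.append("    return mutableRow;\n}\n");
        return helpers.toString() + apply.toString();
    }
}
```

With a threshold of one snippet per group, this degenerates into the one-function-per-expression workaround described above. Note that splitting like this only works if shared state (here, the assumed mutableRow) lives in fields of the generated class rather than in locals of apply.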

Let me know if anyone has additional thoughts on this. I'm happy to contribute a pull request.


