Posted to user@spark.apache.org by Olivier Girardot <o....@lateral-thoughts.com> on 2017/06/15 13:04:26 UTC

Nested "struct" function call creates a compilation error in Spark SQL

Hi everyone,
when we nest calls to "struct" (up to 5 levels deep) to extend a complex
data structure, we end up with the following compilation error:

org.codehaus.janino.JaninoRuntimeException: Code of method
"(I[Lscala/collection/Iterator;)V" of class
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
grows beyond 64 KB

The CreateStruct code itself correctly uses the ctx.splitExpressions
method, but the "end result" of df.select(struct(struct(struct(....))))
still ends up being too large for a single generated method.
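
A minimal sketch of the pattern described above, not the original poster's code: extending one deeply nested field means every enclosing struct has to be rebuilt, so the expression grows with each level. The snippet is plain Python that only generates a Spark SQL expression string. The function name extend_struct, the dict-based schema description, and the column and field names are all hypothetical, and it assumes Spark SQL's named_struct function rather than the DataFrame struct() call mentioned in the message.

```python
def extend_struct(column, schema, path, new_name, new_expr):
    """Return a named_struct(...) expression that copies `column` and adds
    `new_name` (computed by `new_expr`) inside the struct reached via `path`.

    `schema` maps each field name to None (leaf) or to a nested dict (struct).
    """
    if path and not isinstance(schema.get(path[0]), dict):
        raise KeyError(f"{column}.{path[0]} is not a struct field")
    parts = []
    for name, sub in schema.items():
        ref = f"{column}.{name}"
        if path and name == path[0]:
            # On the path to the target: this struct must be rebuilt too.
            value = extend_struct(ref, sub, path[1:], new_name, new_expr)
        else:
            # Untouched sibling: copy it by reference.
            value = ref
        parts.append(f"'{name}', {value}")
    if not path:
        # Reached the target struct: append the new field.
        parts.append(f"'{new_name}', {new_expr}")
    return "named_struct(" + ", ".join(parts) + ")"


# Hypothetical two-level schema for a column named `root`.
schema = {"a": None, "b": {"c": None}}
expr = extend_struct("root", schema, ["b"], "d", "1")
print(expr)
# prints: named_struct('a', root.a, 'b', named_struct('c', root.b.c, 'd', 1))
```

The resulting string could be passed to something like df.selectExpr(expr + " AS root"). With a path five levels deep the expression contains five nested struct constructors, which resembles the shape described in this thread.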

Should I open a JIRA, or is there a workaround?

Regards,

-- 
*Olivier Girardot* | Associé
o.girardot@lateral-thoughts.com

Re: Nested "struct" function call creates a compilation error in Spark SQL

Posted by Michael Armbrust <mi...@databricks.com>.

You might also try with a newer version. Several instances of code
generation failures have been fixed since 2.0.

On Thu, Jun 15, 2017 at 1:15 PM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> Hi Michael,
> Spark 2.0.2, but I actually have a very interesting test case.
> The optimiser seems to be at fault in a way: I've attached to this email the
> explain output for when I limit myself to 2 levels of struct mutation and
> for when it goes to 5.
> As you can see, the optimiser seems to be doing a lot more in the latter
> case.
> After further investigation, the code is not "failing" per se: Spark tries
> whole-stage codegen, compilation of the generated code fails with the
> 64 KB error, and I think it then falls back to the "non-codegen" path.
>
> I'll try to create a simpler test case to reproduce this if I can. What do
> you think?
>
> Regards,
>
> Olivier.
>
>
> 2017-06-15 21:08 GMT+02:00 Michael Armbrust <mi...@databricks.com>:
>
>> Which version of Spark? If it's recent, I'd open a JIRA.
>>
>> On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot <
>> o.girardot@lateral-thoughts.com> wrote:
>>
>>> Hi everyone,
>>> when we nest calls to "struct" (up to 5 levels deep) to extend a complex
>>> data structure, we end up with the following compilation error:
>>>
>>> org.codehaus.janino.JaninoRuntimeException: Code of method
>>> "(I[Lscala/collection/Iterator;)V" of class
>>> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
>>> grows beyond 64 KB
>>>
>>> The CreateStruct code itself correctly uses the ctx.splitExpressions
>>> method, but the "end result" of df.select(struct(struct(struct(....))))
>>> still ends up being too large for a single generated method.
>>>
>>> Should I open a JIRA, or is there a workaround?
>>>
>>> Regards,
>>>
>>> --
>>> *Olivier Girardot* | Associé
>>> o.girardot@lateral-thoughts.com
>>>
>>
>>
>
>
> --
> *Olivier Girardot* | Associé
> o.girardot@lateral-thoughts.com
> +33 6 24 09 17 94
>

Re: Nested "struct" function call creates a compilation error in Spark SQL

Posted by Olivier Girardot <o....@lateral-thoughts.com>.

Hi Michael,
Spark 2.0.2, but I actually have a very interesting test case.
The optimiser seems to be at fault in a way: I've attached to this email the
explain output for when I limit myself to 2 levels of struct mutation and
for when it goes to 5.
As you can see, the optimiser seems to be doing a lot more in the latter case.
After further investigation, the code is not "failing" per se: Spark tries
whole-stage codegen, compilation of the generated code fails with the
64 KB error, and I think it then falls back to the "non-codegen" path.
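
One possible workaround, not confirmed anywhere in this thread, is to switch off whole-stage code generation for the affected session so that Spark does not attempt the compilation that fails. The sketch below assumes PySpark and the spark.sql.codegen.wholeStage setting. The helper name and the stand-in session classes are hypothetical and exist only so the snippet runs without a cluster.

```python
WHOLE_STAGE_KEY = "spark.sql.codegen.wholeStage"


def disable_whole_stage_codegen(spark):
    """Turn off whole-stage codegen on a SparkSession-like object.

    Assumes `spark.conf.set(key, value)` behaves as it does on a PySpark
    SparkSession; this is an illustrative helper, not part of Spark.
    """
    spark.conf.set(WHOLE_STAGE_KEY, "false")
    return spark


class _FakeConf:
    """Stand-in for the session's runtime config, for demonstration only."""

    def __init__(self):
        self.settings = {}

    def set(self, key, value):
        self.settings[key] = value


class _FakeSession:
    """Stand-in for SparkSession so the example runs without Spark."""

    def __init__(self):
        self.conf = _FakeConf()


session = disable_whole_stage_codegen(_FakeSession())
print(session.conf.settings)
```

With a real session the call would simply be disable_whole_stage_codegen(spark). The setting applies to every query in the session and gives up the speed-up of whole-stage codegen, so it is probably best limited to the job that hits the 64 KB limit.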

I'll try to create a simpler test case to reproduce this if I can. What do
you think?

Regards,

Olivier.

2017-06-15 21:08 GMT+02:00 Michael Armbrust <mi...@databricks.com>:

> Which version of Spark? If it's recent, I'd open a JIRA.
>
> On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot <
> o.girardot@lateral-thoughts.com> wrote:
>
>> Hi everyone,
>> when we nest calls to "struct" (up to 5 levels deep) to extend a complex
>> data structure, we end up with the following compilation error:
>>
>> org.codehaus.janino.JaninoRuntimeException: Code of method
>> "(I[Lscala/collection/Iterator;)V" of class
>> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
>> grows beyond 64 KB
>>
>> The CreateStruct code itself correctly uses the ctx.splitExpressions
>> method, but the "end result" of df.select(struct(struct(struct(....))))
>> still ends up being too large for a single generated method.
>>
>> Should I open a JIRA, or is there a workaround?
>>
>> Regards,
>>
>> --
>> *Olivier Girardot* | Associé
>> o.girardot@lateral-thoughts.com
>>
>
>

-- 
*Olivier Girardot* | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94

Re: Nested "struct" function call creates a compilation error in Spark SQL

Posted by Michael Armbrust <mi...@databricks.com>.

Which version of Spark? If it's recent, I'd open a JIRA.

On Thu, Jun 15, 2017 at 6:04 AM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> Hi everyone,
> when we nest calls to "struct" (up to 5 levels deep) to extend a complex
> data structure, we end up with the following compilation error:
>
> org.codehaus.janino.JaninoRuntimeException: Code of method
> "(I[Lscala/collection/Iterator;)V" of class
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator"
> grows beyond 64 KB
>
> The CreateStruct code itself correctly uses the ctx.splitExpressions
> method, but the "end result" of df.select(struct(struct(struct(....))))
> still ends up being too large for a single generated method.
>
> Should I open a JIRA, or is there a workaround?
>
> Regards,
>
> --
> *Olivier Girardot* | Associé
> o.girardot@lateral-thoughts.com
>
