Posted to dev@spark.apache.org by Jacek Laskowski <ja...@japila.pl> on 2017/12/10 21:16:47 UTC

GenerateExec, CodegenSupport and supportCodegen flag off?!

Hi,

Why would a physical operator like GenerateExec extend
CodegenSupport [1] but have the supportCodegen flag turned off [2]?

What does such a combination mean: being a CodegenSupport with
supportCodegen off?

[1]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L58-L64

[2]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala#L125
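
To make the combination concrete, here is a minimal, self-contained sketch. It is a toy model and not Spark's code: the trait name only mirrors the one in GenerateExec.scala, the operator classes and the `participates` helper are hypothetical, and the default value of the flag is an assumption. It shows only that mixing in a trait and overriding a Boolean member are independent choices, so an operator can carry the codegen interface while still opting out.

```scala
// Toy model: names mirror Spark's, but nothing here is Spark's actual code.
object CodegenFlagSketch {
  trait SparkPlan { def name: String }

  // Assumption: the flag defaults to true and operators may override it.
  trait CodegenSupport extends SparkPlan {
    def supportCodegen: Boolean = true
  }

  // Mixes in the trait and keeps the default flag.
  case class FilterLike(name: String) extends CodegenSupport

  // Mixes in the trait but switches the flag off.
  case class GenerateLike(name: String) extends CodegenSupport {
    override def supportCodegen: Boolean = false
  }

  // Never mixes in the trait at all.
  case class ScanLike(name: String) extends SparkPlan

  // Hypothetical helper: an operator takes part only if it has the
  // trait AND its flag is on.
  def participates(plan: SparkPlan): Boolean = plan match {
    case p: CodegenSupport => p.supportCodegen
    case _                 => false
  }
}
```

With these definitions, `participates(GenerateLike("Generate"))` is false even though the value is a `CodegenSupport`, which is the situation the question is about.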

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

It appears that there is already a discussion about why the GenerateExec
operator has the flag off.

1. https://issues.apache.org/jira/browse/SPARK-21657 ("Spark has exponential
time complexity to explode(array of structs)"), which is in progress.
2. More importantly, @rxin turned the flag off with the note "Disable
generate codegen since it fails my workload." I wish he had included the
workload to showcase the issue :(

It looks like a number of knowledgeable people are already on it, so I'll
just listen...

Regards,
Jacek Laskowski

On Mon, Dec 11, 2017 at 10:15 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> After another day of trying to get my head around WholeStageCodegenExec,
> InputAdapter and the CollapseCodegenStages optimization rule, I came to the
> conclusion that it may have something to do with UnsafeRow vs
> GenericInternalRow/InternalRow: when a physical operator wants to
> _somehow_ participate in whole-stage codegen, it can extend the
> CodegenSupport trait and enable access to GenericInternalRow by turning the
> supportCodegen flag off.
>
> I realize how badly that may read, but without help from the Spark SQL
> devs that's all I can figure out myself. Any help appreciated.

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

After another day of trying to get my head around WholeStageCodegenExec,
InputAdapter and the CollapseCodegenStages optimization rule, I came to the
conclusion that it may have something to do with UnsafeRow vs
GenericInternalRow/InternalRow: when a physical operator wants to
_somehow_ participate in whole-stage codegen, it can extend the
CodegenSupport trait and enable access to GenericInternalRow by turning the
supportCodegen flag off.

I realize how badly that may read, but without help from the Spark SQL
devs that's all I can figure out myself. Any help appreciated.
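
Separately from the row-format hypothesis above, one way to picture what the flag could change is a toy model of a collapse rule. This is a sketch under stated assumptions, not the actual CollapseCodegenStages code: the `Op`, `WholeStage` and `InputAdapter` classes are hypothetical stand-ins, and the single `codegen` Boolean stands for "extends CodegenSupport with supportCodegen on". In this model, an operator with the flag off simply becomes a boundary between two fused stages.

```scala
// Toy model of a collapse rule; not Spark's implementation.
object CollapseSketch {
  sealed trait Plan { def children: Seq[Plan] }

  // `codegen` stands for "has the trait and the flag is on" (assumption).
  case class Op(name: String, codegen: Boolean, children: Seq[Plan] = Nil) extends Plan
  case class WholeStage(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }
  case class InputAdapter(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Outside a stage: open a new fused stage at each codegen-capable operator.
  def insertWholeStage(plan: Plan): Plan = plan match {
    case op: Op if op.codegen => WholeStage(insertInputAdapter(op))
    case op: Op               => op.copy(children = op.children.map(insertWholeStage))
    case other                => other
  }

  // Inside a stage: cut the stage at the first operator that opts out.
  def insertInputAdapter(plan: Plan): Plan = plan match {
    case op: Op if !op.codegen => InputAdapter(insertWholeStage(op))
    case op: Op                => op.copy(children = op.children.map(insertInputAdapter))
    case other                 => other
  }

  // Example plan: Project -> Generate (flag off) -> Filter -> Scan.
  val scan: Plan = Op("Scan", codegen = false)
  val plan: Plan =
    Op("Project", codegen = true, Seq(
      Op("Generate", codegen = false, Seq(
        Op("Filter", codegen = true, Seq(scan))))))

  val collapsed: Plan = insertWholeStage(plan)
}
```

In the example plan, the Generate-like operator ends up wrapped in an `InputAdapter`, with one fused stage above it (Project) and another below it (Filter).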

Regards,
Jacek Laskowski

On Sun, Dec 10, 2017 at 10:34 PM, Stephen Boesch <ja...@gmail.com> wrote:

> A relevant observation: a JIRA was resolved last year that removed the
> option to disable the codegen flag (and the unsafe flag as well):
> https://issues.apache.org/jira/browse/SPARK-11644

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

Posted by Stephen Boesch <ja...@gmail.com>.
A relevant observation: a JIRA was resolved last year that removed the
option to disable the codegen flag (and the unsafe flag as well):
https://issues.apache.org/jira/browse/SPARK-11644
