You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2017/04/04 20:10:55 UTC

how do i force unit test to do whole stage codegen

i wrote my own expression with eval and doGenCode, but doGenCode never gets
called in tests.

also as a test i ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
this is supposed to show:

== Physical Plan ==WholeStageCodegen
:  +- Project [id#0L AS asId#3L]
:     +- Filter (id#0L = 4)
:        +- Range 0, 1, 8, 10, [id#0L]

but it doesn't. instead it shows:

== Physical Plan ==
*Project [id#12L AS asId#15L]
+- *Filter (id#12L = 4)
   +- *Range (0, 10, step=1, splits=Some(4))

so i am again missing the WholeStageCodegen. any idea why?

i create spark session for unit tests simply as:
val session = SparkSession.builder
  .master("local[*]")
  .appName("test")
  .config("spark.sql.shuffle.partitions", 4)
  .getOrCreate()

Re: how do i force unit test to do whole stage codegen

Posted by Koert Kuipers <ko...@tresata.com>.

got it. thats good to know. thanks!

On Wed, Apr 5, 2017 at 12:07 AM, Kazuaki Ishizaki <IS...@jp.ibm.com>
wrote:

> Hi,
> The page in the URL explains the old style of physical plan output.
> The current style adds "*" as a prefix of each operation that the
> whole-stage codegen can be apply to.
>
> So, in your test case, whole-stage codegen has been already enabled!!
>
> FYI. I think that it is a good topic for dev@spark.apache.org.
>
> Kazuaki Ishizaki
>
>
>
> From:        Koert Kuipers <ko...@tresata.com>
> To:        "user@spark.apache.org" <us...@spark.apache.org>
> Date:        2017/04/05 05:12
> Subject:        how do i force unit test to do whole stage codegen
> ------------------------------
>
>
>
> i wrote my own expression with eval and doGenCode, but doGenCode never
> gets called in tests.
>
> also as a test i ran this in a unit test:
> spark.range(10).select('id as 'asId).where('id === 4).explain
> according to
>
> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
> this is supposed to show:
> == Physical Plan ==
> WholeStageCodegen
> :  +- Project [id#0L AS asId#3L]
> :     +- Filter (id#0L = 4)
> :        +- Range 0, 1, 8, 10, [id#0L]
>
> but it doesn't. instead it shows:
>
> == Physical Plan ==
> *Project [id#12L AS asId#15L]
> +- *Filter (id#12L = 4)
>   +- *Range (0, 10, step=1, splits=Some(4))
>
> so i am again missing the WholeStageCodegen. any idea why?
>
> i create spark session for unit tests simply as:
> val session = SparkSession.builder
>  .master("local[*]")
>  .appName("test")
>  .config("spark.sql.shuffle.partitions", 4)
>  .getOrCreate()
>
>
>

Re: how do i force unit test to do whole stage codegen

Posted by Jacek Laskowski <ja...@japila.pl>.

Thanks Koert for the kind words. That part however is easy to fix and
was surprised to have seen the old style referenced (!)

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Wed, Apr 5, 2017 at 6:14 PM, Koert Kuipers <ko...@tresata.com> wrote:
> its pretty much impossible to be fully up to date with spark given how fast
> it moves!
>
> the book is a very helpful reference
>
> On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>> Hi,
>>
>> I'm very sorry for not being up to date with the current style (and
>> "promoting" the old style) and am going to review that part soon. I'm very
>> close to touch it again since I'm with Optimizer these days.
>>
>> Jacek
>>
>> On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <IS...@jp.ibm.com> wrote:
>>>
>>> Hi,
>>> The page in the URL explains the old style of physical plan output.
>>> The current style adds "*" as a prefix of each operation that the
>>> whole-stage codegen can be apply to.
>>>
>>> So, in your test case, whole-stage codegen has been already enabled!!
>>>
>>> FYI. I think that it is a good topic for dev@spark.apache.org.
>>>
>>> Kazuaki Ishizaki
>>>
>>>
>>>
>>> From:        Koert Kuipers <ko...@tresata.com>
>>> To:        "user@spark.apache.org" <us...@spark.apache.org>
>>> Date:        2017/04/05 05:12
>>> Subject:        how do i force unit test to do whole stage codegen
>>> ________________________________
>>>
>>>
>>>
>>> i wrote my own expression with eval and doGenCode, but doGenCode never
>>> gets called in tests.
>>>
>>> also as a test i ran this in a unit test:
>>> spark.range(10).select('id as 'asId).where('id === 4).explain
>>> according to
>>>
>>> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
>>> this is supposed to show:
>>> == Physical Plan ==
>>> WholeStageCodegen
>>> :  +- Project [id#0L AS asId#3L]
>>> :     +- Filter (id#0L = 4)
>>> :        +- Range 0, 1, 8, 10, [id#0L]
>>>
>>> but it doesn't. instead it shows:
>>>
>>> == Physical Plan ==
>>> *Project [id#12L AS asId#15L]
>>> +- *Filter (id#12L = 4)
>>>   +- *Range (0, 10, step=1, splits=Some(4))
>>>
>>> so i am again missing the WholeStageCodegen. any idea why?
>>>
>>> i create spark session for unit tests simply as:
>>> val session = SparkSession.builder
>>>  .master("local[*]")
>>>  .appName("test")
>>>  .config("spark.sql.shuffle.partitions", 4)
>>>  .getOrCreate()
>>>
>>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: how do i force unit test to do whole stage codegen

Posted by Koert Kuipers <ko...@tresata.com>.

its pretty much impossible to be fully up to date with spark given how fast
it moves!

the book is a very helpful reference

On Wed, Apr 5, 2017 at 11:15 AM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I'm very sorry for not being up to date with the current style (and
> "promoting" the old style) and am going to review that part soon. I'm very
> close to touch it again since I'm with Optimizer these days.
>
> Jacek
>
> On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <IS...@jp.ibm.com> wrote:
>
>> Hi,
>> The page in the URL explains the old style of physical plan output.
>> The current style adds "*" as a prefix of each operation that the
>> whole-stage codegen can be apply to.
>>
>> So, in your test case, whole-stage codegen has been already enabled!!
>>
>> FYI. I think that it is a good topic for dev@spark.apache.org.
>>
>> Kazuaki Ishizaki
>>
>>
>>
>> From:        Koert Kuipers <ko...@tresata.com>
>> To:        "user@spark.apache.org" <us...@spark.apache.org>
>> Date:        2017/04/05 05:12
>> Subject:        how do i force unit test to do whole stage codegen
>> ------------------------------
>>
>>
>>
>> i wrote my own expression with eval and doGenCode, but doGenCode never
>> gets called in tests.
>>
>> also as a test i ran this in a unit test:
>> spark.range(10).select('id as 'asId).where('id === 4).explain
>> according to
>>
>> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
>> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
>> this is supposed to show:
>> == Physical Plan ==
>> WholeStageCodegen
>> :  +- Project [id#0L AS asId#3L]
>> :     +- Filter (id#0L = 4)
>> :        +- Range 0, 1, 8, 10, [id#0L]
>>
>> but it doesn't. instead it shows:
>>
>> == Physical Plan ==
>> *Project [id#12L AS asId#15L]
>> +- *Filter (id#12L = 4)
>>   +- *Range (0, 10, step=1, splits=Some(4))
>>
>> so i am again missing the WholeStageCodegen. any idea why?
>>
>> i create spark session for unit tests simply as:
>> val session = SparkSession.builder
>>  .master("local[*]")
>>  .appName("test")
>>  .config("spark.sql.shuffle.partitions", 4)
>>  .getOrCreate()
>>
>>
>>

Re: how do i force unit test to do whole stage codegen

Posted by Jacek Laskowski <ja...@japila.pl>.

Hi,

I'm very sorry for not being up to date with the current style (and
"promoting" the old style) and am going to review that part soon. I'm very
close to touch it again since I'm with Optimizer these days.

Jacek

On 5 Apr 2017 6:08 a.m., "Kazuaki Ishizaki" <IS...@jp.ibm.com> wrote:

> Hi,
> The page in the URL explains the old style of physical plan output.
> The current style adds "*" as a prefix of each operation that the
> whole-stage codegen can be apply to.
>
> So, in your test case, whole-stage codegen has been already enabled!!
>
> FYI. I think that it is a good topic for dev@spark.apache.org.
>
> Kazuaki Ishizaki
>
>
>
> From:        Koert Kuipers <ko...@tresata.com>
> To:        "user@spark.apache.org" <us...@spark.apache.org>
> Date:        2017/04/05 05:12
> Subject:        how do i force unit test to do whole stage codegen
> ------------------------------
>
>
>
> i wrote my own expression with eval and doGenCode, but doGenCode never
> gets called in tests.
>
> also as a test i ran this in a unit test:
> spark.range(10).select('id as 'asId).where('id === 4).explain
> according to
>
> *https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html*
> <https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html>
> this is supposed to show:
> == Physical Plan ==
> WholeStageCodegen
> :  +- Project [id#0L AS asId#3L]
> :     +- Filter (id#0L = 4)
> :        +- Range 0, 1, 8, 10, [id#0L]
>
> but it doesn't. instead it shows:
>
> == Physical Plan ==
> *Project [id#12L AS asId#15L]
> +- *Filter (id#12L = 4)
>   +- *Range (0, 10, step=1, splits=Some(4))
>
> so i am again missing the WholeStageCodegen. any idea why?
>
> i create spark session for unit tests simply as:
> val session = SparkSession.builder
>  .master("local[*]")
>  .appName("test")
>  .config("spark.sql.shuffle.partitions", 4)
>  .getOrCreate()
>
>
>

Re: how do i force unit test to do whole stage codegen

Posted by Kazuaki Ishizaki <IS...@jp.ibm.com>.

Hi,
The page in the URL explains the old style of physical plan output.
The current style adds "*" as a prefix of each operation that the 
whole-stage codegen can be apply to.

So, in your test case, whole-stage codegen has been already enabled!!

FYI. I think that it is a good topic for dev@spark.apache.org.

Kazuaki Ishizaki



From:   Koert Kuipers <ko...@tresata.com>
To:     "user@spark.apache.org" <us...@spark.apache.org>
Date:   2017/04/05 05:12
Subject:        how do i force unit test to do whole stage codegen



i wrote my own expression with eval and doGenCode, but doGenCode never 
gets called in tests.

also as a test i ran this in a unit test:
spark.range(10).select('id as 'asId).where('id === 4).explain
according to
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
this is supposed to show:
== Physical Plan ==
WholeStageCodegen
:  +- Project [id#0L AS asId#3L]
:     +- Filter (id#0L = 4)
:        +- Range 0, 1, 8, 10, [id#0L]

but it doesn't. instead it shows:

== Physical Plan ==
*Project [id#12L AS asId#15L]
+- *Filter (id#12L = 4)
   +- *Range (0, 10, step=1, splits=Some(4))

so i am again missing the WholeStageCodegen. any idea why?

i create spark session for unit tests simply as:
val session = SparkSession.builder
  .master("local[*]")
  .appName("test")
  .config("spark.sql.shuffle.partitions", 4)
  .getOrCreate()