Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/08/02 19:23:01 UTC
[jira] [Commented] (SPARK-21603) The wholestage codegen will be
much slower than wholestage codegen is closed when the function is too long
[ https://issues.apache.org/jira/browse/SPARK-21603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111596#comment-16111596 ]
Hyukjin Kwon commented on SPARK-21603:
--------------------------------------
User 'eatoncys' has created a pull request for this issue:
https://github.com/apache/spark/pull/18810
> The wholestage codegen will be much slower than wholestage codegen is closed when the function is too long
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-21603
> URL: https://issues.apache.org/jira/browse/SPARK-21603
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: eaton
>
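> In the benchmark below, the run with whole-stage codegen enabled is about 7x slower than the run with it disabled (best time 3279 ms vs. 443 ms, reported as 0.1X relative) when the generated function is too long: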
> ignore("max function length of wholestagecodegen") {
>   val N = 20 << 15
>   val benchmark = new Benchmark("max function length of wholestagecodegen", N)
>   def f(): Unit = sparkSession.range(N)
>     .selectExpr(
>       "id",
>       "(id & 1023) as k1",
>       "cast(id & 1023 as double) as k2",
>       "cast(id & 1023 as int) as k3",
>       "case when id > 100 and id <= 200 then 1 else 0 end as v1",
>       "case when id > 200 and id <= 300 then 1 else 0 end as v2",
>       "case when id > 300 and id <= 400 then 1 else 0 end as v3",
>       "case when id > 400 and id <= 500 then 1 else 0 end as v4",
>       "case when id > 500 and id <= 600 then 1 else 0 end as v5",
>       "case when id > 600 and id <= 700 then 1 else 0 end as v6",
>       "case when id > 700 and id <= 800 then 1 else 0 end as v7",
>       "case when id > 800 and id <= 900 then 1 else 0 end as v8",
>       "case when id > 900 and id <= 1000 then 1 else 0 end as v9",
>       "case when id > 1000 and id <= 1100 then 1 else 0 end as v10",
>       "case when id > 1100 and id <= 1200 then 1 else 0 end as v11",
>       "case when id > 1200 and id <= 1300 then 1 else 0 end as v12",
>       "case when id > 1300 and id <= 1400 then 1 else 0 end as v13",
>       "case when id > 1400 and id <= 1500 then 1 else 0 end as v14",
>       "case when id > 1500 and id <= 1600 then 1 else 0 end as v15",
>       "case when id > 1600 and id <= 1700 then 1 else 0 end as v16",
>       "case when id > 1700 and id <= 1800 then 1 else 0 end as v17",
>       "case when id > 1800 and id <= 1900 then 1 else 0 end as v18")
>     .groupBy("k1", "k2", "k3")
>     .sum()
>     .collect()
>   benchmark.addCase(s"codegen = F") { iter =>
>     sparkSession.conf.set("spark.sql.codegen.wholeStage", "false")
>     f()
>   }
>   benchmark.addCase(s"codegen = T") { iter =>
>     sparkSession.conf.set("spark.sql.codegen.wholeStage", "true")
>     sparkSession.conf.set("spark.sql.codegen.MaxFunctionLength", "10000")
>     f()
>   }
>   benchmark.run()
>   /*
>   Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14 on Windows 7 6.1
>   Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
>   max function length of wholestagecodegen: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
>   ------------------------------------------------------------------------------------------------
>   codegen = F                                     443 /  507          1.5         676.0       1.0X
>   codegen = T                                    3279 / 3283          0.2        5002.6       0.1X
>   */
> }
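The summary columns in the quoted result can be re-derived from the best times and the row count. The following is a minimal sketch, assuming Per Row = best time / N, Rate = N / best time, and Relative = baseline best time / this run's best time. The object and function names are hypothetical and are not part of Spark's Benchmark utility. Small differences from the quoted table (for example 5002.6 ns per row) presumably come from the table showing times rounded to whole milliseconds.

```scala
// Hypothetical helper: recomputes the benchmark summary columns by hand.
object BenchmarkColumns {
  // Row count used by the benchmark: 20 << 15 = 655360.
  val N: Long = 20L << 15

  // Per-row cost in nanoseconds for a run whose best time is `bestMs` milliseconds.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / N

  // Throughput in millions of rows per second.
  def rateMPerSec(bestMs: Double): Double = N / (bestMs * 1e3)

  // Speed relative to a baseline run; values below 1.0 mean slower than the baseline.
  def relative(baselineMs: Double, bestMs: Double): Double = baselineMs / bestMs

  def main(args: Array[String]): Unit = {
    val codegenOffMs = 443.0  // best time of "codegen = F" in the quoted table
    val codegenOnMs = 3279.0  // best time of "codegen = T" in the quoted table

    println(f"codegen = F: ${perRowNs(codegenOffMs)}%.1f ns/row, ${rateMPerSec(codegenOffMs)}%.1f M rows/s")
    println(f"codegen = T: ${perRowNs(codegenOnMs)}%.1f ns/row, ${rateMPerSec(codegenOnMs)}%.1f M rows/s")
    println(f"relative speed of codegen = T: ${relative(codegenOffMs, codegenOnMs)}%.2fX")
  }
}
```

With these inputs the relative speed is roughly 0.135, which the benchmark output rounds to 0.1X; that corresponds to a slowdown of about 7.4x.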