Posted to commits@spark.apache.org by ya...@apache.org on 2020/03/19 11:54:58 UTC
[spark] branch branch-3.0 updated: [SPARK-31187][SQL] Sort the
whole-stage codegen debug output by codegenStageId
This is an automated email from the ASF dual-hosted git repository.
yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
new a8c08b1 [SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
a8c08b1 is described below
commit a8c08b1d81aefd1e3d7f4616b76e2285f9981cc7
Author: Kris Mok <kr...@databricks.com>
AuthorDate: Thu Mar 19 20:53:01 2020 +0900
[SPARK-31187][SQL] Sort the whole-stage codegen debug output by codegenStageId
### What changes were proposed in this pull request?
Spark SQL's whole-stage codegen (WSCG) supports dumping the generated code to help with debugging. The generated code can be obtained through `df.queryExecution.debug.codegen` or with the SQL `EXPLAIN CODEGEN` statement.
The generated code is currently printed in no particular order, which makes it harder to find a given stage while debugging. This PR makes a minor improvement: it sorts the codegen dump by `codegenStageId` in ascending order.
After this change, the following query:
```scala
spark.range(10).agg(sum('id)).queryExecution.debug.codegen
```
will always dump the generated code in a natural, stable order. A version of this example with shorter output is:
```
spark.range(10).agg(sum('id)).queryExecution.debug.codegenToSeq.map(_._1).foreach(println)
*(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
+- *(1) Range (0, 10, step=1, splits=16)
*(2) HashAggregate(keys=[], functions=[sum(id#8L)], output=[sum(id)#12L])
+- Exchange SinglePartition, true, [id=#30]
   +- *(1) HashAggregate(keys=[], functions=[partial_sum(id#8L)], output=[sum#15L])
      +- *(1) Range (0, 10, step=1, splits=16)
```
The number of codegen stages within a single SQL query tends to be very small, most likely fewer than 50, so the overhead of the added sort should not be significant.
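The effect of the one-line change can be illustrated without Spark. The following is a minimal sketch, assuming the subtrees are collected in an unordered mutable set (as the `.toSeq` call in the patch suggests); `Stage`, `SortedCodegenDump`, and `ordered` are hypothetical names used only for illustration and are not part of Spark.

```scala
import scala.collection.mutable

// Hypothetical stand-in for WholeStageCodegenExec; only the stage id matters here.
final case class Stage(codegenStageId: Int, description: String)

object SortedCodegenDump {
  // Same shape as the patched expression: convert the unordered set to a Seq,
  // then sort by stage id (ascending) before producing any output.
  def ordered(stages: mutable.HashSet[Stage]): Seq[Stage] =
    stages.toSeq.sortBy(_.codegenStageId)

  def main(args: Array[String]): Unit = {
    // Insertion order is deliberately not ascending.
    val stages = mutable.HashSet(
      Stage(2, "HashAggregate (final)"),
      Stage(1, "HashAggregate (partial) over Range"))

    // Print each stage in the same "*(id)" style as the plan output.
    ordered(stages).foreach(s => println(s"*(${s.codegenStageId}) ${s.description}"))
  }
}
```

Without the `sortBy`, the iteration order of a hash set depends on the elements' hash codes rather than on the stage ids, so sorting is what makes the dump order predictable.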
### Why are the changes needed?
Minor improvement to aid WSCG debugging.
### Does this PR introduce any user-facing change?
No user-facing change for end-users; minor change for developers who debug WSCG generated code.
### How was this patch tested?
Manually tested the output; all other tests still pass.
Closes #27955 from rednaxelafx/codegen.
Authored-by: Kris Mok <kr...@databricks.com>
Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
(cherry picked from commit a1776288f48d450fea28f50fef78fd6aa10a8160)
Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
.../src/main/scala/org/apache/spark/sql/execution/debug/package.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
index 6a57ef2..6c40104 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
@@ -113,7 +113,7 @@ package object debug {
         s
       case s => s
     }
-    codegenSubtrees.toSeq.map { subtree =>
+    codegenSubtrees.toSeq.sortBy(_.codegenStageId).map { subtree =>
       val (_, source) = subtree.doCodeGen()
       val codeStats = try {
         CodeGenerator.compile(source)._2