Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/09 23:46:55 UTC

[GitHub] [spark] sunchao commented on a change in pull request #34642: [SPARK-37369][SQL] Avoid redundant ColumnarToRow transition on InMemoryTableScan

sunchao commented on a change in pull request #34642:
URL: https://github.com/apache/spark/pull/34642#discussion_r766214720



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala
##########
@@ -71,6 +71,11 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ
 
   val id: Int = SparkPlan.newPlanId()
 
+  /**
+   * Return true if this stage of the plan supports row-based execution.

Review comment:
       Maybe add some explanation of why we need both this and `supportsColumnar`? It's a bit confusing when reading this code.
   
   Also, I'm wondering whether something like `prefersColumnar` would be better, so that we have:
   - `supportsColumnar`: this plan can support columnar output, alongside the default row-based output which every plan supports.
   - `prefersColumnar`: this plan prefers to output columnar batches even though it is not explicitly requested (e.g., `outputsColumnar` is false).
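   
   To illustrate the distinction, here is a minimal sketch. `PlanSketch`, `emitsColumnar`, and the three example objects are hypothetical names made up for illustration, not Spark APIs, and the selection rule is only an assumption about how the two flags could combine:

```scala
// Hypothetical, simplified stand-in for SparkPlan; not the real Spark class.
trait PlanSketch {
  // Can this plan produce columnar batches at all?
  // Row-based output is assumed to always be available.
  def supportsColumnar: Boolean = false

  // Proposed flag (assumption): the plan would rather emit columnar batches
  // even when the parent did not explicitly request them.
  def prefersColumnar: Boolean = false
}

object PlanSketch {
  // Assumed rule: emit columnar batches only if the plan is able to, and
  // either the parent asked for them or the plan itself prefers them.
  def emitsColumnar(plan: PlanSketch, columnarRequested: Boolean): Boolean =
    plan.supportsColumnar && (columnarRequested || plan.prefersColumnar)
}

// Row-only plan: neither flag is set.
object RowOnlyScan extends PlanSketch

// Can emit columnar batches, but only when asked.
object CachedScan extends PlanSketch {
  override def supportsColumnar: Boolean = true
}

// Can emit columnar batches and prefers to do so.
object VectorizedScan extends PlanSketch {
  override def supportsColumnar: Boolean = true
  override def prefersColumnar: Boolean = true
}
```

   With this split, `CachedScan` would stay row-based unless the parent requests columnar output, while `VectorizedScan` would emit batches either way.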
   

##########
File path: sql/core/benchmarks/InMemoryColumnarBenchmark-results.txt
##########
@@ -0,0 +1,12 @@
+================================================================================================

Review comment:
       nit: ideally we should generate the results using the GitHub workflow.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
##########
@@ -256,7 +256,8 @@ case class CachedRDDBuilder(
   }
 
   private def buildBuffers(): RDD[CachedBatch] = {
-    val cb = if (cachedPlan.supportsColumnar) {
+    val cb = if (cachedPlan.supportsColumnar &&
+        serializer.supportsColumnarInput(cachedPlan.output)) {

Review comment:
       Hmm, why is this necessary? Shouldn't `cachedPlan.supportsColumnar` already cover this? For instance, in `InMemoryTableScanExec`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org