Posted to reviews@spark.apache.org by "vinodkc (via GitHub)" <gi...@apache.org> on 2023/07/05 14:11:57 UTC

[GitHub] [spark] vinodkc commented on a diff in pull request #41839: [SPARK-44287][SQL] Use PartitionEvaluator API for RowToColumnarExec & ColumnarToRowExec SQL operators.

vinodkc commented on code in PR #41839:
URL: https://github.com/apache/spark/pull/41839#discussion_r1253172548


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala:
##########
@@ -89,15 +87,18 @@ case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition w
   override def doExecute(): RDD[InternalRow] = {
     val numOutputRows = longMetric("numOutputRows")
     val numInputBatches = longMetric("numInputBatches")
-    // This avoids calling `output` in the RDD closure, so that we don't need to include the entire
-    // plan (this) in the closure.
-    val localOutput = this.output
-    child.executeColumnar().mapPartitionsInternal { batches =>
-      val toUnsafe = UnsafeProjection.create(localOutput, localOutput)
-      batches.flatMap { batch =>
-        numInputBatches += 1
-        numOutputRows += batch.numRows()
-        batch.rowIterator().asScala.map(toUnsafe)
+    val evaluatorFactory =
+      new ColumnarToRowEvaluatorFactory(
+        child.output,
+        numOutputRows,
+        numInputBatches)
+
+    if (conf.usePartitionEvaluator) {
+      child.executeColumnar().mapPartitionsWithEvaluator(evaluatorFactory)
+    } else {
+      child.executeColumnar().mapPartitionsInternal { batches =>
+        val evaluator = evaluatorFactory.createEvaluator()
+        evaluator.eval(0, batches)

Review Comment:
   In the original code, the partition index was not used, since `mapPartitionsInternal` was called.
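
   For context, here is a minimal Scala sketch of what such an evaluator factory could look like (the class name matches the diff above, but the constructor parameters, trait signatures, and method bodies below are assumptions reconstructed from the removed closure, not the PR's actual implementation):

       import scala.collection.JavaConverters._

       import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}
       import org.apache.spark.sql.catalyst.InternalRow
       import org.apache.spark.sql.catalyst.expressions.{Attribute, UnsafeProjection}
       import org.apache.spark.sql.execution.metric.SQLMetric
       import org.apache.spark.sql.vectorized.ColumnarBatch

       // Hypothetical sketch: the evaluator carries the same per-partition logic that
       // the old `mapPartitionsInternal` closure performed. Note that `partitionIndex`
       // is ignored, which matches the observation above that the index was never used.
       class ColumnarToRowEvaluatorFactory(
           output: Seq[Attribute],
           numOutputRows: SQLMetric,
           numInputBatches: SQLMetric)
         extends PartitionEvaluatorFactory[ColumnarBatch, InternalRow] {

         override def createEvaluator(): PartitionEvaluator[ColumnarBatch, InternalRow] =
           new PartitionEvaluator[ColumnarBatch, InternalRow] {
             override def eval(
                 partitionIndex: Int,
                 inputs: Iterator[ColumnarBatch]*): Iterator[InternalRow] = {
               val batches = inputs.head
               // Build the projection on the executor side, as the old closure did.
               val toUnsafe = UnsafeProjection.create(output, output)
               batches.flatMap { batch =>
                 numInputBatches += 1
                 numOutputRows += batch.numRows()
                 batch.rowIterator().asScala.map(toUnsafe)
               }
             }
           }
       }

   Under this reading, `evaluator.eval(0, batches)` in the fallback branch simply supplies a dummy index, since the evaluator never reads it.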



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

