You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "vinodkc (via GitHub)" <gi...@apache.org> on 2023/07/27 16:10:49 UTC

[GitHub] [spark] vinodkc commented on a diff in pull request #41839: [SPARK-44287][SQL] Use PartitionEvaluator API in RowToColumnarExec & ColumnarToRowExec SQL operators.

vinodkc commented on code in PR #41839:
URL: https://github.com/apache/spark/pull/41839#discussion_r1276514924


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala:
##########
@@ -89,15 +87,18 @@ case class ColumnarToRowExec(child: SparkPlan) extends ColumnarToRowTransition w
   override def doExecute(): RDD[InternalRow] = {
     val numOutputRows = longMetric("numOutputRows")
     val numInputBatches = longMetric("numInputBatches")
-    // This avoids calling `output` in the RDD closure, so that we don't need to include the entire
-    // plan (this) in the closure.
-    val localOutput = this.output
-    child.executeColumnar().mapPartitionsInternal { batches =>
-      val toUnsafe = UnsafeProjection.create(localOutput, localOutput)
-      batches.flatMap { batch =>
-        numInputBatches += 1
-        numOutputRows += batch.numRows()
-        batch.rowIterator().asScala.map(toUnsafe)
+    val evaluatorFactory =
+      new ColumnarToRowEvaluatorFactory(
+        child.output,
+        numOutputRows,
+        numInputBatches)
+
+    if (conf.usePartitionEvaluator) {
+      child.executeColumnar().mapPartitionsWithEvaluator(evaluatorFactory)
+    } else {
+      child.executeColumnar().mapPartitionsInternal { batches =>
+        val evaluator = evaluatorFactory.createEvaluator()
+        evaluator.eval(0, batches)

Review Comment:
   @cloud-fan , Thanks for the fix,
   I can fix similar issues in other merged PRs
   https://github.com/apache/spark/pull/42024/files
   https://github.com/apache/spark/pull/42025/files
   https://github.com/apache/spark/pull/41884/files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org