You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "dtenedor (via GitHub)" <gi...@apache.org> on 2023/08/01 00:19:24 UTC

[GitHub] [spark] dtenedor commented on a diff in pull request #42174: [SPARK-44503][SQL] Add analysis and planning for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls

dtenedor commented on code in PR #42174:
URL: https://github.com/apache/spark/pull/42174#discussion_r1279997355


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala:
##########
@@ -61,5 +75,36 @@ case class FunctionTableSubqueryArgumentExpression(
   final override def nodePatternsInternal: Seq[TreePattern] =
     Seq(FUNCTION_TABLE_RELATION_ARGUMENT_EXPRESSION)
 
-  lazy val evaluable: LogicalPlan = Project(Seq(Alias(CreateStruct(plan.output), "c")()), plan)
+  def hasRepartitioning: Boolean = withSinglePartition || partitionByExpressions.nonEmpty
+
+  lazy val evaluable: LogicalPlan = {
+    val subquery = if (hasRepartitioning) {
+      // If the TABLE argument includes the WITH SINGLE PARTITION or PARTITION BY or ORDER BY
+      // clause(s), add a corresponding logical operator to represent the repartitioning operation
+      // in the query plan.
+      RepartitionForTableFunctionCall(

Review Comment:
   Good idea, I updated the code to use this, now we don't need to add a new operator!



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala:
##########
@@ -32,11 +32,22 @@ import org.apache.spark.sql.types.DataType
  * table in the catalog. In the latter case, the relation argument comprises
  * a table subquery that may itself refer to one or more tables in its own
  * FROM clause.
+ *
+ * Each TABLE argument may also optionally include a PARTITION BY clause. If present, these indicate
+ * how to logically split up the input relation such that the table-valued function evaluates
+ * exactly once for each partition, and returns the union of all results. If no partitioning list is
+ * present, this splitting of the input relation is undefined. Furthermore, if the PARTITION BY
+ * clause includes a following ORDER BY clause, Catalyst will sort the rows in each partition such
+ * that the table-valued function receives them one-by-one in the requested order. Otherwise, if no
+ * such ordering is specified, the ordering of rows within each partition is undefined.

Review Comment:
   Sounds good, done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org