Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/19 09:10:05 UTC

[GitHub] [spark] EnricoMi commented on a diff in pull request #37211: [SPARK-39644][SQL] Add RangePartitioning reporting for V2 DataSources

EnricoMi commented on code in PR #37211:
URL: https://github.com/apache/spark/pull/37211#discussion_r949983359


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##########
@@ -119,13 +119,16 @@ case class DataSourceV2Relation(
  * @param output the output attributes of this relation
  * @param keyGroupedPartitioning if set, the partitioning expressions that are used to split the
  *                               rows in the scan across different partitions
- * @param ordering if set, the ordering provided by the scan
+ * @param rangePartitioning if set, the range partitioning expressions that are used to split the
+ *                               rows in the scan across different partitions
+ * @param ordering if set, the in-partition ordering provided by the scan
  */
 case class DataSourceV2ScanRelation(
     relation: DataSourceV2Relation,
     scan: Scan,
     output: Seq[AttributeReference],
     keyGroupedPartitioning: Option[Seq[Expression]] = None,
+    rangePartitioning: Option[Seq[SortOrder]] = None,

Review Comment:
   This change does not introduce those semantics: Spark already has the notion of a global order and an in-partition order (so 2.). I agree that this has to be documented explicitly and carefully wherever it is relevant in the code.
   
   Can you give an example of an "incompatible" global and in-partition ordering? I think the two are orthogonal if you think of range partitioning as a key-grouped partitioning with ordered keys, while the data inside each partition can be ordered arbitrarily.
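   
   To make the orthogonality concrete, here is a minimal sketch in plain Scala. It does not use any Spark API, and `OrderingSketch`, `Row`, `partitions`, `isRangePartitioned` and `isSortedWithin` are hypothetical names made up for illustration only. It models range partitioning as non-overlapping key ranges across partitions and checks the in-partition order separately:

```scala
// Hypothetical illustration only: plain Scala collections, no Spark APIs.
object OrderingSketch {
  final case class Row(key: Int, value: String)

  // Three partitions that are range-partitioned by `key`:
  // every key in partition i is smaller than every key in partition i + 1.
  // Inside each partition the rows are in no particular order.
  val partitions: Seq[Seq[Row]] = Seq(
    Seq(Row(2, "c"), Row(1, "a"), Row(3, "b")),
    Seq(Row(5, "b"), Row(4, "c")),
    Seq(Row(9, "a"), Row(7, "b"), Row(8, "c"))
  )

  // Global property: key ranges of consecutive non-empty partitions do not overlap.
  def isRangePartitioned(parts: Seq[Seq[Row]]): Boolean = {
    val nonEmpty = parts.filter(_.nonEmpty)
    nonEmpty.zip(nonEmpty.drop(1)).forall { case (a, b) =>
      a.map(_.key).max < b.map(_.key).min
    }
  }

  // In-partition property: rows inside each partition are sorted by `f`.
  def isSortedWithin[K](parts: Seq[Seq[Row]])(f: Row => K)(implicit ord: Ordering[K]): Boolean =
    parts.forall { p =>
      p.zip(p.drop(1)).forall { case (a, b) => ord.lteq(f(a), f(b)) }
    }

  // Sorting inside each partition by an unrelated column changes the
  // in-partition order but leaves the range partitioning by `key` intact.
  val sortedByValue: Seq[Seq[Row]] = partitions.map(_.sortBy(_.value))
}
```

   In this sketch `sortedByValue` is still range-partitioned by `key` and is sorted by `value` inside each partition, but it is not sorted by `key` inside each partition. That is the sense in which I consider the two orderings independent; whether a particular V2 source can report such a combination is a separate question.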



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: reviews-help@spark.apache.org