Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/14 20:24:33 UTC

[GitHub] [spark] HeartSaVioR commented on a change in pull request #35512: [SPARK-38124][SS][FOLLOWUP] Document the current challenge on fixing distribution of stateful operator

HeartSaVioR commented on a change in pull request #35512:
URL: https://github.com/apache/spark/pull/35512#discussion_r806210501



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -101,6 +101,14 @@ case class ClusteredDistribution(
  * Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the
  * stateful operator, only [[HashPartitioning]] (and HashPartitioning in
  * [[PartitioningCollection]]) can satisfy this distribution.
+ *
+ * NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
+ * been using ClusteredDistribution, which could construct the physical partitioning of the state
+ * in different way. (ClusteredDistribution requires relaxed condition and multiple
+ * partitionings can satisfy the requirement.) We need to construct the way to fix this with
+ * minimizing possibility to break the existing checkpoints.
+ *
+ * TODO: SPARK-38204 to address above note.

Review comment:
       We seem to use both, but I see more usages of () so I'll follow that. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


