Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2022/02/17 09:02:00 UTC

[jira] [Created] (SPARK-38237) Rename back StatefulOpClusteredDistribution to HashClusteredDistribution

Jungtaek Lim created SPARK-38237:
------------------------------------

             Summary: Rename back StatefulOpClusteredDistribution to HashClusteredDistribution
                 Key: SPARK-38237
                 URL: https://issues.apache.org/jira/browse/SPARK-38237
             Project: Spark
          Issue Type: Task
          Components: SQL, Structured Streaming
    Affects Versions: 3.3.0
            Reporter: Jungtaek Lim


We still find HashClusteredDistribution to be useful for batch queries as well. For example, we had a case with lower parallelism than expected because ClusteredDistribution is used for aggregation, and ClusteredDistribution is satisfied by a HashPartitioning on only a subset of the grouping keys (in which case the parallelism also depends on the cardinality of that subset).
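
The effect can be illustrated with a small toy model. This is a sketch, not Spark code: the class and function names are hypothetical stand-ins for the planner's matching rules, the data is made up, and the hash is a simple polynomial rather than Spark's actual hash function. It assumes a relaxed rule (partitioning keys only need to be a subset of the clustering keys) and a strict rule (same keys, same order).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for a hash partitioning over some key columns."""
    keys: Tuple[str, ...]
    num_partitions: int

    def partition_id(self, row: Dict[str, int]) -> int:
        # Toy polynomial hash for illustration only (not Spark's hash).
        h = sum(row[k] * 31 ** i for i, k in enumerate(self.keys))
        return h % self.num_partitions


def satisfies_clustered(p: HashPartitioning, clustering: Tuple[str, ...]) -> bool:
    # Relaxed rule: every partitioning key appears among the clustering keys.
    return len(p.keys) > 0 and set(p.keys) <= set(clustering)


def satisfies_hash_clustered(p: HashPartitioning, clustering: Tuple[str, ...]) -> bool:
    # Strict rule: exactly the same keys in the same order.
    return p.keys == tuple(clustering)


def non_empty_partitions(p: HashPartitioning, rows: List[Dict[str, int]]) -> int:
    # Number of partitions that actually receive at least one row.
    return len({p.partition_id(r) for r in rows})


# Made-up data: 2 distinct countries, 100 distinct users.
rows = [{"country": u // 50, "user_id": u} for u in range(100)]
group_by = ("country", "user_id")

by_sub_key = HashPartitioning(("country",), 8)
by_full_key = HashPartitioning(group_by, 8)

for p in (by_sub_key, by_full_key):
    print(p.keys,
          "relaxed:", satisfies_clustered(p, group_by),
          "strict:", satisfies_hash_clustered(p, group_by),
          "non-empty partitions:", non_empty_partitions(p, rows))
```

In this model the partitioning on `country` alone passes the relaxed check, so no reshuffle would be requested, yet only as many partitions receive rows as there are distinct countries. The strict check rejects it and accepts only the partitioning on the full grouping key.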

We propose renaming StatefulOpClusteredDistribution back to HashClusteredDistribution while retaining the NOTE for stateful operators. The distribution should still not be modified, because stateful operators depend on it, but it can also be used for batch cases if needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org