Posted to issues@spark.apache.org by "Ohad Raviv (Jira)" <ji...@apache.org> on 2022/11/27 13:39:00 UTC

[jira] [Created] (SPARK-41277) Save and leverage shuffle key in tblproperties

Ohad Raviv created SPARK-41277:
----------------------------------

             Summary: Save and leverage shuffle key in tblproperties
                 Key: SPARK-41277
                 URL: https://issues.apache.org/jira/browse/SPARK-41277
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.1
            Reporter: Ohad Raviv


I'm not sure whether I'm missing something trivial.

In a typical pipeline, many datasets get materialized, many of them right after a shuffle (e.g. a join). They are then used again in further operations, often with the same key.

Wouldn't it make sense to save the shuffle key along with the table (for example, in its tblproperties) to avoid unnecessary shuffles?
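To make the proposal concrete, here is a toy model in plain Python. When a table is written right after a shuffle, its partitioning columns and partition count are recorded as table properties. A later join reads them back to decide whether an exchange is still needed. The property names (`shuffle.key.columns`, `shuffle.key.numPartitions`) and the helpers `record_shuffle_key`/`needs_shuffle` are hypothetical. They are not existing Spark settings or APIs, and real Spark makes this decision inside its physical planner with more nuanced rules.

```python
from typing import Dict, List

# Hypothetical tblproperties keys; NOT real Spark property names.
KEY_COLS = "shuffle.key.columns"
KEY_PARTS = "shuffle.key.numPartitions"


def record_shuffle_key(props: Dict[str, str], cols: List[str], num_partitions: int) -> Dict[str, str]:
    """Return a copy of the table properties with the shuffle key recorded."""
    updated = dict(props)
    updated[KEY_COLS] = ",".join(cols)
    updated[KEY_PARTS] = str(num_partitions)
    return updated


def needs_shuffle(left: Dict[str, str], right: Dict[str, str], join_cols: List[str]) -> bool:
    """Simplified rule: the exchange can be skipped only if both sides were saved
    hash-partitioned on exactly the join columns (same order) with the same
    number of partitions."""
    wanted = ",".join(join_cols)
    for props in (left, right):
        if props.get(KEY_COLS) != wanted:
            return True  # one side has no recorded key, or a different one
    return left.get(KEY_PARTS) is None or left.get(KEY_PARTS) != right.get(KEY_PARTS)


# Usage: two tables materialized after a shuffle on customer_id.
orders = record_shuffle_key({}, ["customer_id"], 200)
customers = record_shuffle_key({}, ["customer_id"], 200)
print(needs_shuffle(orders, customers, ["customer_id"]))  # False
print(needs_shuffle(orders, {}, ["customer_id"]))         # True
```

The rule is deliberately strict (exact column list, equal partition counts). That keeps the toy model obviously safe: whenever it says "no shuffle", matching keys really are co-located.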

Also, the implementation seems fairly straightforward: just leverage the existing bucketing mechanism.
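The reason bucketing can stand in for a recorded shuffle key can also be sketched in plain Python. Bucketing assigns each row to bucket `hash(key) mod numBuckets`. Two tables bucketed on the same key into the same number of buckets therefore place every matching key at the same bucket index, so a join can proceed bucket by bucket without moving rows. The sketch uses `zlib.crc32` as a stand-in hash; Spark's bucketing uses a Murmur3-based hash instead. `bucketize` and `bucketed_join` are hypothetical helpers, not Spark APIs.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (key, value)


def bucket_of(key: str, num_buckets: int) -> int:
    # Stand-in hash; Spark's bucketing uses a Murmur3-based hash instead.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Group rows into buckets by key, as a bucketed write would lay them out."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left: Dict[int, List[Row]], right: Dict[int, List[Row]],
                  num_buckets: int) -> List[Tuple[str, str, str]]:
    """Join bucket i of the left side only with bucket i of the right side.
    No row ever moves to another bucket: that is the shuffle being avoided."""
    out = []
    for b in range(num_buckets):
        index: Dict[str, List[str]] = defaultdict(list)
        for key, value in right.get(b, []):
            index[key].append(value)
        for key, lval in left.get(b, []):
            for rval in index.get(key, []):
                out.append((key, lval, rval))
    return out


N = 4
orders = bucketize([("alice", "o1"), ("bob", "o2"), ("alice", "o3")], N)
customers = bucketize([("alice", "IL"), ("bob", "US"), ("carol", "FR")], N)
print(sorted(bucketed_join(orders, customers, N)))
# [('alice', 'o1', 'IL'), ('alice', 'o3', 'IL'), ('bob', 'o2', 'US')]
```

The result does not depend on which hash is chosen, only on both sides using the same hash and the same bucket count. That is the same condition the recorded shuffle key would have to guarantee.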

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
