
Posted to issues@spark.apache.org by "Terry Kim (Jira)" <ji...@apache.org> on 2023/11/03 23:04:00 UTC

[jira] [Updated] (SPARK-45784) Introduce clustering mechanism to Spark

     [ https://issues.apache.org/jira/browse/SPARK-45784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terry Kim updated SPARK-45784:
------------------------------
    Description: This proposes to introduce a clustering mechanism such that different data sources (e.g., Delta, Iceberg, etc.) can implement format-specific clustering.  (was: This proposes to introduce a CLUSTER BY clause to the CREATE/REPLACE SQL syntax:
{code:java}
CREATE TABLE tbl(a int, b string) CLUSTER BY (a, b){code}
No implementation is included; it is up to the catalog implementation (e.g., Delta, Iceberg, etc.) to utilize the clustering information.

Note that specifying CLUSTER BY will throw an exception if the table being created is for a v1 source or the session catalog (e.g., a v2 source with the session catalog).)

> Introduce clustering mechanism to Spark
> ---------------------------------------
>
>                 Key: SPARK-45784
>                 URL: https://issues.apache.org/jira/browse/SPARK-45784
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Terry Kim
>            Priority: Major
>
> This proposes to introduce a clustering mechanism such that different data sources (e.g., Delta, Iceberg, etc.) can implement format-specific clustering.


