Posted to issues@spark.apache.org by "Reece Robinson (Jira)" <ji...@apache.org> on 2023/10/03 22:57:00 UTC

[jira] [Updated] (SPARK-27943) Implement default constraint with Column for Hive table

     [ https://issues.apache.org/jira/browse/SPARK-27943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reece Robinson updated SPARK-27943:
-----------------------------------
    Attachment: Screenshot 2023-10-04 at 11.11.28 AM.png

> Implement default constraint with Column for Hive table
> -------------------------------------------------------
>
>                 Key: SPARK-27943
>                 URL: https://issues.apache.org/jira/browse/SPARK-27943
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jiaan Geng
>            Priority: Major
>         Attachments: Screenshot 2023-10-04 at 11.11.28 AM.png
>
>
>  
>  *Background*
> Default constraints on columns are part of the ANSI SQL standard.
> Hive 3.0+ supports default constraints (ref: https://issues.apache.org/jira/browse/HIVE-18726),
> but Spark SQL does not implement this feature yet.
> *Design*
> Hive is widely used in production environments and is the de facto standard in the big data field.
> However, many different Hive versions are used in production, and features differ between versions.
> Spark SQL needs to implement default constraints, and there are three design points to consider:
> _First_, Spark SQL should minimize coupling with Hive.
> _Second_, default constraints should be compatible with different versions of Hive.
> _Third_, which default constraint expressions should Spark SQL support? I think we should support `literal`, `current_date()`, and `current_timestamp()`. Maybe other expressions should also be supported, such as `Cast(1 as float)`, `1 + 2`, and so on (see the sketch below).
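> A minimal sketch of the kind of DDL this feature would enable. The DEFAULT clause is the proposal here, not existing Spark SQL syntax, so the statement below is hypothetical and would not parse today; it only illustrates the literal, date/timestamp, cast, and arithmetic expression classes listed above:
> {code:scala}
> // Hypothetical: Spark SQL does not accept a DEFAULT clause yet.
> spark.sql("""
>   CREATE TABLE events (
>     id      BIGINT,
>     ratio   FLOAT     DEFAULT CAST(1 AS FLOAT),
>     retries INT       DEFAULT 1 + 2,
>     created TIMESTAMP DEFAULT current_timestamp()
>   ) USING hive
> """)
> {code}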
> We want to save the default constraint metadata in the Hive table properties, and then restore it from those properties after the client fetches the latest metadata. The implementation is the same as for other metadata (e.g. partitions, buckets, statistics), as sketched below.
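> A minimal sketch of that round trip, assuming a made-up property key scheme (one `spark.sql.constraints.default.<column>` entry per column; the real key naming would be decided in the sub-tasks):
> {code:scala}
> // Hypothetical serialization: one table property per column default.
> val defaults = Map("created" -> "current_timestamp()")
> val props = defaults.map { case (col, expr) =>
>   s"spark.sql.constraints.default.$col" -> expr
> }
>
> // Restore after the client fetches the latest table metadata.
> val restored = props.collect {
>   case (k, v) if k.startsWith("spark.sql.constraints.default.") =>
>     k.stripPrefix("spark.sql.constraints.default.") -> v
> }
> {code}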
> Because a default constraint is part of a column, I think we could reuse the metadata of StructField. The default constraint will be cached in the StructField metadata.
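> A minimal sketch of caching the default in StructField metadata, using Spark's existing MetadataBuilder API (the `"default"` key name is an assumption, not a decided convention):
> {code:scala}
> import org.apache.spark.sql.types._
>
> // Attach the default expression text to the column's metadata
> // under a hypothetical "default" key.
> val withDefault = new MetadataBuilder()
>   .putString("default", "current_timestamp()")
>   .build()
>
> val field = StructField("created", TimestampType, nullable = false, metadata = withDefault)
>
> // Read it back, e.g. when resolving an INSERT that omits the column.
> val defaultExpr =
>   if (field.metadata.contains("default")) Some(field.metadata.getString("default"))
>   else None
> {code}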
>  
> *Tasks*
> This is a big piece of work, so I want to split it into sub-tasks, as follows:
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org