You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Denis Krivenko (Jira)" <ji...@apache.org> on 2021/08/11 10:09:00 UTC
[jira] [Updated] (SPARK-36472) Improve SQL syntax for MERGE

     [ https://issues.apache.org/jira/browse/SPARK-36472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denis Krivenko updated SPARK-36472:
-----------------------------------
    Description: 
Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}

*Main assumption*
 In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
-- or
ON MATCHING (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*
 Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
    UPDATE SET *
WHEN NOT MATCHED THEN
    INSERT *
{code}

  was:
Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}

*Main assumption*
 In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*
 Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
    UPDATE SET *
WHEN NOT MATCHED THEN
    INSERT *
{code}


> Improve SQL syntax for MERGE
> ----------------------------
>
>                 Key: SPARK-36472
>                 URL: https://issues.apache.org/jira/browse/SPARK-36472
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.2
>            Reporter: Denis Krivenko
>            Priority: Trivial
>
> Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}
> *Main assumption*
>  In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
> {code:sql}
> ON target.key1 = source.key1 AND target.key2 = source.key2{code}
> It would be more convenient to use a syntax similar to:
> {code:sql}
> ON COLUMNS (key1, key2)
> -- or
> ON MATCHING (key1, key2)
> {code}
> The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
> {code:sql}
> ON boolean_expression | USING ( column_name [ , ... ] )
> {code}
> *Improvement proposal*
>  Syntax
> {code:sql}
> MERGE INTO target_table_identifier [AS target_alias]
> USING source_table_identifier [<time_travel_version>] [AS source_alias]
> ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]
> {code}
> Example
> {code:sql}
> MERGE INTO target
> USING source
> ON COLUMNS (key1, key2)
> WHEN MATCHED THEN
>     UPDATE SET *
> WHEN NOT MATCHED THEN
>     INSERT *
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org