You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Denis Krivenko (Jira)" <ji...@apache.org> on 2021/08/11 10:09:00 UTC
[jira] [Updated] (SPARK-36472) Improve SQL syntax for MERGE
[ https://issues.apache.org/jira/browse/SPARK-36472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Krivenko updated SPARK-36472:
-----------------------------------
Description:
Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}
*Main assumption*
In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
-- or
ON MATCHINGÂ (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*
Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *
{code}
was:
Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}
*Main assumption*
In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
{code:sql}
ON target.key1 = source.key1 AND target.key2 = source.key2{code}
It would be more convenient to use a syntax similar to:
{code:sql}
ON COLUMNS (key1, key2)
{code}
The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
{code:sql}
ON boolean_expression | USING ( column_name [ , ... ] )
{code}
*Improvement proposal*
Syntax
{code:sql}
MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
{code}
Example
{code:sql}
MERGE INTO target
USING source
ON COLUMNS (key1, key2)
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED THEN
INSERT *
{code}
> Improve SQL syntax for MERGE
> ----------------------------
>
> Key: SPARK-36472
> URL: https://issues.apache.org/jira/browse/SPARK-36472
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.2
> Reporter: Denis Krivenko
> Priority: Trivial
>
> Existing SQL syntax for *MEGRE* (see Delta Lake examples [here|https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge] and [here|https://docs.microsoft.com/en-us/azure/databricks/spark/latest/spark-sql/language-manual/delta-merge-into]) could be improved by adding an alternative for {{<merge_condition>}}
> *Main assumption*
> In common cases target and source tables have the same column names used in {{<merge_condition>}} as merge keys, for example:
> {code:sql}
> ON target.key1 = source.key1 AND target.key2 = source.key2{code}
> It would be more convenient to use a syntax similar to:
> {code:sql}
> ON COLUMNS (key1, key2)
> -- or
> ON MATCHINGÂ (key1, key2)
> {code}
> The same approach is used for [JOIN|https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html] where {{join_criteria}} syntax is
> {code:sql}
> ON boolean_expression | USING ( column_name [ , ... ] )
> {code}
> *Improvement proposal*
> Syntax
> {code:sql}
> MERGE INTO target_table_identifier [AS target_alias]
> USING source_table_identifier [<time_travel_version>] [AS source_alias]
> ON { <merge_condition> | COLUMNS ( column_name [ , ... ] ) }
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
> [ WHEN NOT MATCHED [ AND <condition> ] THEN <not_matched_action> ]
> {code}
> Example
> {code:sql}
> MERGE INTO target
> USING source
> ON COLUMNS (key1, key2)
> WHEN MATCHED THEN
> UPDATE SET *
> WHEN NOT MATCHED THEN
> INSERT *
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org