Posted to issues@spark.apache.org by "Anton Okolnychyi (Jira)" <ji...@apache.org> on 2022/02/01 21:52:00 UTC

[jira] [Commented] (SPARK-35801) SPIP: Row-level operations in Data Source V2

    [ https://issues.apache.org/jira/browse/SPARK-35801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17485502#comment-17485502 ] 

Anton Okolnychyi commented on SPARK-35801:
------------------------------------------

[~viirya], shall we keep this one open until the implementation is done, or can we close it now? The community has already voted on this SPIP.

> SPIP: Row-level operations in Data Source V2
> --------------------------------------------
>
>                 Key: SPARK-35801
>                 URL: https://issues.apache.org/jira/browse/SPARK-35801
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Anton Okolnychyi
>            Priority: Major
>              Labels: SPIP
>
> Row-level operations such as UPDATE, DELETE, and MERGE are becoming increasingly important for modern Big Data workflows. Use cases include, but are not limited to, deleting a set of records for regulatory compliance, updating a set of records to fix an issue in the ingestion pipeline, and applying changes from a transaction log to a fact table. Row-level operations let users express these use cases directly, where they would otherwise require much more SQL. Common patterns for updating partitions are to read, union, and overwrite, or to read, diff, and append. With commands like MERGE, these operations are easier to express and can be more efficient to run.
> Hive supports [MERGE|https://blog.cloudera.com/update-hive-tables-easy-way/] and Spark should implement similar support.
> SPIP: https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60
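
As a rough illustration of the contrast drawn in the description, the sketch below models a table as a plain Python list of rows and compares a manual read, union, and overwrite step with a single MERGE-like call. It is not Spark code and does not reflect the SPIP's design; the function names, the "id"/"value"/"op" fields, and the sample data are all hypothetical.

```python
# Hypothetical in-memory stand-in for a fact table keyed by "id"; not Spark code.

def read_union_overwrite(target, updates):
    """Manual pattern: read existing rows, union them with updates, overwrite."""
    update_ids = {row["id"] for row in updates}
    kept = [row for row in target if row["id"] not in update_ids]  # read
    combined = kept + updates                                       # union
    return sorted(combined, key=lambda row: row["id"])              # overwrite


def merge(target, changes):
    """MERGE-like semantics: delete flagged rows, update matches, insert the rest."""
    by_id = {row["id"]: dict(row) for row in target}
    for change in changes:
        key = change["id"]
        if change.get("op") == "delete":
            by_id.pop(key, None)  # matched and flagged for deletion
        else:
            by_id[key] = {"id": key, "value": change["value"]}  # update or insert
    return [by_id[key] for key in sorted(by_id)]


table = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}, {"id": 3, "value": "c"}]

# A small change log: one update, one delete, one insert.
log = [
    {"id": 2, "value": "B", "op": "upsert"},
    {"id": 3, "op": "delete"},
    {"id": 4, "value": "d", "op": "upsert"},
]

overwritten = read_union_overwrite(table, [{"id": 2, "value": "B"}])
merged = merge(table, log)
```

In this toy model the manual pattern handles only replacement of existing rows, while the MERGE-like function applies updates, deletes, and inserts from one change log in a single pass, which is the convenience the description attributes to row-level commands.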


