
Posted to dev@spark.apache.org by Anton Okolnychyi <ao...@gmail.com> on 2021/06/24 23:53:32 UTC

[DISCUSS] SPIP: Row-level operations in Data Source V2

Hey everyone,

I'd like to start a discussion on adding support for executing row-level
operations such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
execution should be the same across data sources and the best way to do
that is to implement it in Spark.

Right now, Spark can only parse and to some extent analyze DELETE, UPDATE,
MERGE commands. Data sources that support row-level changes have to build
custom Spark extensions to execute such statements. The goal of this effort
is to come up with a flexible and easy-to-use API that will work across
data sources.

Design doc:
https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/

PR for handling DELETE statements:
https://github.com/apache/spark/pull/33008

Any feedback is more than welcome.

Liang-Chi was kind enough to shepherd this effort. Thanks!

- Anton

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by Anton Okolnychyi <ao...@gmail.com>.

I agree with the idea of getting parts merged as soon as possible to make
sure the APIs are well-defined and the implementation is generic.
Everything is ready on my side.

- Anton

On Fri, Nov 12, 2021 at 17:47 L. C. Hsieh <vi...@gmail.com> wrote:

> Hi all,
>
> From what I've seen, we are mostly in favor of the SPIP.
>
> If there are no more comments or discussion on the SPIP doc, I will raise a
> vote soon.
> Thanks.

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by "L. C. Hsieh" <vi...@gmail.com>.

Hi all,

From what I've seen, we are mostly in favor of the SPIP.

If there are no more comments or discussion on the SPIP doc, I will raise a vote soon.
Thanks.

On Tue, Nov 2, 2021 at 9:58 AM L. C. Hsieh <vi...@gmail.com> wrote:
>
> +1 for the idea to commit the work earlier.
>
> I think we will raise the vote soon. Once it passes, we can submit the PRs.
>
> What do you think, Anton?

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by "L. C. Hsieh" <vi...@gmail.com>.

+1 for the idea to commit the work earlier.

I think we will raise the vote soon. Once it passes, we can submit the PRs.

What do you think, Anton?

On Mon, Nov 1, 2021 at 7:59 AM Wenchen Fan <cl...@gmail.com> wrote:
>
> The general idea looks great. This is indeed a complicated API and we probably need more time to evaluate the API design. It's better to commit this work earlier so that we have more time to verify it before the 3.3 release. Maybe we can commit the group-based API first, then the delta-based one, as the delta-based API is significantly more convoluted.

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by Wenchen Fan <cl...@gmail.com>.

The general idea looks great. This is indeed a complicated API and we
probably need more time to evaluate the API design. It's better to commit
this work earlier so that we have more time to verify it before the 3.3
release. Maybe we can commit the group-based API first, then the
delta-based one, as the delta-based API is significantly more convoluted.
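
As a rough illustration of the two flavors, here is a minimal Python sketch.
It assumes "group-based" means rewriting whole groups of rows (for example,
files) that contain affected rows, and "delta-based" means recording only the
individual row changes. All names below are hypothetical and none of them are
Spark APIs; the design doc defines the actual interfaces.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
# Hypothetical layout: group name (e.g. a file) -> rows stored in that group.
Groups = Dict[str, List[Row]]


def delete_group_based(groups: Groups, pred: Callable[[Row], bool]) -> Groups:
    """Rewrite every group that contains a matching row; reuse the others."""
    result: Groups = {}
    for name, rows in groups.items():
        if any(pred(r) for r in rows):
            # The whole group is rewritten without the matching rows.
            result[name] = [r for r in rows if not pred(r)]
        else:
            # Untouched groups are carried over as-is.
            result[name] = rows
    return result


def delete_delta_based(groups: Groups, pred: Callable[[Row], bool]) -> List[Tuple[str, int]]:
    """Leave groups untouched; emit (group, position) markers for deleted rows."""
    return [
        (name, i)
        for name, rows in groups.items()
        for i, r in enumerate(rows)
        if pred(r)
    ]


def read_with_deltas(groups: Groups, deletes: List[Tuple[str, int]]) -> List[Row]:
    """Apply the delete markers at read time."""
    deleted = set(deletes)
    return [
        r
        for name, rows in groups.items()
        for i, r in enumerate(rows)
        if (name, i) not in deleted
    ]
```

In this sketch the group-based path does more work at write time, while the
delta-based path also needs a way to identify rows and a merge step on read,
which hints at why it has more moving parts.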

On Thu, Oct 28, 2021 at 12:53 AM L. C. Hsieh <vi...@apache.org> wrote:

>
> Thanks for the initial feedback.
>
> I think the community was previously busy with work related to the Spark 3.2
> release.
> Now that the 3.2 release is done, I'd like to bring this up again and seek
> more discussion and feedback.
>
> Thanks.

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by "L. C. Hsieh" <vi...@apache.org>.

Thanks for the initial feedback.

I think the community was previously busy with work related to the Spark 3.2 release.
Now that the 3.2 release is done, I'd like to bring this up again and seek more discussion and feedback.

Thanks.

On 2021/06/25 15:49:49, huaxin gao <hu...@gmail.com> wrote: 
> I took a quick look at the PR and it looks like a great feature to have. It
> provides unified APIs for data sources to perform the commonly used
> operations easily and efficiently, so users don't have to implement
> custom extensions on their own. Thanks Anton for the work!

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by huaxin gao <hu...@gmail.com>.

I took a quick look at the PR and it looks like a great feature to have. It
provides unified APIs for data sources to perform the commonly used
operations easily and efficiently, so users don't have to implement
custom extensions on their own. Thanks Anton for the work!

On Thu, Jun 24, 2021 at 9:42 PM L. C. Hsieh <vi...@apache.org> wrote:

> Thanks, Anton. I volunteered to be the shepherd of this SPIP. This is also
> my first time shepherding a SPIP, so please let me know if there is anything
> I can improve.
>
> These look like great features, and the rationale given in the proposal makes
> sense. These operations are becoming more common and more important in big
> data workloads. Instead of individual data sources building custom
> extensions, it makes more sense to support the API in Spark.
>
> Please share your thoughts on the proposal and the design. I appreciate
> your feedback. Thank you!

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by "L. C. Hsieh" <vi...@apache.org>.

Thanks, Anton. I volunteered to be the shepherd of this SPIP. This is also my first time shepherding a SPIP, so please let me know if there is anything I can improve.

These look like great features, and the rationale given in the proposal makes sense. These operations are becoming more common and more important in big data workloads. Instead of individual data sources building custom extensions, it makes more sense to support the API in Spark.

Please share your thoughts on the proposal and the design. I appreciate your feedback. Thank you!

On 2021/06/24 23:53:32, Anton Okolnychyi <ao...@gmail.com> wrote: 
> Hey everyone,
> 
> I'd like to start a discussion on adding support for executing row-level
> operations such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> execution should be the same across data sources and the best way to do
> that is to implement it in Spark.
> 
> Right now, Spark can only parse and to some extent analyze DELETE, UPDATE,
> MERGE commands. Data sources that support row-level changes have to build
> custom Spark extensions to execute such statements. The goal of this effort
> is to come up with a flexible and easy-to-use API that will work across
> data sources.
> 
> Design doc:
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
> 
> PR for handling DELETE statements:
> https://github.com/apache/spark/pull/33008
> 
> Any feedback is more than welcome.
> 
> Liang-Chi was kind enough to shepherd this effort. Thanks!
> 
> - Anton
> 


Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by Jungtaek Lim <ka...@gmail.com>.

Meta question: this doesn't target Spark 3.2, right? Many folks have been
working on the branch cut for Spark 3.2, so they might be less active in
jumping into new feature proposals right now.

On Fri, Jun 25, 2021 at 9:00 AM Holden Karau <ho...@pigscanfly.ca> wrote:

> I took an initial look at the PRs this morning and I’ll go through the
> design doc in more detail but I think these features look great. It’s
> especially important with the CA regulation changes to make this easier for
> folks to implement.

Re: [DISCUSS] SPIP: Row-level operations in Data Source V2

Posted by Holden Karau <ho...@pigscanfly.ca>.

I took an initial look at the PRs this morning and I’ll go through the
design doc in more detail but I think these features look great. It’s
especially important with the CA regulation changes to make this easier for
folks to implement.

On Thu, Jun 24, 2021 at 4:54 PM Anton Okolnychyi <ao...@gmail.com>
wrote:

> Hey everyone,
>
> I'd like to start a discussion on adding support for executing row-level
> operations such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
> execution should be the same across data sources and the best way to do
> that is to implement it in Spark.
>
> Right now, Spark can only parse and to some extent analyze DELETE, UPDATE,
> MERGE commands. Data sources that support row-level changes have to build
> custom Spark extensions to execute such statements. The goal of this effort
> is to come up with a flexible and easy-to-use API that will work across
> data sources.
>
> Design doc:
>
> https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60/
>
> PR for handling DELETE statements:
> https://github.com/apache/spark/pull/33008
>
> Any feedback is more than welcome.
>
> Liang-Chi was kind enough to shepherd this effort. Thanks!
>
> - Anton
--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau