
Posted to dev@iceberg.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2020/03/03 00:52:53 UTC

Re: upsert base on copy on write mode

It should be possible to build an implementation of MERGE INTO in Spark
now, using the validation that Anton added in #351
<https://github.com/apache/incubator-iceberg/pull/351>. I think he can
provide some more context.

On Wed, Feb 26, 2020 at 7:42 AM Junjie Chen <ch...@gmail.com>
wrote:

> Hi devs
>
> We are working on the row-level delete milestone for the upsert feature in
> merge on read mode. In the meantime, I think it may be useful to have a
> copy on write implementation. For example, we can implement upsert with
> Spark, so that we can finalize the common APIs that upsert may need and
> also discover which capabilities Spark should provide. What do you think?
>
> --
> Best Regards
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: upsert base on copy on write mode

Posted by 俊杰陈 <cj...@gmail.com>.

Typo: "Maybe the discussion is very clear before." -> "Maybe the
discussion is NOT very clear before."

Thanks OpenInx. I was thinking of adding a DeleteRows interface and a
corresponding API on the table so that one could do things like:

// Proposed API (sketch only)
DeleteRows deleteRows = table.newRowLevelDelete();
deleteRows.deleteByFilter(filter);  // filter selects the rows to delete
deleteRows.commit();
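
To make the idea a bit more concrete, here is a minimal, self-contained
sketch of how such an interface could behave in copy on write mode. It is
only an illustration under stated assumptions: InMemoryTable, the generic
DeleteRows<T> shape, and the use of java.util.function.Predicate as the
filter type are all hypothetical and are not existing Iceberg APIs. A list
of rows stands in for data files, and commit() swaps in a rewritten copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: none of these types are Iceberg APIs.
final class DeleteRowsSketch {

    // Proposed row-level delete operation: stage filters, then commit.
    interface DeleteRows<T> {
        DeleteRows<T> deleteByFilter(Predicate<T> filter);
        void commit();
    }

    // Toy table whose "snapshot" is an immutable list of rows.
    static final class InMemoryTable<T> {
        private List<T> rows;

        InMemoryTable(List<T> initialRows) {
            this.rows = List.copyOf(initialRows);
        }

        List<T> rows() {
            return rows;
        }

        DeleteRows<T> newRowLevelDelete() {
            return new DeleteRows<T>() {
                // Matches nothing until a filter is staged.
                private Predicate<T> pending = row -> false;

                @Override
                public DeleteRows<T> deleteByFilter(Predicate<T> filter) {
                    pending = pending.or(filter);
                    return this;
                }

                @Override
                public void commit() {
                    // Copy on write: build a new snapshot without the
                    // matching rows, then replace the old one wholesale.
                    List<T> kept = new ArrayList<>();
                    for (T row : rows) {
                        if (!pending.test(row)) {
                            kept.add(row);
                        }
                    }
                    rows = List.copyOf(kept);
                }
            };
        }
    }

    public static void main(String[] args) {
        InMemoryTable<Integer> table = new InMemoryTable<>(List.of(1, 2, 3, 4));
        DeleteRows<Integer> deleteRows = table.newRowLevelDelete();
        deleteRows.deleteByFilter(v -> v % 2 == 0);
        deleteRows.commit();
        System.out.println(table.rows()); // prints [1, 3]
    }
}
```

Nothing changes until commit() is called, which mirrors the stage-then-commit
shape of the three calls above. A merge on read variant could keep the same
interface and record the filter instead of rewriting the rows.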

On Tue, Mar 3, 2020 at 5:07 PM OpenInx <op...@gmail.com> wrote:
>
> I think we should abstract the API first, then implement MOR.
> COW is also a necessary implementation, but it is easy to implement
> and not so urgent.
>
> On Tue, Mar 3, 2020 at 3:45 PM Junjie Chen <ch...@gmail.com> wrote:
>>
>> Thanks, Ryan
>>
>> Maybe the discussion is very clear before. Actually, we have built an internal implementation of update and delete in copy on write mode, and others may have internal implementations as well. What I propose is a general framework or set of APIs that supports both copy on write and merge on read, so that people can share their COW implementations with the community and do some preparatory work for MOR as well. For example, we could define row-level update and MERGE INTO APIs plus a table property that indicates the underlying mode; one could then share an implementation under the COW branch, selected by that table property.
>>
>> There may be other ways to build the general framework. I just want to know: do we want both COW and MOR implementations, or only MOR?
>>
>>
>> On Tue, Mar 3, 2020 at 8:53 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>> It should be possible to build an implementation of MERGE INTO in Spark now, using the validation that Anton added in #351. I think he can provide some more context.
>>>
>>> On Wed, Feb 26, 2020 at 7:42 AM Junjie Chen <ch...@gmail.com> wrote:
>>>>
>>>> Hi devs
>>>>
>>>> We are working on the row-level delete milestone for the upsert feature in merge on read mode. In the meantime, I think it may be useful to have a copy on write implementation. For example, we can implement upsert with Spark, so that we can finalize the common APIs that upsert may need and also discover which capabilities Spark should provide. What do you think?
>>>>
>>>> --
>>>> Best Regards
>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>>
>>
>> --
>> Best Regards



-- 
Thanks & Best Regards

Re: upsert base on copy on write mode

Posted by OpenInx <op...@gmail.com>.

I think we should abstract the API first, then implement MOR.
COW is also a necessary implementation, but it is easy to implement
and not so urgent.

On Tue, Mar 3, 2020 at 3:45 PM Junjie Chen <ch...@gmail.com> wrote:

> Thanks, Ryan
>
> Maybe the discussion is very clear before. Actually, we have built an
> internal implementation of update and delete in copy on write mode, and
> others may have internal implementations as well. What I propose is a
> general framework or set of APIs that supports both copy on write and
> merge on read, so that people can share their COW implementations with the
> community and do some preparatory work for MOR as well. For example, we
> could define row-level update and MERGE INTO APIs plus a table property
> that indicates the underlying mode; one could then share an implementation
> under the COW branch, selected by that table property.
>
> There may be other ways to build the general framework. I just want to
> know: do we want both COW and MOR implementations, or only MOR?
>
>
> On Tue, Mar 3, 2020 at 8:53 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> It should be possible to build an implementation of MERGE INTO in Spark
>> now, using the validation that Anton added in #351
>> <https://github.com/apache/incubator-iceberg/pull/351>. I think he can
>> provide some more context.
>>
>> On Wed, Feb 26, 2020 at 7:42 AM Junjie Chen <ch...@gmail.com>
>> wrote:
>>
>>> Hi devs
>>>
>>> We are working on the row-level delete milestone for the upsert feature in
>>> merge on read mode. In the meantime, I think it may be useful to have a
>>> copy on write implementation. For example, we can implement upsert with
>>> Spark, so that we can finalize the common APIs that upsert may need and
>>> also discover which capabilities Spark should provide. What do you think?
>>>
>>> --
>>> Best Regards
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
> --
> Best Regards
>

Re: upsert base on copy on write mode

Posted by Junjie Chen <ch...@gmail.com>.

Thanks, Ryan

Maybe the discussion is very clear before. Actually, we have built an
internal implementation of update and delete in copy on write mode, and
others may have internal implementations as well. What I propose is a
general framework or set of APIs that supports both copy on write and
merge on read, so that people can share their COW implementations with the
community and do some preparatory work for MOR as well. For example, we
could define row-level update and MERGE INTO APIs plus a table property
that indicates the underlying mode; one could then share an implementation
under the COW branch, selected by that table property.
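
As a rough illustration of the table property idea, the sketch below picks
a row-level update strategy from a map of table properties. All names in it
are assumptions made for the example: the property key
"write.row-level.mode", its two values, the copy on write default, and the
RowLevelUpdate interface are hypothetical and not existing Iceberg APIs.

```java
import java.util.Map;

// Hypothetical sketch: property name, values, and types are illustrative.
final class RowLevelModeSketch {

    // Assumed property key; not an actual Iceberg table property.
    static final String MODE_PROPERTY = "write.row-level.mode";

    enum Mode { COPY_ON_WRITE, MERGE_ON_READ }

    // Common API that both modes would implement.
    interface RowLevelUpdate {
        String describe();
    }

    // Reads the mode from table properties; the default is an assumption.
    static Mode modeOf(Map<String, String> tableProperties) {
        String value = tableProperties.getOrDefault(MODE_PROPERTY, "copy-on-write");
        switch (value) {
            case "copy-on-write":
                return Mode.COPY_ON_WRITE;
            case "merge-on-read":
                return Mode.MERGE_ON_READ;
            default:
                throw new IllegalArgumentException("Unknown mode: " + value);
        }
    }

    // Chooses the implementation branch according to the table property.
    static RowLevelUpdate newUpdate(Map<String, String> tableProperties) {
        switch (modeOf(tableProperties)) {
            case COPY_ON_WRITE:
                return () -> "rewrite the affected data files";
            case MERGE_ON_READ:
                return () -> "write delete records and merge them at read time";
            default:
                throw new AssertionError("unreachable");
        }
    }

    public static void main(String[] args) {
        Map<String, String> cow = Map.of(MODE_PROPERTY, "copy-on-write");
        Map<String, String> mor = Map.of(MODE_PROPERTY, "merge-on-read");
        System.out.println(newUpdate(cow).describe());
        System.out.println(newUpdate(mor).describe());
    }
}
```

With this shape, callers only see the common interface, and a contributed
COW implementation can sit behind one branch while MOR is developed behind
the other.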

There may be other ways to build the general framework. I just want to
know: do we want both COW and MOR implementations, or only MOR?

On Tue, Mar 3, 2020 at 8:53 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> It should be possible to build an implementation of MERGE INTO in Spark
> now, using the validation that Anton added in #351
> <https://github.com/apache/incubator-iceberg/pull/351>. I think he can
> provide some more context.
>
> On Wed, Feb 26, 2020 at 7:42 AM Junjie Chen <ch...@gmail.com>
> wrote:
>
>> Hi devs
>>
>> We are working on the row-level delete milestone for the upsert feature in
>> merge on read mode. In the meantime, I think it may be useful to have a
>> copy on write implementation. For example, we can implement upsert with
>> Spark, so that we can finalize the common APIs that upsert may need and
>> also discover which capabilities Spark should provide. What do you think?
>>
>> --
>> Best Regards
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

-- 
Best Regards