Posted to dev@iceberg.apache.org by Saisai Shao <sa...@gmail.com> on 2019/08/07 11:24:50 UTC

Any plan to support update, delete and others

Hi team,

The Delta Lake project recently announced version 0.3.0, which added several
new features at the API level, such as update, delete, merge, and vacuum. May I
ask whether there is any plan to add such features to Iceberg?

Thanks
Saisai

Re: Any plan to support update, delete and others

Posted by Saisai Shao <sa...@gmail.com>.
Got it. Thanks a lot for the reply.

Best regards,
Saisai

On Fri, Aug 9, 2019 at 6:36 AM Ryan Blue <rb...@netflix.com> wrote:

> We've actually been doing all of our API work in upstream Spark instead of
> adding APIs to Iceberg for row-level data manipulation. That's why I'm
> involved in the DataSourceV2 work.
>
> I think for Delta, this is probably an effort to get some features out
> earlier. I think that's easier for Delta because it deeply integrates with
> Spark and adds new plans -- last I checked, some of the project had to be
> located in Spark packages because they use internal classes.
>
> I think that this API will probably be contributed to Spark itself when
> Spark supports update and merge operations. That's probably a good time for
> Iceberg to pick it up because Iceberg still needs to update the format for
> those.
>
> Otherwise, Spark supports the latest features available in DataSourceV2,
> and will continue to. In fact, we're adding features to DSv2 based on what
> we've built internally at Netflix to support Iceberg.

Re: Any plan to support update, delete and others

Posted by Ryan Blue <rb...@netflix.com>.
We've actually been doing all of our API work in upstream Spark instead of
adding APIs to Iceberg for row-level data manipulation. That's why I'm
involved in the DataSourceV2 work.

I think for Delta, this is probably an effort to get some features out
earlier. I think that's easier for Delta because it deeply integrates with
Spark and adds new plans -- last I checked, some of the project had to be
located in Spark packages because they use internal classes.

I think that this API will probably be contributed to Spark itself when
Spark supports update and merge operations. That's probably a good time for
Iceberg to pick it up because Iceberg still needs to update the format for
those.

Otherwise, Spark supports the latest features available in DataSourceV2,
and will continue to. In fact, we're adding features to DSv2 based on what
we've built internally at Netflix to support Iceberg.

On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao <sa...@gmail.com> wrote:

> Thanks a lot, Ryan, that would be very helpful!
>
> Delta Lake recently added support for such operations at the API level (
> https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
> I was thinking that Iceberg's goal at the API level is similar, so maybe
> we could take that as a reference.
>
> Besides, directly using the Iceberg API to manipulate data is not so
> straightforward, so it would be great if we could also have DataFrame API/SQL
> support later on.
>
> Best regards
> Saisai

-- 
Ryan Blue
Software Engineer
Netflix

Re: Any plan to support update, delete and others

Posted by Saisai Shao <sa...@gmail.com>.
Thanks a lot, Ryan, that would be very helpful!

Delta Lake recently added support for such operations at the API level (
https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
I was thinking that Iceberg's goal at the API level is similar, so maybe
we could take that as a reference.

Besides, directly using the Iceberg API to manipulate data is not so
straightforward, so it would be great if we could also have DataFrame API/SQL
support later on.

Best regards
Saisai

On Thu, Aug 8, 2019 at 1:22 AM Ryan Blue <rb...@netflix.com> wrote:

> Hi Saisai,
>
> We are working on adding row-level delete support to Iceberg, where the
> deletes are applied when data is read. We’ve had a few good design
> discussions and have come up with a good way to integrate these into the
> format. Erik has written a good document on it:
> https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
>
> I’ve also started a milestone to track this work:
> https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22
>
> That’s assuming that you’re talking about row-level deletes. Iceberg
> already supports file-level delete, overwrite, etc.
>
> Iceberg also already supports a vacuum operation using ExpireSnapshots
> <http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
> But, Spark (and other engines) don’t have a way to call this yet. Same for MERGE
> INTO, open source Spark doesn’t support the operation yet. We’re also
> working on building support into Spark as we go.
>
> I hope that helps!

Re: Any plan to support update, delete and others

Posted by Ryan Blue <rb...@netflix.com>.
Hi Saisai,

We are working on adding row-level delete support to Iceberg, where the
deletes are applied when data is read. We’ve had a few good design
discussions and have come up with a good way to integrate these into the
format. Erik has written a good document on it:
https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
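
A minimal sketch of what "deletes applied when data is read" means, under loud assumptions: the ToyTable class, its position-based delete set, and the file names are hypothetical and are not Iceberg's format or API (the design document may encode deletes differently). Deleting a row only records a marker; the data file stays exactly as written, and every scan filters out the marked rows.

```python
"""Toy model of read-time row-level deletes (hypothetical, not Iceberg's format).

Rows are dicts, and a delete is recorded as a (data file, row position) pair;
data files themselves are never rewritten.
"""
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Set, Tuple


@dataclass
class ToyTable:
    # data file path -> rows in write order (row position = list index)
    data_files: Dict[str, List[dict]] = field(default_factory=dict)
    # positions marked as deleted, kept separately from the data
    deletes: Set[Tuple[str, int]] = field(default_factory=set)

    def delete_where(self, predicate: Callable[[dict], bool]) -> int:
        """Mark matching rows as deleted; return how many were newly marked."""
        marked = 0
        for path, rows in self.data_files.items():
            for pos, row in enumerate(rows):
                if (path, pos) not in self.deletes and predicate(row):
                    self.deletes.add((path, pos))
                    marked += 1
        return marked

    def scan(self) -> Iterator[dict]:
        """Read every data file, skipping rows that were marked deleted."""
        for path, rows in self.data_files.items():
            for pos, row in enumerate(rows):
                if (path, pos) not in self.deletes:
                    yield row


table = ToyTable(data_files={
    "a.parquet": [{"id": 1}, {"id": 2}],
    "b.parquet": [{"id": 3}],
})
table.delete_where(lambda row: row["id"] == 2)
print([row["id"] for row in table.scan()])  # [1, 3]
print(len(table.data_files["a.parquet"]))   # 2 (the data file is unchanged)
```

The trade-off this illustrates: writes stay cheap because no file is rewritten, while readers pay the cost of applying the delete markers on every scan.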

I’ve also started a milestone to track this work:
https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22

That’s assuming that you’re talking about row-level deletes. Iceberg
already supports file-level delete, overwrite, etc.
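
For contrast, a file-level delete removes whole data files and never rewrites one. A hedged sketch, assuming each file carries hypothetical (min, max) bounds for a single integer column: a file is dropped only when all of its rows must match the filter, and a file that only partially matches is an error. To my understanding this mirrors the documented behavior of DeleteFiles.deleteFromRowFilter in Iceberg's Java API (reached through Table.newDelete()), but ToyDataFile, PartialMatchError, and delete_files_in_range are illustrative names only.

```python
"""Toy model of file-level deletes (hypothetical names, not Iceberg's API).

Each data file carries (min, max) bounds for one integer column, and a
delete can only drop whole files.
"""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyDataFile:
    path: str
    min_value: int
    max_value: int


class PartialMatchError(Exception):
    """Raised when a file may hold both matching and non-matching rows."""


def delete_files_in_range(files: List[ToyDataFile], lo: int, hi: int) -> List[ToyDataFile]:
    """Drop files whose rows must all satisfy lo <= value <= hi; return the rest."""
    kept = []
    for f in files:
        must_match = lo <= f.min_value and f.max_value <= hi  # bounds inside range
        might_match = f.min_value <= hi and lo <= f.max_value  # bounds overlap range
        if must_match:
            continue  # every row matches, so the whole file is removed
        if might_match:
            # Removing only some rows would mean rewriting the file,
            # which a file-level delete cannot do.
            raise PartialMatchError(f.path)
        kept.append(f)
    return kept


files = [ToyDataFile("day1.parquet", 1, 10), ToyDataFile("day2.parquet", 11, 20)]
print([f.path for f in delete_files_in_range(files, 1, 10)])  # ['day2.parquet']
```

Deleting the range 5..15 from the same files would raise PartialMatchError, which is exactly the case that row-level deletes are meant to cover.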

Iceberg also already supports a vacuum operation using ExpireSnapshots
<http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
But Spark (and other engines) don’t have a way to call this yet. Same for
MERGE INTO: open source Spark doesn’t support the operation yet. We’re also
working on building support into Spark as we go.
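
A rough sketch of the bookkeeping behind snapshot expiration: snapshots older than a cutoff are dropped from the history, and data files referenced only by dropped snapshots become safe to delete physically. In Iceberg the operation starts from Table.expireSnapshots(), is configured with expireOlderThan(timestampMillis), and is applied with commit(); the Snapshot class and expire_older_than function below are hypothetical stand-ins that model only the idea, not that implementation. This toy always keeps the newest snapshot so the table stays readable.

```python
"""Toy model of snapshot expiration (hypothetical, not Iceberg's metadata).

A snapshot is a timestamp plus the set of data files it references.
"""
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    files: FrozenSet[str]


def expire_older_than(snapshots: List[Snapshot], cutoff_ms: int) -> Tuple[List[Snapshot], Set[str]]:
    """Return (retained snapshots, data files that no retained snapshot uses)."""
    current = max(snapshots, key=lambda s: s.timestamp_ms)
    # Keep snapshots at or after the cutoff, and always keep the newest one.
    retained = [s for s in snapshots if s.timestamp_ms >= cutoff_ms or s == current]
    expired = [s for s in snapshots if s not in retained]
    still_referenced = set().union(*(s.files for s in retained))
    expired_files = set().union(*(s.files for s in expired))
    return retained, expired_files - still_referenced


snapshots = [
    Snapshot(1, 1_000, frozenset({"a.parquet"})),
    Snapshot(2, 2_000, frozenset({"a.parquet", "b.parquet"})),
    Snapshot(3, 3_000, frozenset({"b.parquet", "c.parquet"})),  # a.parquet replaced by an overwrite
]
kept, deletable = expire_older_than(snapshots, cutoff_ms=2_500)
print([s.snapshot_id for s in kept])  # [3]
print(sorted(deletable))              # ['a.parquet']
```

Here b.parquet is not deletable even though snapshot 2 expired, because snapshot 3 still references it.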

I hope that helps!

On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao <sa...@gmail.com> wrote:

> Hi team,
>
> The Delta Lake project recently announced version 0.3.0, which added several
> new features at the API level, such as update, delete, merge, and vacuum. May I
> ask whether there is any plan to add such features to Iceberg?
>
> Thanks
> Saisai
>


-- 
Ryan Blue
Software Engineer
Netflix