Posted to dev@iceberg.apache.org by Chen Song <ch...@gmail.com> on 2020/09/14 16:21:58 UTC

Re: Iceberg V2 Spec

I want to follow up on this. Is there an official consolidated design
doc/proposal (even wip) on V2 spec?

I saw Streaming CDC in Iceberg
<https://docs.google.com/document/d/1bBKDD4l-pQFXaMb4nOyVK-Sl3N2NTTG37uOCQx8rKVc/edit#heading=h.2u29lq1ekp5r>
in a few related update emails, but it only covers one part.

Chen

On Thu, Jul 2, 2020 at 9:53 PM OpenInx <op...@gmail.com> wrote:

> Sounds good to me.
>
> Thanks.
>
> On Fri, Jul 3, 2020 at 12:58 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> I'd like to get 0.9.0 out as soon as possible. I expect to get an early
>> RC out next week, once we have more tests committed. That way, people can
>> start trying it out and reporting back where it doesn't work.
>>
>> I'd rather not block 0.9.0 to wait on Flink connector components. There's
>> still a lot of work to get in, so I think it would be good to keep these
>> decoupled. That said, I think it would make sense to have a release once
>> the Flink connector is ready, just like we would do for Spark 3 support.
>>
>> Does that sound reasonable?
>>
>> On Wed, Jul 1, 2020 at 7:39 PM OpenInx <op...@gmail.com> wrote:
>>
>>> Hi Ryan:
>>>
>>> Just curious, when do we plan to release 0.9.0? I expect that the Flink
>>> connector could be included in release 0.9.0.
>>>
>>> Thanks.
>>>
>>> On Thu, Jul 2, 2020 at 12:14 AM Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>>> Hi Chen,
>>>>
>>>> Right now, the main parts of the v2 spec are the addition of sequence
>>>> numbers and delete files. We're also making some other requirements more
>>>> strict, but those are mainly cleaning up problems and not related to
>>>> row-level deletes.
>>>>
>>>> Upserts would be encoded as a delete and an insert. Deletes are stored
>>>> in delete files, and inserts are normal data files. Delete files are valid
>>>> within a partition, and apply to all data files with the same or lower
>>>> sequence number.
>>>>
>>>> I'm planning on updating what's currently in the spec now that we have
>>>> sequence numbers and delete file metadata committed in master, but right
>>>> now I'm working on getting the 0.9.0 release out with support for Spark 3.
>>>> The documentation should be coming in the next couple of weeks.
>>>>
>>>> rb
>>>>
>>>> On Wed, Jul 1, 2020 at 6:28 AM Chen Song <ch...@gmail.com>
>>>> wrote:
>>>>
>>>>> I saw that Table Spec V2
>>>>> <https://iceberg.apache.org/spec/#version-2-row-level-deletes> is
>>>>> mentioned in the official Iceberg docs. I know it is incomplete and WIP.
>>>>> Is there a to-be-reviewed or proposed version available for public view?
>>>>> I am interested in understanding how row-level upserts are supported.
>>>>>
>>>>> Thanks
>>>>> --
>>>>> Chen Song
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Chen Song
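
For readers following the thread, the upsert encoding described in Ryan's
quoted Jul 2 reply (an upsert is a delete plus an insert, and a delete file
is valid within a partition and applies to data files with the same or lower
sequence number) can be sketched roughly as below. This is only an
illustration, not Iceberg code: the class and function names are hypothetical,
and identifying deleted rows by (data file path, row position) is an
assumption, since the thread does not say how a delete file identifies rows.
The final spec may also differ in details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file tracked by table metadata."""
    path: str
    partition: str
    sequence_number: int
    rows: tuple  # rows in file order


@dataclass(frozen=True)
class DeleteFile:
    """Hypothetical stand-in for a delete file."""
    partition: str
    sequence_number: int
    positions: frozenset  # (data file path, row position) pairs; an assumption


def applies(delete, data):
    # Rule as stated in the thread: same partition, and the data file's
    # sequence number is the same as or lower than the delete file's.
    return (delete.partition == data.partition
            and data.sequence_number <= delete.sequence_number)


def scan(data_files, delete_files):
    """Return the live rows after applying every applicable delete file."""
    live = []
    for data in data_files:
        deleted = {pos
                   for delete in delete_files if applies(delete, data)
                   for path, pos in delete.positions if path == data.path}
        live.extend(row for i, row in enumerate(data.rows) if i not in deleted)
    return live


# Sequence number 1: the initial insert.
old = DataFile("a.parquet", "day=1", 1, ((1, "a"), (2, "b")))
# Sequence number 2: upsert of key 1, encoded as a delete plus an insert.
delete = DeleteFile("day=1", 2, frozenset({("a.parquet", 0)}))
new = DataFile("b.parquet", "day=1", 2, ((1, "a2"),))

result = scan([old, new], [delete])
```

Here `result` keeps the untouched row from the old file and the replacement
row from the new file. A delete file written for a different partition, or
with a sequence number lower than the data file's, would not be applied.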

Re: Iceberg V2 Spec

Posted by OpenInx <op...@gmail.com>.
Thanks for the great work, Ryan. I would be glad to be a reviewer!

On Sat, Sep 19, 2020 at 7:37 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I'm working on an update to the spec. We've completed the Java library
> implementation end-to-end, so now we have working code that will be
> released in 0.10.0. Next step is the spec update to document everything now
> that we're confident that it works as expected.
>
> Look for a PR in the next few days. It would be great to have more
> reviewers!
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Iceberg V2 Spec

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I'm working on an update to the spec. We've completed the Java library
implementation end-to-end, so now we have working code that will be
released in 0.10.0. Next step is the spec update to document everything now
that we're confident that it works as expected.

Look for a PR in the next few days. It would be great to have more
reviewers!

rb

-- 
Ryan Blue
Software Engineer
Netflix