Posted to dev@iceberg.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2020/09/14 22:37:49 UTC

Iceberg sync notes - 9 September 2020

Hi everyone,

I just updated the Iceberg sync doc
<https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#heading=h.u83a18ycylzz>
with my notes. Feel free to add corrections or additional context!

There was quite a bit of discussion, so I want to highlight a few things
we talked about that merit more discussion on the dev list:

1. 0.10.0 blocker issues
    - Java 11 flaky tests (Fixed in PR #1446
<https://github.com/apache/iceberg/pull/1446>)
    - Flink checkpoint Java serialization errors (PR #1438
<https://github.com/apache/iceberg/issues/1438>)
    - Probably will *not* wait for Hive projection
    - Please bring up any other blockers!
2. The general consensus was that adding a time offset parameter (PR #1368
<https://github.com/apache/iceberg/pull/1368>) is not a good solution.
Instead we should consider using hourly partitioning or adding custom
partition functions.
3. We discussed trying to make snapshot timestamps monotonically
increasing, but thought that it was probably not worth pursuing (already
mentioned on the dev list thread).

rb

-- 
Ryan Blue
Software Engineer
Netflix

Re: Iceberg sync notes - 9 September 2020

Posted by Owen O'Malley <ow...@gmail.com>.
As I mentioned in the meetup, ORC 1.6.4
<https://orc.apache.org/news/2020/09/14/ORC-1.6.4/> was pending and has
been released. It should be available on Maven Central tomorrow.

.. Owen

Re: Iceberg sync notes - 9 September 2020

Posted by Mass Dosage <ma...@gmail.com>.
I'm fine with not waiting for Hive projection. What is in master now is
enough to do an end-to-end Hive read; I'd prefer to have that out there
sooner so we can start trying it out, rather than delaying this release
for the projection.

Thanks,

Adrian