Posted to dev@iceberg.apache.org by Jungtaek Lim <ka...@gmail.com> on 2020/08/01 00:29:12 UTC

Re: [DISCUSS] 0.9.1 release

If we still have a few more days, I think #1280
<https://github.com/apache/iceberg/pull/1280>: "fix serialization issue in
BaseCombinedScanTask with Kryo" is a good candidate for inclusion. The bug
affects both Spark and Flink (according to #1279
<https://github.com/apache/iceberg/pull/1279>).

On Sat, Aug 1, 2020 at 8:04 AM Ryan Blue <bl...@apache.org> wrote:

> Hi everyone,
>
> We’ve accumulated a few bug fixes in the last couple of weeks, and I think
> it might make sense to get some of them out in an 0.9.1 release, since the
> bugs they fix make it harder to work with Iceberg. Here are the ones I know
> about:
>
>    - #1282 <https://github.com/apache/iceberg/pull/1282>: rewriteNot
>    fails for binary and unary predicates
>    - #1278 <https://github.com/apache/iceberg/pull/1278>: Bad import from
>    commons-compress causes query failures
>    - #1251 <https://github.com/apache/iceberg/pull/1251>: Fixes more
>    imports from non-Iceberg Guava
>    - #1283 <https://github.com/apache/iceberg/pull/1283>: Query
>    descriptions fail when IN predicates are pushed
>    - #1228 <https://github.com/apache/iceberg/pull/1228>: Data imports
>    fail when paths include whitespace
>    - #1194 <https://github.com/apache/iceberg/pull/1194>: USING should
>    set format when used in a CTAS command
>    - #1203 <https://github.com/apache/iceberg/pull/1203>: Table cache
>    should not expire
>
> If there are no objections, I’ll get started and create a release branch.
> And please reply if there are other issues you’ve seen that should also be
> included in a patch release.
>
> rb
> --
> Ryan Blue
>

Re: [DISCUSS] 0.9.1 release

Posted by OpenInx <op...@gmail.com>.

> Does anyone know if we can recover existing data affected by it?

PR #1271 deals with correctness bugs in two data types: decimal18 and
timestampZone.

For decimal18, we actually write the correct decimal value but read it back
incorrectly. Say the column is decimal(10,3) and the value is 10.100: the ORC
writer stores it in the file as 101*10^(-1), while before this patch we read
it as 101*10^(-3). If we construct the BigDecimal with the scale that was
actually written (scale 1, i.e. the 10^(-1) factor) and then rescale it to 3,
in theory we could still recover the correct decimal 10100*10^(-3).
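The decimal read bug and the proposed recovery can be illustrated with java.math.BigDecimal, where a value equals unscaledValue * 10^(-scale). This is a hypothetical sketch: the class and method names are made up for illustration, it is not the code from PR #1271, and it assumes the scale that was actually written (1 in the example above) is known at read time.

```java
import java.math.BigDecimal;

// Hypothetical illustration of the decimal(10,3) read bug described above.
// Assumption: the writer stored 10.100 as unscaled value 101 with scale 1
// (101 * 10^-1). These names are not Iceberg or ORC APIs.
class DecimalScaleDemo {

  // Buggy read: pairs the stored unscaled value with the column's declared
  // scale, so 101 becomes 101 * 10^-3 = 0.101.
  static BigDecimal readWithColumnScale(long unscaled, int columnScale) {
    return BigDecimal.valueOf(unscaled, columnScale);
  }

  // Recovery idea: rebuild the value with the scale that was actually
  // written, then rescale to the column's scale. When storedScale <=
  // columnScale, setScale only appends zeros and needs no rounding.
  static BigDecimal readWithStoredScale(long unscaled, int storedScale, int columnScale) {
    return BigDecimal.valueOf(unscaled, storedScale).setScale(columnScale);
  }

  public static void main(String[] args) {
    System.out.println(readWithColumnScale(101L, 3));                    // 0.101
    System.out.println(readWithStoredScale(101L, 1, 3));                 // 10.100
    System.out.println(readWithStoredScale(101L, 1, 3).unscaledValue()); // 10100
  }
}
```

Rescaling up rather than down is what makes the recovery lossless here: going from scale 1 to scale 3 only multiplies the unscaled value by 100.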

For timestampZone, I'd say we have stored the wrong value in the file, but
the error between the written timestamp and the correct one should be small
(less than a few seconds). The cause is the division at [1]: for negative
values, plain integer division rounds toward zero (-5 / 2 = -2) while
floorDiv rounds toward negative infinity (floorDiv(-5, 2) = -3), so the two
results differ by at most 1, and the nanosecond part of a timestamp is always
less than one second. However, I have not found a way to recover the
existing data.

[1] https://github.com/apache/iceberg/pull/1271/files#diff-5aa4840155ec70fdf7f725e122cde7b7L218
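The timestamp explanation hinges on how Java's integer division treats negative numbers. The hypothetical sketch below assumes the value being split is a count of microseconds since 1970-01-01 (the unit Iceberg uses for timestamps), so a negative value is a pre-epoch timestamp; the class and method names are illustrative only and this is not the code at the line linked above.

```java
// Hypothetical illustration of truncating division vs Math.floorDiv.
// Assumption: the value being split is a count of microseconds since
// 1970-01-01 (Iceberg's timestamp unit); this is not the code from PR #1271.
class FloorDivDemo {

  static final long MICROS_PER_SECOND = 1_000_000L;

  // Java's `/` truncates toward zero, so a negative input with a remainder
  // is rounded up (toward zero) instead of down.
  static long secondsTruncating(long micros) {
    return micros / MICROS_PER_SECOND;
  }

  // Math.floorDiv rounds toward negative infinity; its companion floorMod
  // is then always in [0, MICROS_PER_SECOND).
  static long secondsFloor(long micros) {
    return Math.floorDiv(micros, MICROS_PER_SECOND);
  }

  public static void main(String[] args) {
    System.out.println(-5 / 2);               // -2
    System.out.println(Math.floorDiv(-5, 2)); // -3

    long micros = -1_500_000L; // 1.5 seconds before the epoch
    System.out.println(secondsTruncating(micros));                // -1
    System.out.println(secondsFloor(micros));                     // -2
    System.out.println(micros % MICROS_PER_SECOND);               // -500000
    System.out.println(Math.floorMod(micros, MICROS_PER_SECOND)); // 500000

    // The two quotients differ by at most one whole second.
    System.out.println(secondsTruncating(micros) - secondsFloor(micros)); // 1
  }
}
```

The final line is the bound the message relies on: for any input the two quotients differ by 0 or 1, so mixing them up shifts a timestamp by at most about one second.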



On Tue, Aug 4, 2020 at 3:08 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Yes, we should get #1269 into a patch release as well since it is a
> correctness bug.
>
> Does anyone know if we can recover existing data affected by it?
>
> On Mon, Aug 3, 2020 at 11:08 AM Anton Okolnychyi <ao...@apple.com>
> wrote:
>
>> I see a few open issues for ORC. Some of them seem critical (like issue
>> #1269). Do we want to fix those before the release? Or is ORC support still
>> experimental?
>>
>> - Anton
>>
>> On 1 Aug 2020, at 20:04, Jungtaek Lim <ka...@gmail.com>
>> wrote:
>>
>> Sure! I just submitted #1285
>> <https://github.com/apache/iceberg/pull/1285> to exclude the refactor.
>> Once #1285 is merged I'll rebase the existing PR to do the refactor. Thanks
>> for the input!
>>
>> On Sun, Aug 2, 2020 at 4:41 AM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Thanks, Jungtaek! I agree it would be great to fix that problem. I took
>>> a quick look at the PR and it is a little big to go into a patch release
>>> since it refactors quite a few places to consolidate the list copy. What do
>>> you think about making a PR that just fixes the problem with
>>> BaseCombinedScanTask and Kryo, then doing the remainder of the refactor in
>>> master?
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: [DISCUSS] 0.9.1 release

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Yes, we should get #1269 into a patch release as well since it is a
correctness bug.

Does anyone know if we can recover existing data affected by it?

-- 
Ryan Blue
Software Engineer
Netflix

Re: [DISCUSS] 0.9.1 release

Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.

I see a few open issues for ORC. Some of them seem critical (like issue #1269). Do we want to fix those before the release? Or is ORC support still experimental?

- Anton

Re: [DISCUSS] 0.9.1 release

Posted by Jungtaek Lim <ka...@gmail.com>.

Sure! I just submitted #1285 <https://github.com/apache/iceberg/pull/1285>
to exclude the refactor. Once #1285 is merged I'll rebase the existing PR
to do the refactor. Thanks for the input!

Re: [DISCUSS] 0.9.1 release

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Thanks, Jungtaek! I agree it would be great to fix that problem. I took a
quick look at the PR and it is a little big to go into a patch release
since it refactors quite a few places to consolidate the list copy. What do
you think about making a PR that just fixes the problem with
BaseCombinedScanTask and Kryo, then doing the remainder of the refactor in
master?

-- 
Ryan Blue
Software Engineer
Netflix