Posted to dev@iceberg.apache.org by Jungtaek Lim <ka...@gmail.com> on 2020/08/01 00:29:12 UTC
Re: [DISCUSS] 0.9.1 release
If we still have a few more days, I think #1280
<https://github.com/apache/iceberg/pull/1280>: "fix serialization issue in
BaseCombinedScanTask with Kryo" is a good candidate to include. The bug
affects both Spark and Flink (according to #1279
<https://github.com/apache/iceberg/pull/1279>).
On Sat, Aug 1, 2020 at 8:04 AM Ryan Blue <bl...@apache.org> wrote:
> Hi everyone,
>
> We’ve accumulated a few bug fixes in the last couple of weeks and I think
> it might make sense to get some of them out in an 0.9.1 release since they
> make it harder to work with Iceberg. Here are the ones I know about:
>
> - #1282 <https://github.com/apache/iceberg/pull/1282>: rewriteNot
> fails for binary and unary predicates
> - #1278 <https://github.com/apache/iceberg/pull/1278>: Bad import from
> commons-compress causes query failures
> - #1251 <https://github.com/apache/iceberg/pull/1251>: Fixes more
> imports from non-Iceberg Guava
> - #1283 <https://github.com/apache/iceberg/pull/1283>: Query
> descriptions fail when IN predicates are pushed
> - #1228 <https://github.com/apache/iceberg/pull/1228>: Data imports
> fail when paths include whitespace
> - #1194 <https://github.com/apache/iceberg/pull/1194>: USING should
> set format when used in a CTAS command
> - #1203 <https://github.com/apache/iceberg/pull/1203>: Table cache
> should not expire
>
> If there are no objections, I’ll get started and create a release branch.
> And please reply if there are other issues you’ve seen that should also be
> included in a patch release.
>
> rb
> --
> Ryan Blue
>
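Item #1282 in the list above names a rewriteNot pass. The thread gives only the PR title, so the following is a rough, hypothetical illustration of what such a pass does in general, not Iceberg's implementation or API: it pushes NOT down to the leaves of a filter expression, applying De Morgan's laws to AND/OR and replacing each negated leaf with its complementary operator. That covers binary predicates such as id < 5 as well as unary ones such as data IS NULL. All class and method names below are made up for this sketch.

```java
// Hypothetical sketch of NOT push-down over a tiny expression tree; not Iceberg's API.
class RewriteNotSketch {

  // Each operator knows its complement, so NOT(op) can be replaced by op.negate().
  enum Op {
    LT("<"), GT_EQ(">="), GT(">"), LT_EQ("<="), EQ("=="), NOT_EQ("!="),
    IS_NULL("is null"), NOT_NULL("is not null");

    final String symbol;

    Op(String symbol) { this.symbol = symbol; }

    Op negate() {
      switch (this) {
        case LT: return GT_EQ;
        case GT_EQ: return LT;
        case GT: return LT_EQ;
        case LT_EQ: return GT;
        case EQ: return NOT_EQ;
        case NOT_EQ: return EQ;
        case IS_NULL: return NOT_NULL;
        case NOT_NULL: return IS_NULL;
        default: throw new IllegalStateException("unknown op " + this);
      }
    }
  }

  interface Expr {}

  // Leaf predicate: binary when value != null, unary (null checks) otherwise.
  static final class Pred implements Expr {
    final String column; final Op op; final Object value;
    Pred(String column, Op op, Object value) { this.column = column; this.op = op; this.value = value; }
    @Override public String toString() {
      return value == null ? column + " " + op.symbol : column + " " + op.symbol + " " + value;
    }
  }

  static final class And implements Expr {
    final Expr left, right;
    And(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " and " + right + ")"; }
  }

  static final class Or implements Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " or " + right + ")"; }
  }

  static final class Not implements Expr {
    final Expr child;
    Not(Expr child) { this.child = child; }
    @Override public String toString() { return "not(" + child + ")"; }
  }

  // Returns an equivalent expression that contains no Not nodes.
  static Expr rewriteNot(Expr e) {
    if (e instanceof And) {
      And a = (And) e;
      return new And(rewriteNot(a.left), rewriteNot(a.right));
    } else if (e instanceof Or) {
      Or o = (Or) e;
      return new Or(rewriteNot(o.left), rewriteNot(o.right));
    } else if (e instanceof Not) {
      return negate(((Not) e).child);
    }
    return e;  // a leaf predicate is already NOT-free
  }

  // Computes NOT(e) without Not nodes, using De Morgan's laws for And/Or.
  static Expr negate(Expr e) {
    if (e instanceof And) {
      And a = (And) e;
      return new Or(negate(a.left), negate(a.right));
    } else if (e instanceof Or) {
      Or o = (Or) e;
      return new And(negate(o.left), negate(o.right));
    } else if (e instanceof Not) {
      return rewriteNot(((Not) e).child);  // double negation cancels
    }
    Pred p = (Pred) e;
    return new Pred(p.column, p.op.negate(), p.value);
  }

  public static void main(String[] args) {
    Expr filter = new Not(new And(new Pred("id", Op.LT, 5), new Pred("data", Op.IS_NULL, null)));
    System.out.println(rewriteNot(filter));  // (id >= 5 or data is not null)
  }
}
```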
Re: [DISCUSS] 0.9.1 release
Posted by OpenInx <op...@gmail.com>.
> Does anyone know if we can recover existing data affected by it?
PR #1271 addresses correctness bugs in two data types: decimal18 and
timestampZone.
For decimal18, we actually write the correct decimal value but read it
incorrectly. Say the type is decimal(10,3) and the value is 10.100: the ORC
writer stores it in the file as 101*10^(-1), while before this patch we
read it back as 101*10^(-3). If we construct the BigDecimal with the stored
exponent of -1 (that is, a BigDecimal scale of 1) and then rescale it to
scale 3, we should in theory still get the correct decimal 10100*10^(-3).
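The decimal arithmetic above can be reproduced with java.math.BigDecimal. This is a minimal sketch, assuming (as described above) that the file holds the unscaled value 101 with a stored scale of 1; the class and method names are hypothetical and this is not Iceberg's ORC reader.

```java
import java.math.BigDecimal;

// Hypothetical illustration of the decimal(10,3) read bug; not Iceberg code.
class DecimalReadSketch {

  // Buggy read: applies the column's declared scale (3) to the unscaled
  // value, ignoring the scale that was actually stored in the file.
  static BigDecimal buggyRead(long unscaled, int declaredScale) {
    return BigDecimal.valueOf(unscaled, declaredScale);
  }

  // Recovery: rebuild the value with the stored scale, then rescale to the
  // declared scale. setScale(int) throws ArithmeticException if digits would
  // be lost, which only happens if the stored scale exceeds the declared one.
  static BigDecimal recover(long unscaled, int storedScale, int declaredScale) {
    return BigDecimal.valueOf(unscaled, storedScale).setScale(declaredScale);
  }

  public static void main(String[] args) {
    System.out.println(buggyRead(101L, 3));   // 0.101
    System.out.println(recover(101L, 1, 3));  // 10.100
  }
}
```

Whether the stored scale can still be read back for every existing file is exactly the open recovery question; the sketch only shows that the value is recoverable once that scale is known.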
For timestampZone, I believe we have stored the wrong value in the file,
but the error between the written timestamp and the correct one should be
small. At [1], negative values are divided with truncating division rather
than floor division: -5 / 2 = -2, whereas floorDiv(-5, 2) = -3. The two
results differ by at most 1, so the error should be less than one second:
it comes from the nanosecond part of the timestamp, which is always below
one second. However, I have not found a way to recover the existing data.
[1] https://github.com/apache/iceberg/pull/1271/files#diff-5aa4840155ec70fdf7f725e122cde7b7L218
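The division behaviour described above can be checked with the JDK alone. The sketch assumes, purely for illustration, that a timestamp in microseconds since the epoch is split into whole seconds plus a nanosecond-of-second field; the class and method names are hypothetical and this is not the code at [1]. For a pre-epoch (negative) value, / and % give a seconds value one too high and a negative nanosecond field, while Math.floorDiv and Math.floorMod keep the nanosecond field in the range 0 to 999,999,999 that a type such as java.sql.Timestamp requires.

```java
// Hypothetical sketch of splitting epoch microseconds into (seconds, nanos); not Iceberg code.
class TimestampSplitSketch {

  // Truncating division rounds toward zero, so for negative inputs the
  // quotient is one larger than the floor whenever there is a remainder.
  static long[] splitTruncating(long micros) {
    long seconds = micros / 1_000_000L;
    long nanos = (micros % 1_000_000L) * 1_000L;  // negative for pre-epoch values
    return new long[] {seconds, nanos};
  }

  // Floor division rounds toward negative infinity; floorMod then always
  // returns a non-negative remainder, so nanos stays in [0, 999_999_999].
  static long[] splitFloor(long micros) {
    long seconds = Math.floorDiv(micros, 1_000_000L);
    long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
    return new long[] {seconds, nanos};
  }

  public static void main(String[] args) {
    System.out.println((-5 / 2) + " vs " + Math.floorDiv(-5, 2));  // -2 vs -3
    long micros = -1_500_000L;  // 1.5 seconds before the epoch
    long[] t = splitTruncating(micros);
    long[] f = splitFloor(micros);
    System.out.println(t[0] + "s " + t[1] + "ns");  // -1s -500000000ns
    System.out.println(f[0] + "s " + f[1] + "ns");  // -2s 500000000ns
  }
}
```

Both pairs sum to -1.5 seconds, so the truncating split is only harmless if every consumer accepts a negative nanosecond field; a consumer that clamps or rejects it ends up with a value off by less than one second.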
On Tue, Aug 4, 2020 at 3:08 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> Yes, we should get #1269 into a patch release as well since it is a
> correctness bug.
>
> Does anyone know if we can recover existing data affected by it?
Re: [DISCUSS] 0.9.1 release
Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Yes, we should get #1269 into a patch release as well since it is a
correctness bug.
Does anyone know if we can recover existing data affected by it?
On Mon, Aug 3, 2020 at 11:08 AM Anton Okolnychyi <ao...@apple.com>
wrote:
> I see a few open issues for ORC. Some of them seem critical (like issue
> #1269). Do we want to fix those before the release? Or is ORC support still
> experimental?
>
> - Anton
--
Ryan Blue
Software Engineer
Netflix
Re: [DISCUSS] 0.9.1 release
Posted by Anton Okolnychyi <ao...@apple.com.INVALID>.
I see a few open issues for ORC. Some of them seem critical (like issue #1269). Do we want to fix those before the release? Or is ORC support still experimental?
- Anton
> On 1 Aug 2020, at 20:04, Jungtaek Lim <ka...@gmail.com> wrote:
>
> Sure! I just submitted #1285 <https://github.com/apache/iceberg/pull/1285> to exclude the refactor. Once #1285 is merged I'll rebase the existing PR to do the refactor. Thanks for the input!
Re: [DISCUSS] 0.9.1 release
Posted by Jungtaek Lim <ka...@gmail.com>.
Sure! I just submitted #1285 <https://github.com/apache/iceberg/pull/1285>
to exclude the refactor. Once #1285 is merged I'll rebase the existing PR
to do the refactor. Thanks for the input!
On Sun, Aug 2, 2020 at 4:41 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
> Thanks, Jungtaek! I agree it would be great to fix that problem. I took a
> quick look at the PR and it is a little big to go into a patch release
> since it refactors quite a few places to consolidate the list copy. What do
> you think about making a PR that just fixes the problem with
> BaseCombinedScanTask and Kryo, then doing the remainder of the refactor in
> master?
Re: [DISCUSS] 0.9.1 release
Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Thanks, Jungtaek! I agree it would be great to fix that problem. I took a
quick look at the PR and it is a little big to go into a patch release
since it refactors quite a few places to consolidate the list copy. What do
you think about making a PR that just fixes the problem with
BaseCombinedScanTask and Kryo, then doing the remainder of the refactor in
master?
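The narrower fix suggested here, copying the task list once inside the combined task, can be sketched with plain JDK collections. This assumes, as is typical of Kryo's default collection handling, that deserialization rebuilds a collection by instantiating its concrete class through a no-arg constructor and calling add, which fails for fixed-size or immutable lists such as the one returned by Arrays.asList. The class below is hypothetical and does not reproduce the actual change in #1280 or #1285.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a combined scan task; not Iceberg's BaseCombinedScanTask.
class CombinedTaskSketch<T> {
  // Always a java.util.ArrayList: it has a public no-arg constructor and a
  // working add(), which is what a reflective collection serializer needs.
  private final ArrayList<T> tasks;

  CombinedTaskSketch(List<T> tasks) {
    // Defensive copy: whatever list the caller passed in (Arrays.asList,
    // an unmodifiable view, ...) is replaced by a plain ArrayList.
    this.tasks = new ArrayList<>(tasks);
  }

  // Callers get a read-only view; the serialized field stays an ArrayList.
  List<T> tasks() {
    return Collections.unmodifiableList(tasks);
  }

  // Exposed only so the field's concrete type can be inspected.
  Class<?> storageClass() {
    return tasks.getClass();
  }

  public static void main(String[] args) {
    List<String> fixedSize = Arrays.asList("a.parquet", "b.parquet");
    try {
      fixedSize.add("c.parquet");  // fixed-size list: add() is unsupported
    } catch (UnsupportedOperationException e) {
      System.out.println("Arrays.asList rejects add()");
    }
    CombinedTaskSketch<String> task = new CombinedTaskSketch<>(fixedSize);
    System.out.println(task.storageClass().getName());  // java.util.ArrayList
    System.out.println(task.tasks());                   // [a.parquet, b.parquet]
  }
}
```

Copying once in the constructor keeps the fix local to the task class, which fits the goal of a small patch-release change; the broader refactor that consolidates list copies elsewhere can then land on master separately.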
On Fri, Jul 31, 2020 at 5:29 PM Jungtaek Lim <ka...@gmail.com>
wrote:
> If we still have some more days I think #1280
> <https://github.com/apache/iceberg/pull/1280>: "fix serialization issue
> in BaseCombinedScanTask with Kryo" is a good candidate to be included. The
> bug affects both Spark and Flink (according to #1279
> <https://github.com/apache/iceberg/pull/1279>).
--
Ryan Blue
Software Engineer
Netflix