Posted to dev@iceberg.apache.org by RD <rd...@gmail.com> on 2019/05/13 17:05:49 UTC

Reader schema does not project union but data has non-optional unions

Iceberg today does not support non-optional unions, and that is the right
behaviour, but we do have a lot of datasets that have non-optional union
fields. I'm wondering whether Iceberg should allow reading these datasets
as long as the user does not project the union field.

I tried this out, and today it throws an exception during column pruning.
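
A minimal sketch of the proposed behaviour, assuming a hypothetical, flattened schema model. The names UnionPruning, Kind, Field, and prune are illustrative only and are not Iceberg's API, and nested structs are ignored. Pruning drops any field the reader did not request and fails only if a projected field is a non-optional union. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified schema model for illustration only; this is not
// Iceberg's Schema/Types API, and nested structs are not modelled.
public class UnionPruning {

    public enum Kind { PRIMITIVE, UNION }

    public record Field(String name, Kind kind) {}

    // Keep only the projected fields. A non-optional union in the file schema
    // is rejected only if the reader actually asks for it; otherwise it is
    // pruned away instead of failing the whole read.
    public static List<Field> prune(List<Field> fileFields, Set<String> projected) {
        List<Field> result = new ArrayList<>();
        for (Field field : fileFields) {
            if (!projected.contains(field.name())) {
                continue; // not requested: skip it, even if it is a union
            }
            if (field.kind() == Kind.UNION) {
                throw new UnsupportedOperationException(
                    "Cannot project non-optional union field: " + field.name());
            }
            result.add(field);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Field> file = List.of(
            new Field("id", Kind.PRIMITIVE),
            new Field("payload", Kind.UNION));
        // Projecting only "id" succeeds and drops the union column.
        System.out.println(prune(file, Set.of("id")));
    }
}
```

Projecting "payload" in this sketch would still throw, which matches the idea that only reads that touch the union need to fail.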

If we think this should be supported, I'll create an issue for this.

-Best,
R.

Re: Reader schema does not project union but data has non-optional unions

Posted by RD <rd...@gmail.com>.
Mostly Avro today, but ORC also has union types. I'll create an issue for
this later today.

Thanks,
R.

On Wed, May 15, 2019 at 11:16 AM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> Are you talking about Avro data? I think Parquet data would work fine
> because unions are represented as a struct of optionals.

Re: Reader schema does not project union but data has non-optional unions

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Are you talking about Avro data? I think Parquet data would work fine
because unions are represented as a struct of optionals.

I think this makes sense. Maybe we could also allow projecting the contents
of unions by representing them as structs of optionals and materializing
them that way. I'd be up for reviewing this.
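
To make the struct-of-optionals idea concrete, here is a minimal sketch of how a single union value could be materialized. The names UnionAsStruct and toStructOfOptionals are hypothetical and are not Iceberg's or Avro's API. A union with N branches becomes N optional slots, and only the slot for the branch the value came from is set. How the branch index is resolved and how the struct fields would be named are left open here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration: a union of N branch types is materialized as a
// struct with N optional fields, exactly one of which is set per row.
public class UnionAsStruct {

    // One optional slot per union branch; slot i holds a value only when the
    // row's union value came from branch i. Non-optional unions have no null
    // branch, so the value is assumed to be non-null.
    public static List<Optional<Object>> toStructOfOptionals(int branchCount, int branch, Object value) {
        if (branch < 0 || branch >= branchCount) {
            throw new IllegalArgumentException("branch out of range: " + branch);
        }
        List<Optional<Object>> fields = new ArrayList<>();
        for (int i = 0; i < branchCount; i++) {
            if (i == branch) {
                fields.add(Optional.of(value));
            } else {
                fields.add(Optional.empty());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // A value from the "string" branch of a union like ["int", "string"].
        System.out.println(toStructOfOptionals(2, 1, "hello"));
    }
}
```

In this sketch, a string value from a two-branch ["int", "string"] union populates only the second slot, which is the same shape Parquet uses when unions are written as a struct of optionals.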

On Mon, May 13, 2019 at 1:48 PM RD <rd...@gmail.com> wrote:

> To add to this, I'm not suggesting changing Iceberg writers to support
> writing non-optional unions. The motivation for this is to support legacy
> datasets [not written by Iceberg].

-- 
Ryan Blue
Software Engineer
Netflix

Re: Reader schema does not project union but data has non-optional unions

Posted by RD <rd...@gmail.com>.
To add to this, I'm not suggesting changing Iceberg writers to support
writing non-optional unions. The motivation for this is to support legacy
datasets [not written by Iceberg].

On Mon, May 13, 2019 at 10:05 AM RD <rd...@gmail.com> wrote:

> Iceberg today does not support non-optional unions, and that is the right
> behaviour, but we do have a lot of datasets that have non-optional union
> fields. I'm wondering whether Iceberg should allow reading these datasets
> as long as the user does not project the union field.