Posted to dev@iceberg.apache.org by Owen O'Malley <ow...@gmail.com> on 2021/01/05 00:43:33 UTC

Type attributes

One of the challenges that we have at LinkedIn is that we have a *lot* of
Avro schemas. I'd like to be able to represent those Avro schemas using
Iceberg's types, but there are a few challenges:

   - unions
   - enums
   - default values

One way out of those problems without extending the Iceberg type model is
to add type attributes, where each sub-type has a logical string-to-string
map that can hold user-defined attributes.

Another use for these kinds of attributes is to mark columns with
classification tags (e.g., pii).
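To make the proposal concrete, here is a minimal Python sketch. It assumes a simplified schema model, not Iceberg's real Java classes: each type node carries a string-to-string `attributes` map that Iceberg itself would treat as opaque. The class names `IcebergType` and `SchemaField`, the helper `fields_with_attribute`, and the attribute keys (`avro.type`, `avro.symbols`, `avro.default`, `classification`) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IcebergType:
    """Hypothetical type node: an Iceberg type name plus user-defined attributes.

    The attribute map is opaque to the table format; only tools that know the
    keys (e.g. an Avro converter or a governance scanner) interpret it.
    """
    type_name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SchemaField:
    """Hypothetical struct field; the attributes live on the field's type."""
    field_id: int
    name: str
    type: IcebergType


def fields_with_attribute(fields: List[SchemaField], key: str, value: str) -> List[str]:
    """Return the names of fields whose type has attributes[key] == value."""
    return [f.name for f in fields if f.type.attributes.get(key) == value]


# An Avro enum stored as an Iceberg string; its symbols and default value are
# kept in (hypothetical) attribute keys so the Avro schema could be rebuilt.
color = SchemaField(1, "color", IcebergType("string", {
    "avro.type": "enum",
    "avro.symbols": "RED,GREEN,BLUE",
    "avro.default": "RED",
}))

# An ordinary column tagged with a classification for data governance.
email = SchemaField(2, "email", IcebergType("string", {"classification": "pii"}))
user_id = SchemaField(3, "user_id", IcebergType("long"))

schema = [color, email, user_id]
```

With this model, `fields_with_attribute(schema, "classification", "pii")` returns `["email"]`, and the enum symbols can be recovered with `color.type.attributes["avro.symbols"].split(",")`. Because the values are plain strings, anything richer (symbol lists, default values) has to be encoded by convention, which is one of the trade-offs of attributes versus extending the type model.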

Thoughts,
    Owen

Re: Type attributes

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I'll add this to the agenda for the sync on Wednesday. It would be good to
hear if there are other use cases like this that we should support. Are you
trying to convert back to the original Avro schema from an Iceberg schema?

I have a few concerns about this. First, there is no SQL representation of
field-level metadata that I'm aware of, so it would necessarily be lossy in
some cases, like CREATE TABLE LIKE. Second, I think it is important for
Iceberg to have a strict spec when writing Avro data so that engines have a
small set of representations to expect. It's okay to read a union and make
it appear like an Iceberg struct, but I would not want to make it possible
to write an Iceberg struct as an Avro union.
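The read-only direction described above can be sketched as follows. This assumes one possible layout that is not defined by the Iceberg spec: an Avro union such as `["null", "int", "string"]` is read as an optional struct with an int `tag` identifying the branch that was set, plus one optional `fieldN` per non-null branch. The functions `union_to_struct_type` and `read_union_as_struct` are hypothetical and operate on plain Python values rather than a real Avro decoder.

```python
from typing import Any, Dict, List, Optional


def union_to_struct_type(branches: List[str]) -> Dict[str, str]:
    """Map Avro union branches to a hypothetical struct layout.

    The "null" branch becomes the struct's own optionality; every other
    branch gets an optional fieldN, plus an int "tag" naming the live branch.
    """
    non_null = [b for b in branches if b != "null"]
    struct = {"tag": "int"}
    for i, branch in enumerate(non_null):
        struct[f"field{i}"] = branch
    return struct


def read_union_as_struct(branches: List[str], branch_index: int,
                         value: Any) -> Optional[Dict[str, Any]]:
    """Convert one decoded Avro union value into a struct row (read path only)."""
    if branches[branch_index] == "null":
        return None  # a null union value becomes a null struct
    # The tag counts only non-null branches that precede the chosen one.
    tag = sum(1 for b in branches[:branch_index] if b != "null")
    non_null_count = sum(1 for b in branches if b != "null")
    row: Dict[str, Any] = {"tag": tag}
    for i in range(non_null_count):
        row[f"field{i}"] = value if i == tag else None
    return row
```

For example, reading the `"string"` branch (index 2) of `["null", "int", "string"]` with value `"abc"` yields `{"tag": 1, "field0": None, "field1": "abc"}`. There is intentionally no inverse function: turning such a struct back into an Avro union is the write path that the concern above argues against allowing.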

-- 
Ryan Blue
Software Engineer
Netflix