Posted to dev@parquet.apache.org by Ryan Blue <bl...@cloudera.com> on 2015/05/18 22:43:15 UTC
High-level type evolution
I've been looking at schema evolution lately, and we don't currently
support changing physical types when a logical type does not change.
This could be a problem when two different systems have different, but
valid, representations for a logical type.
Decimal, for example, can be stored as either a binary or a fixed-length
byte array. But if the requested schema for a file (say, binary) doesn't
match the underlying type (fixed), then the check that verifies all
columns can be satisfied fails, even though both the requested type and
the actual type are valid.
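To make the mismatch concrete: as far as the Parquet format specification describes it, a decimal stored in a binary or in a fixed holds the same thing, the unscaled value as a big-endian two's-complement integer, so the two layouts differ only in length and sign padding. The following is a minimal Python sketch of that idea, not parquet-mr code; `decode_decimal` is a hypothetical helper introduced only for illustration.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode big-endian two's-complement bytes as a decimal with the given scale.

    Hypothetical helper for illustration; not part of any Parquet library.
    """
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# The value -123.45 (unscaled -12345, scale 2) in two physical layouts.
as_binary = (-12345).to_bytes(2, byteorder="big", signed=True)  # minimal length
as_fixed = (-12345).to_bytes(8, byteorder="big", signed=True)   # sign-extended to 8 bytes

# Both layouts decode to the same logical value.
print(decode_decimal(as_binary, 2), decode_decimal(as_fixed, 2))
```

Since both byte strings decode to the same logical value, a reader that requests binary could in principle be served from a fixed column, which is why rejecting the projection outright looks too strict.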
We can fix this by adding logic to the `checkContains` methods in the
Type classes, plus support in the converters. But I'm wondering if we
shouldn't take a closer look at projection and schema evolution in
general at this point.
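One possible shape for the relaxed check, sketched in Python rather than against the real Type classes: fields match when their names and logical types agree and their physical types are either identical or listed as interchangeable for that logical type. Everything here (`PrimitiveField`, `EQUIVALENT_PHYSICAL`, `can_satisfy`) is a hypothetical name, and the sketch assumes precision and scale are compared elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: physical types treated as interchangeable for a logical type.
# Only the pair discussed in this thread is listed.
EQUIVALENT_PHYSICAL = {
    "DECIMAL": {"BINARY", "FIXED_LEN_BYTE_ARRAY"},
}


@dataclass(frozen=True)
class PrimitiveField:
    name: str
    physical: str
    logical: Optional[str] = None


def can_satisfy(file_field: PrimitiveField, requested: PrimitiveField) -> bool:
    """Return True if a column stored as file_field can be read as requested."""
    if file_field.name != requested.name or file_field.logical != requested.logical:
        return False
    if file_field.physical == requested.physical:
        return True
    # Different physical types are acceptable only if both are valid
    # representations of the shared logical type.
    allowed = EQUIVALENT_PHYSICAL.get(requested.logical, set())
    return file_field.physical in allowed and requested.physical in allowed
```

With this rule, a file column `PrimitiveField("price", "FIXED_LEN_BYTE_ARRAY", "DECIMAL")` satisfies a request for `PrimitiveField("price", "BINARY", "DECIMAL")`, while a request that drops the logical type is still rejected.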
Are there other ways to solve this problem? Can we do projection
differently, so we don't have to ignore the physical type of a requested
column in some cases? What are the rules for valid projection?
Thanks!
rb
--
Ryan Blue
Software Engineer
Cloudera, Inc.
Re: High-level type evolution
Posted by Julien Le Dem <ju...@twitter.com.INVALID>.
There should be a centralized place where type equivalence and conversion
are defined.
Then converters could reuse them and we would minimize the amount of work
required.
When projecting, Parquet deserializes the physical types it knows about and
the converter applies the proper type conversion.
This could be implemented as a set of reusable PrimitiveConverters that
know how to convert from a given physical type to a logical type. They can
be composed with the appropriate converter if there's a more specific type
for a particular framework.
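A rough Python sketch of that idea, under the assumption that conversions are keyed by (physical type, logical type) in one shared registry and that a framework supplies an optional final step. The names `CONVERSIONS`, `register`, and `make_converter` are hypothetical and do not correspond to the parquet-mr converter API.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, Tuple

# Hypothetical central registry: (physical type, logical type) -> conversion.
Conversion = Callable[[Any, int], Decimal]
CONVERSIONS: Dict[Tuple[str, str], Conversion] = {}


def register(physical: str, logical: str):
    """Decorator that records a conversion for one physical/logical pair."""
    def decorator(fn: Conversion) -> Conversion:
        CONVERSIONS[(physical, logical)] = fn
        return fn
    return decorator


# One conversion serves both physical representations of a decimal.
@register("BINARY", "DECIMAL")
@register("FIXED_LEN_BYTE_ARRAY", "DECIMAL")
def bytes_to_decimal(raw: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def make_converter(physical: str, logical: str, scale: int,
                   framework_fn: Callable[[Decimal], Any] = lambda v: v):
    """Compose the shared conversion with an optional framework-specific step."""
    to_logical = CONVERSIONS[(physical, logical)]
    return lambda raw: framework_fn(to_logical(raw, scale))
```

A framework that wants strings could then build `make_converter("FIXED_LEN_BYTE_ARRAY", "DECIMAL", 2, str)` and reuse the shared decimal logic, while the choice of physical type stays a lookup detail rather than something each converter has to handle on its own.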