Posted to dev@parquet.apache.org by Swapnil Chougule <th...@gmail.com> on 2018/09/27 18:09:17 UTC

widening primitive conversion in parquet dictionary

Hi

Do the Parquet dictionary classes support widening primitive conversions?

As far as I can see, only same-type decode methods are implemented in the dictionary classes:
https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/values/dictionary/PlainValuesDictionary.java

I came across a case where long data needs to be read as double.
A PlainLongDictionary is created for that column, and it only implements
'decodeToLong'.
Can we add a 'decodeToDouble' implementation there as well, since long to
double is a widening primitive conversion? The same would apply to the
other supported widening primitive conversions.

Thanks,
Swapnil

Re: widening primitive conversion in parquet dictionary

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Type promotion should be done when constructing records, not in the
decoder. The decoding level should return exactly the data that was in the
file and allow higher-level logic to decide what to do with it, including
to cast it to a different type. That allows logic to vary across
applications, which may have different rules. Iceberg, for example, doesn't
allow promotion from long to double, but does allow promotion from float to
double and int to long.
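
As a side note on why such rules can differ between applications: Java
classifies long to double as a widening primitive conversion, yet a double
has only 53 bits of significand, so longs with magnitude above 2^53 may be
rounded. The sketch below only illustrates that language-level behavior.
The class and method names are made up for this example and are not part of
Parquet or Iceberg, and it is not a statement of Iceberg's actual reasoning.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of any library.
final class WideningPrecisionDemo {

    // True when the double produced by widening holds exactly the same
    // numeric value as the original long.
    static boolean isExactAsDouble(long value) {
        double widened = value; // implicit widening primitive conversion
        // new BigDecimal(double) captures the exact binary value of the double.
        return new BigDecimal(widened).compareTo(BigDecimal.valueOf(value)) == 0;
    }

    public static void main(String[] args) {
        long representable = 1L << 53;         // 2^53
        long notRepresentable = (1L << 53) + 1; // 2^53 + 1

        System.out.println(isExactAsDouble(representable));    // true
        System.out.println(isExactAsDouble(notRepresentable)); // false
    }
}
```

By contrast, int to long and float to double preserve every value exactly.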

Here's an example from Iceberg, where ints are promoted to longs:
https://github.com/Netflix/iceberg/blob/master/parquet/src/main/java/com/netflix/iceberg/parquet/ParquetValueReaders.java#L179-L193

That reader is used when the caller expects a long but the file contains an
int.
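
The layering described above can be sketched roughly as follows. This is a
minimal, self-contained illustration and not the linked Iceberg code or the
parquet-mr API: IntColumn, ValueReader, IntAsLongReader and columnOf are
hypothetical names invented here. The decoder stand-in hands back ints
exactly as stored, and the promotion to long happens only in the reader
that builds values for the caller.

```java
// Hypothetical sketch; none of these types exist in parquet-mr or Iceberg.
final class PromotionSketch {

    // Stand-in for the decoding level: returns values exactly as stored.
    interface IntColumn {
        int nextInt();
    }

    // Stand-in for a reader used while constructing records.
    interface ValueReader<T> {
        T read();
    }

    // Used when the caller expects a long but the file contains an int.
    static final class IntAsLongReader implements ValueReader<Long> {
        private final IntColumn column;

        IntAsLongReader(IntColumn column) {
            this.column = column;
        }

        @Override
        public Long read() {
            // The promotion lives here, above the decoder.
            return (long) column.nextInt();
        }
    }

    // Test helper: an in-memory column backed by the given ints.
    static IntColumn columnOf(int... values) {
        return new IntColumn() {
            private int next = 0;

            @Override
            public int nextInt() {
                return values[next++];
            }
        };
    }

    public static void main(String[] args) {
        ValueReader<Long> reader = new IntAsLongReader(columnOf(1, -7, Integer.MAX_VALUE));
        System.out.println(reader.read()); // 1
        System.out.println(reader.read()); // -7
        System.out.println(reader.read()); // 2147483647
    }
}
```

An application with different promotion rules would supply a different
reader while the decoder stays unchanged.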

rb


-- 
Ryan Blue
Software Engineer
Netflix