Posted to user@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2015/11/21 23:16:40 UTC

CTAS - Converting Avro files to parquet - Missing timestamp datatype

Hi,

We are using Avro files for all our logging and they contain long
timestamp_millis values.

When they are converted to Parquet using CTAS we either need a hint (or
something similar) to ensure that these columns become TIMESTAMP values in
Parquet - or - we need to write a more complex SELECT with casting (see
the sketch below).
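
To illustrate the casting route, something like the following is what we
have in mind. This is a rough sketch only: the table paths and the
event_type column are made up, and it assumes Drill's TO_TIMESTAMP(double)
interprets its argument as Unix epoch seconds, hence the division by
1000.0:

  CREATE TABLE dfs.tmp.`events_parquet` AS
  SELECT
    -- occurred_at holds epoch millis; convert to seconds for TO_TIMESTAMP
    TO_TIMESTAMP(occurred_at / 1000.0) AS occurred_at,
    event_type  -- ...and every other column listed out explicitly
  FROM dfs.logs.`events.avro`;

Doing this for every timestamp column in every log schema is exactly the
boilerplate we would like to avoid.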

I'm wondering if there are any shortcuts/tricks available.

Currently we have dictionary encoding turned on and, strangely, the BigInt
column (the Parquet field type selected for the long timestamp) also uses
dictionary encoding; the session option we used is sketched after the log
excerpt below:

Nov 21, 2015 9:59:16 PM INFO:
org.apache.parquet.hadoop.ColumnChunkPageWriteStore: written 8,525B for
[occurred_at] INT64: 6,615 values, 9,379B raw, 8,479B comp, 1 pages,
encodings: [*PLAIN_DICTIONARY*, RLE, BIT_PACKED], dic { 2,812 entries,
22,496B raw, 2,812B comp}
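
For reference, dictionary encoding was turned on with a session option
along these lines (option name assumed from Drill's sys.options; it may
differ by version):

  ALTER SESSION SET `store.parquet.enable_dictionary_encoding` = true;

Given that occurred_at has 6,615 values but only 2,812 dictionary entries
the writer may consider the dictionary worthwhile, but for a near-unique
timestamp column we would have expected plain encoding.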


Can someone shed some light on these two issues?

Regards,
 -Stefan