Posted to dev@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2016/02/05 09:10:01 UTC

Drill 1.5 and a few Avro issues/questions

Hi,

I'm wondering why DRILL-4120 has been pushed back to 1.6.

I have no idea whether we are the only ones using directory pruning with
Avro, but we use Avro for streaming/fresh data before a Parquet conversion,
and this would be a welcome fix.

Pet peeve - Avro Schema validation.

Some facts:

   - The Map structure supported by Avro cannot be fully validated with a
   schema, as it allows keys to vary and only ensures the data type of the
   value.

   - An evolving schema will fail the current Avro validation when
   directory pruning is used, unless all file headers, even those in the
   pruned directories, are scanned.

   - Schema validation in Avro and schema validation in Parquet are
   different.
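
To illustrate the first point, here is a rough sketch in plain Python
(standard library only, not Drill or Avro library code; conforms_to_map is
a made-up helper). It assumes a map schema whose values are "long" and
shows that two records with entirely different keys both conform, while
only a wrong value type is rejected:

```python
import json

# An Avro map schema declares only the value type. Keys are always strings
# and are not listed anywhere in the schema.
MAP_SCHEMA = json.loads('{"type": "map", "values": "long"}')


def conforms_to_map(schema, datum):
    """Hypothetical helper: check a dict against a map-of-long schema."""
    if schema.get("type") != "map" or schema.get("values") != "long":
        raise ValueError("this sketch only handles map<long>")
    return isinstance(datum, dict) and all(
        isinstance(key, str)
        and isinstance(value, int)
        and not isinstance(value, bool)  # bool is a subclass of int in Python
        for key, value in datum.items()
    )


day1 = {"clicks": 3, "views": 10}
day2 = {"purchases": 1}      # completely different keys, same schema
bad = {"clicks": "three"}    # wrong value type

print(conforms_to_map(MAP_SCHEMA, day1))
print(conforms_to_map(MAP_SCHEMA, day2))
print(conforms_to_map(MAP_SCHEMA, bad))
```

So the set of keys present in one file says nothing about the keys in
the next one, even when the schema is identical.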

This, and in my opinion many other things, means that strict schema
validation in Avro should be an opt-in arrangement for those who want to
stop evolving their schema and put all their entries in a single file /
directory.

Additionally, Avro 1.8 is just out, and both it and parquet-avro now
support timestamp fields. It would be a great benefit to have proper date
/ timestamp handling in Avro and in the Avro->Parquet conversion.
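
For reference, the timestamp support I mean is the logical-type
annotation from the Avro 1.8 specification: a long tagged as
timestamp-millis, holding milliseconds since the Unix epoch in UTC. A
rough sketch in plain Python (standard library only; the field name
occurred_at and both helper functions are made up for illustration and
are not part of any Avro or Drill API):

```python
import json
from datetime import datetime, timedelta, timezone

# A record field using the timestamp-millis logical type on top of a long.
# The field name "occurred_at" is hypothetical.
FIELD_SCHEMA = json.loads(
    '{"name": "occurred_at",'
    ' "type": {"type": "long", "logicalType": "timestamp-millis"}}'
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt):
    """Encode a timezone-aware datetime as epoch milliseconds (the stored long)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_millis(millis):
    """Decode epoch milliseconds back into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


posted = datetime(2016, 2, 5, 9, 10, 1, tzinfo=timezone.utc)
stored = to_timestamp_millis(posted)
print(stored, from_timestamp_millis(stored).isoformat())
```

Without the annotation, a reader only sees a plain long and has to guess
that it is a timestamp.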

Yours truly,
  - The Slightly Disgruntled Drill-Avro User