Posted to dev@drill.apache.org by Stefán Baxter <st...@activitystream.com> on 2016/02/05 09:10:01 UTC
Drill 1.5 and a few Avro issues/questions
Hi,
I'm wondering why DRILL-4120 has been pushed back to 1.6.
I have no idea whether we are the only ones using directory pruning with
Avro, but we use Avro for streaming/fresh data before a Parquet conversion,
and this would be a welcome fix.
Pet peeve - Avro Schema validation.
Some facts:
- The Map structure supported by Avro cannot be fully validated with a
schema, as it allows keys to vary and only ensures the data type of the
values.
- An evolving schema will fail with the current Avro validation when
directory pruning is used, unless all file headers, even in the pruned
directories, are scanned.
- Schema validation in Avro and schema validation in Parquet are
different.
This, and in my opinion many other things, means that strict schema
validation in Avro should be an opt-in arrangement for those who want to
stop evolving their schema and put all their entries in a single file /
directory.
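
To illustrate the first fact above, here is a minimal Python sketch of what
an Avro map schema can and cannot constrain. It does not use the Avro
library or any Drill code; the record name "Event", the field names, and
the helper map_conforms are hypothetical and exist only for this example.

```python
# Hypothetical Avro record schema with a map field, written as a Python dict.
SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "dimensions", "type": {"type": "map", "values": "string"}},
    ],
}

# Only two primitive value types are handled in this sketch.
PY_TYPES = {"string": str, "long": int}


def map_conforms(map_schema, value):
    """Check a value against a map schema.

    Keys only need to be strings; the schema says nothing about which
    keys are present. Values must match the declared value type.
    """
    expected = PY_TYPES[map_schema["values"]]
    return isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, expected)
        for k, v in value.items()
    )


map_schema = SCHEMA["fields"][1]["type"]

# Two entries with completely different key sets both conform.
entry_a = {"browser": "firefox"}
entry_b = {"country": "IS", "device": "phone"}

# Only a wrong value type is rejected.
entry_c = {"browser": 42}

results = [map_conforms(map_schema, e) for e in (entry_a, entry_b, entry_c)]
```

The first two entries pass even though they share no keys, which is why a
schema alone cannot tell a reader which map keys to expect.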
Additionally, Avro 1.8 is just out, and both it and parquet-avro now
support timestamp fields. Having proper date / timestamp handling in Avro
and in the Avro->Parquet conversion would be a great benefit.
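
For context, the timestamp support in Avro 1.8 comes as logical types that
annotate a primitive type, for example "timestamp-millis" on a long. The
sketch below shows such a field declaration and how a reader could
interpret the stored value. It uses only the Python standard library; the
field name "occurred_at" and the helper decode_timestamp_millis are
hypothetical, and this is not how Drill or parquet-avro implement it.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field declaration using the timestamp-millis logical type:
# the value is stored as a long counting milliseconds since the Unix epoch.
TIMESTAMP_FIELD = {
    "name": "occurred_at",
    "type": {"type": "long", "logicalType": "timestamp-millis"},
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp_millis(millis):
    """Turn a timestamp-millis long into a timezone-aware UTC datetime."""
    # timedelta keeps the arithmetic exact, avoiding float division.
    return EPOCH + timedelta(milliseconds=millis)


def encode_timestamp_millis(moment):
    """Turn a timezone-aware datetime back into a timestamp-millis long."""
    return (moment - EPOCH) // timedelta(milliseconds=1)
```

A reader that ignores the logicalType annotation still sees a plain long,
which is the behaviour proper timestamp handling would improve on.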
Yours truly,
- The Slightly Disgruntled Drill-Avro User