Posted to dev@parquet.apache.org by Julien Le Dem <ju...@twitter.com.INVALID> on 2014/10/28 18:31:34 UTC
parquet sync up
Happening now:
https://plus.google.com/events/c2qu63kvjn2m31gnlq9hcrounh8
Re: parquet sync up
Posted by Julien Le Dem <ju...@twitter.com.INVALID>.
Attendance:
- Criteo: Mickael working on Hive Serde
- Apache Drill: Parth (MapR)
- Cloudera: Ryan
- Netflix: Dan, Tonjie, Zhengxiao, Nezih (working on Presto)
- Twitter: Julien
Notes:
- Dealing with Lists and Maps containing nulls.
In the Serde, Map of array and array of Map have been fixed.
Mickael is currently working on HIVE-6994 => null inside array.
Lists (arrays) are modeled with a 3-level representation:
- One optional field for the list itself that can be null
- One repeated field for the items
- One optional field to allow storing nulls in the list
Ryan to send a PR for standardizing representation of lists.
We need a permissive model for backward compatibility.
We need to make sure there's no ambiguity between user-defined one-field
groups and the synthetic extra layers used to represent nulls in lists.
- Vectorized execution. The Netflix and Drill teams are working together.
The proposed API is based on Presto.
Interested people should review it (Drill, Hive, Spark).
Parth: we should be able to pass in an allocator (init and cleanup). See
PARQUET-8[7-8].
Possibly we should use [Byte,...]Buffers instead of arrays.
- Jobs with significant setup time. What can be done to speed them up?
PARQUET-100: HCatalog => write one file per partition.
increasing default parallelism.
Need to be reviewed.
- Java 8 support: Tom from Cloudera is working on it.
- Parquet release:
- We need to add license headers.
- plan: release, rename packages, merge byte buffer APIs, merge 2.0
related JIRAs
- See PARQUET-111: plan for release to review
- encoding fallback: Julien to add description in PR
- new PRs for Parquet 2.0
encoding fallback
new page formats
predicate push down on dictionary
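
To make the 3-level list representation from the notes concrete, here is a minimal Python sketch of how one list column could be shredded into (repetition level, definition level, value) triples in the Dremel style. The schema, the field names (values, list, element) and the functions shred and assemble are assumptions made up for illustration; they are not taken from the notes or from the Parquet code base, and the standardized naming was still to be proposed at the time.

```python
from typing import List, Optional, Tuple

# Hypothetical schema (names are illustrative assumptions):
#   optional group values (LIST) {   <- the list itself can be null
#     repeated group list {          <- one entry per item
#       optional int32 element;      <- the item can be null
#     }
#   }
# Each optional or repeated layer that is present adds 1 to the
# definition level, so the maximum definition level here is 3.

Triple = Tuple[int, int, Optional[int]]  # (repetition, definition, value)


def shred(record: Optional[List[Optional[int]]]) -> List[Triple]:
    """Turn one record of the list column into level triples."""
    if record is None:
        return [(0, 0, None)]  # the list itself is null
    if not record:
        return [(0, 1, None)]  # the list exists but has no items
    triples = []
    for index, item in enumerate(record):
        repetition = 0 if index == 0 else 1  # 0 starts a new record
        if item is None:
            triples.append((repetition, 2, None))  # null item in the list
        else:
            triples.append((repetition, 3, item))  # non-null item
    return triples


def assemble(triples: List[Triple]) -> Optional[List[Optional[int]]]:
    """Rebuild one record from the triples produced by shred."""
    _, definition, _ = triples[0]
    if definition == 0:
        return None
    if definition == 1:
        return []
    return [value if level == 3 else None for _, level, value in triples]


if __name__ == "__main__":
    for record in (None, [], [None], [1, None, 3]):
        print(record, "->", shred(record))
```

With all three layers, a null list, an empty list, a null item and a non-null item each get a distinct definition level (0, 1, 2 and 3 in this sketch), so a reader can tell them apart. Dropping the innermost optional layer would leave no level for a null item.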
Next sync up Tuesday, Nov 18, 2014 10:30 am PST
If you want a reminder send an email.