Posted to dev@parquet.apache.org by Julien Le Dem <ju...@twitter.com.INVALID> on 2014/10/28 18:31:34 UTC

parquet sync up

Happening now:
https://plus.google.com/events/c2qu63kvjn2m31gnlq9hcrounh8

Re: parquet sync up

Posted by Julien Le Dem <ju...@twitter.com.INVALID>.
Attendance:
- Criteo: Mickael working on Hive Serde
- Apache Drill: Parth (MapR)
- Cloudera: Ryan
- Netflix: Dan, Tonjie, Zhengxiao, Nezih (working on Presto)
- Twitter: Julien

Notes:
- Dealing with lists and maps containing nulls.
  In the Serde, map of array and array of map have been fixed.
  Mickael is currently working on HIVE-6994 => null inside array.
  Lists or arrays are modeled with a 3-level representation:
  - One optional field for the list itself, which can be null
  - One repeated field for the items
  - One optional field to allow storing nulls in the list
  Ryan to send a PR for standardizing the representation of lists.
  We need a permissive model for backward compatibility.
  We need to make sure there's no ambiguity between user-defined one-field
  groups and synthetic extra layers used to represent nulls in lists.
- Vectorized execution. The Netflix and Drill teams are working together.
  They proposed an API based on Presto.
  People interested should review (Drill, Hive, Spark).
  Parth: we should be able to pass in an allocator (init and cleanup). See
  PARQUET-8[7-8].
  Possibly we should use [Byte,...]Buffers instead of arrays.
- Jobs with significant setup time. What was done to speed them up:
   PARQUET-100: HCatalog => write one file per partition.
   Increasing default parallelism.
   These need to be reviewed.
- Java 8 support: Tom from Cloudera is working on it.
- Parquet release:
   - We need to add license headers.
   - Plan: release, rename packages, merge byte buffer APIs, merge
     2.0-related JIRAs
   - See PARQUET-111: plan for release to review
- Encoding fallback: Julien to add description in PR
- New PRs for Parquet 2.0:
   - encoding fallback
   - new page formats
   - predicate push down on dictionary
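
For readers following the list discussion above, here is a minimal sketch of
how the 3-level representation distinguishes a null list, an empty list, and
a null item. It assumes a schema of the form "optional group > repeated group
> optional int32" and uses Dremel-style repetition and definition levels. The
function name shred_list and the tuple layout are hypothetical and are not
taken from parquet-mr or from the PR mentioned in the notes.

```python
from typing import List, Optional, Tuple

# (repetition level, definition level, value); layout is illustrative only.
Level = Tuple[int, int, Optional[int]]


def shred_list(record: Optional[List[Optional[int]]]) -> List[Level]:
    """Encode one list-valued record under the assumed 3-level schema.

    Definition levels under this assumption:
      0 -> the list itself is null   (outer optional field absent)
      1 -> the list is empty         (repeated field has no entries)
      2 -> an item is present but null (inner optional field absent)
      3 -> an item is present with a value
    """
    if record is None:
        return [(0, 0, None)]
    if not record:
        return [(0, 1, None)]
    out: List[Level] = []
    for i, item in enumerate(record):
        rep = 0 if i == 0 else 1  # 0 starts a new record, 1 continues the list
        if item is None:
            out.append((rep, 2, None))
        else:
            out.append((rep, 3, item))
    return out


if __name__ == "__main__":
    for rec in (None, [], [1, None, 3]):
        print(rec, "->", shred_list(rec))
```

Without the innermost optional field, the maximum definition level would be
2, and a null item could not be told apart from a present one. That is why
the extra synthetic layer must not be confused with a user-defined one-field
group.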

Next sync up Tuesday, Nov 18, 2014 10:30 am PST
If you want a reminder send an email.

On Tue, Oct 28, 2014 at 10:31 AM, Julien Le Dem <ju...@twitter.com> wrote:

> Happening now:
> https://plus.google.com/events/c2qu63kvjn2m31gnlq9hcrounh8
>