Posted to dev@parquet.apache.org by Anna Szonyi <sz...@cloudera.com.INVALID> on 2018/10/19 14:46:57 UTC

Parquet sync notes

Hi,

I took some notes during our last Parquet sync; please find them below:

Ryan Blue (Netflix)
Steven Moy (Yelp): encryption, indexing
Jim Apple (Cloudera): Bloom filters, indexing
Xinli Shang (Uber): encryption
Csaba Ringhofer (Cloudera)
Anna Szonyi (Cloudera): how to reduce commit time
Nandor Kollar (Cloudera)
Zoltan Ivanfi (Cloudera): present page size results, merge strategy
Julien (We Work)

*Encryption:*
Voting (and new discussion) has started on the design doc for
modular encryption:
https://lists.apache.org/thread.html/e93a723df1c8c3b961cd9664d5da289f5ccffa47160cf1ecfb3227b5@%3Cdev.parquet.apache.org%3E


*Page size results:*
- https://docs.google.com/spreadsheets/d/1hfQPy8NkGbgGugnHWvIHSzZ-3Q5M7f3Dtf_oD9ACFRg/edit?usp=sharing#gid=552274286

*Merge strategies for feature branches:*
    - Squashing: con: losing the (blame) history for lines
    - Merge: con: complicated history, difficult to revert
    - Rebase or rebase and squash: con: difficult for non-committers
collaborating on a feature branch when rebasing is necessary
Generally, treat the feature branch as master by proxy: changes get
reviewed as if on master, are squashed, and have JIRAs related to the
commit (no junk commits).
Cloudera folks will try out rebase/merge/merge revert and see which is
the most feasible.

Discussion started on the dev list:
https://lists.apache.org/thread.html/b923f9608e9d6d59f1040bb196b4aca176d1c06f838dda0b28020ebd@%3Cdev.parquet.apache.org%3E

*Bloom filters*
Discussion around what is left to get it committed.
A.I.: Start a vote on the design doc

*Reduce "time to committed"*
Generally, differentiate between changes that affect the format and
those that do not: if a change affects the format, the process should
be slower; if it does not, we should be less strict.
Format changes: design docs, votes on docs/changes
Contributors reviewing other contributors can lead to faster committing.
Frequent reviews -> committership
A.I.: Anna to create a draft of a contributors doc.


Best,
Anna