Posted to dev@parquet.apache.org by Gidon Gershinsky <gg...@gmail.com> on 2019/09/19 18:03:45 UTC

Re: Parquet Sync - Meeting Notes

Hi Xinli,

Regarding the parquet-cpp encryption - there are no integration errors. A
number of pull requests are merged by now; the remaining code has been
reviewed and updated; the only outstanding question (how/when to turn off
OpenSSL getting included in Arrow) has been addressed by the Cpp leads and
will be resolved soon.

Cheers, Gidon.

On Thu, Sep 19, 2019 at 8:37 PM <sh...@uber.com> wrote:

> Hi all,
>
> These are the meeting notes that I took. Feel free to add to or correct
> them if something is missing or wrong.
>
> 9/19/2019
>
> Attendees:
> Xinli Shang (Uber)
> Gidon Gershinsky (IBM)
> Jim Apple (Netflix)
> Nandor Kollar, Gabor, and several other Cloudera folks
> Julien Le Dem (WeWork)
> Deepak (Vertica)
> Please add yourself if I missed you.
> Topics:
> Column Encryption
> Parquet-format has the specification merged.
> One PR is merged into parquet-mr, the second is being reviewed.
> For parquet-cpp, we still have some integration errors.
> Xinli backported the encryption code to parquet 1.10.1 to mitigate the
> risk. We can wait for the 1.11.0 release before deciding whether the
> public community should do that.
>
> Bloom filter
> The spec has been checked in to parquet-format.
> We will continue validating correctness on parquet-mr (feature branch)
> and parquet-cpp (master branch? Some code, such as the reader/writer, is
> not in the master branch yet).
> Netflix has done enough performance testing. The remaining tests are
> mainly for correctness.
> There are unit tests and integration tests that cover this.
>
> Parquet-format 2.7.0
> Releasing parquet-format is slow now. We need the release before
> checking into parquet-mr master.
> There are several options. We prefer option 3, which is to release bloom
> filter and parquet encryption together in 2.7.0.
> 3 PMC members voted +1 for option 3 in this meeting.
> Ryan can help with the release, signing keys, etc.
>
> Remove old Parquet modules
> Hive modules - sounds good
> Scrooge - Julien will reach out to Twitter
> Tools - undecided - Cloudera may still use parquet-tools, according to
> Gabor.
> Cascading - undecided
> We can mark the modules as deprecated in their descriptions.
>
> 1.11.0 Release
> Column index validation - Need Ryan to review it.
>
> Someone is proposing a byte_stream_split encoding on the mailing list
> Ryan made a proposal, and the owner just replied that they will try it
> and get back.
>
> Merge Parquet and ORC
> Ryan and Owen had a talk at ApacheCon regarding merging ORC and Parquet.
> There are a lot of benefits to doing that but also a lot of work. Overall,
> people in this meeting support this effort.
> Ryan can start socializing this effort.
>
> Xinli Shang (Uber)
>
>
>
> Parquet Sync
> Hi all,
>
> This is an invitation for the next occasion of the regular sync meeting of
> the Parquet community.
>
> Xinli Shang
>
> Join Zoom Meeting
> https://uber.zoom.us/j/112318682
>
> One tap mobile
> +16699006833,,112318682# US (San Jose)
> +16468769923,,112318682# US (New York)
>
> Dial by your location
>         +1 669 900 6833 US (San Jose)
>         +1 646 876 9923 US (New York)
>         855 880 1246 US Toll-free
>         877 369 0926 US Toll-free
> Meeting ID: 112 318 682
> Find your local number: https://zoom.us/u/aZKZunOZ9
>
> Join by SIP
> 112318682@zoomcrc.com
>
> Join by H.323
> 162.255.37.11 (US West)
> 162.255.36.11 (US East)
> 221.122.88.195 (China)
> 115.114.131.7 (India)
> 213.19.144.110 (EMEA)
> 103.122.166.55 (Australia)
> 209.9.211.110 (Hong Kong)
> 64.211.144.160 (Brazil)
> 69.174.57.160 (Canada)
> 207.226.132.110 (Japan)
> Meeting ID: 112 318 682
> *When*
> Thu Sep 19, 2019 9am – 10am Pacific Time - Los Angeles
>
> *Where*
> https://uber.zoom.us/j/112318682
>
> *Who*
> • shangx@uber.com - organizer
> • gg5070@gmail.com
> • Daniel Weeks
> • aniket486@gmail.com
> • danielshir@gmail.com
> • altekrusejason@gmail.com
> • ippokratis@gmail.com
> • Lars Volker
> • Mohit Sabharwal
> • santlal.gupta@bitwiseglobal.com
> • yumwang@ebay.com
> • smanik.im@gmail.com
> • szonyi@cloudera.com
> • Julien Le Dem
> • j.coffey@criteo.com
> • dev@parquet.apache.org
> • m.lacour@criteo.com
> • nongli@gmail.com
> • jacques@apache.org
> • fnothaft@berkeley.edu
> • venkik@uber.com
> • boroknagyz@cloudera.com
> • Xu, Cheng A
> • majeti.deepak@gmail.com
> • csringhofer@cloudera.com
> • stakiar@cloudera.com
> • o.kaidannik@criteo.com
> • bikramjeet.vig@cloudera.com
> • brian.bowman@sas.com
> • aphadke@cloudera.com
> • nkollar@cloudera.com
> • wesmckinn@gmail.com
> • Ryan Blue
> • Wei Han
> • robertk@palantir.com
> • Zoltan Ivanfi
> • smoy@yelp.com
> • xhochy@gmail.com
> • gabor.szadovszky@cloudera.com
> • m.liroz@criteo.com
> • baliga@uber.com
> • jimmyjchen@tencent.com
> • Pavi Subenderan
> • Reynold Xin
> • b.hanotte@criteo.com
> • sunchao@apache.org
> • q.xu@criteo.com
> • mark.marsh@kognitio.com
> • yaliangw@twitter.com
> • alexlevenson@twitter.com
> • marcelk@gmail.com
> • Parth Chandra
> • jbapple@cloudera.com
> • nilangekar.pooja@gmail.com
> • Mohammad Islam
> • julien.ledem@gmail.com
> • Sergio Pena
> • rzamora@nvidia.com
> • vercegovac@cloudera.com
> • szonyi.anna@gmail.com
> • dam6923@gmail.com
> • shri.hariharasubrahmanian@oracle.com
>