Posted to dev@parquet.apache.org by Xinli shang <sh...@uber.com.INVALID> on 2020/11/24 18:46:10 UTC

Parquet sync meeting 11/24/2020

11/24/2020

Hi all,

Attendees:

   1. To address the Avro version issue when upgrading Parquet, should we
      release Parquet Avro separately?
      1. For upgrading Avro from 1.8 to 1.9, Parquet only needs changes to
         unit tests and parquet-cli, and users can exclude Avro from
         Parquet.
      2. Separating the release would still be beneficial in the long
         term, but it is not easy and is not required for now.
   2. Column Encryption
      1. C++ version has several PRs (improvements) recently.
   3. Data masking
      1. Some upper layers can develop their own data masking easily.
      2. We might think about some simple tools rather than executing the
         masking in Parquet.
      3. Null data masking has been developed in Parquet and works now. A
         Google doc will be opened and we can discuss from there.
   4. Parquet 1.11.x adoption in Presto
      1. A PR <https://github.com/prestodb/presto/pull/14960> has been
         created, but it has a unit test failure.
   5. Parquet 1.11.x feature adoption in Iceberg
      1. Iceberg meeting notes
         <https://docs.google.com/document/d/1YuGhUdukLP5gGiqCbk0A5_Wifqe2CZWgOd3TbhY3UQg/edit#>
         for discussing this issue.
      2. Issue summary and proposals
         <https://docs.google.com/document/d/1f8erGSnhVcdD0UokGx2opjmGvCU69g7fsiPXCJhP3MA/edit#>
      3. If we add a Parquet V2 API to support Iceberg, it makes sense to
         have a vectorized API along with it. Let’s bring in other PMC
         members/committers to discuss this at the next community meeting.
   6. Parquet 1.12.0
      a. Will cut an RC release soon.

Please let me know if you have any questions.

Xinli Shang | Tech Lead Manager @ Uber Data Infra


-- 
Xinli Shang

Re: Parquet sync meeting 11/24/2020

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Sorry I wasn't able to make it to the sync today. I should be able to make
it to the next one and we can talk about getting some of Iceberg's changes
upstream.



-- 
Ryan Blue
Software Engineer
Netflix

Re: Parquet sync meeting 11/24/2020

Posted by Gidon Gershinsky <gg...@gmail.com>.
Thanks Xinli,

A slight correction re 2.1: The recent improvements / pull requests are in
the Java version, parquet-mr. But the ideas for some of them did indeed come
from our work with the Arrow teams, who develop a C++ version of Parquet
modular encryption.

Cheers, Gidon

