You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by Tham Ha <th...@emotiv.com> on 2019/05/20 10:40:06 UTC

Announcement for c++ parquet encryption

Hi community,



After a long time of development, I'm honor to announce that we have just
completed C++ parquet encryption module which implements encryption in low
level api and with examples included.



To have this feature completed, I would like to thank Gidon and Revital for
their contribution.

Gidon had a key role in encryption design and in writing Java version code
on which we based on to write C++ version. He also wrote crypto package in
C++ version.

Revital and me has been joining together in writing C++ version. Revital
was responsible for AAD calculations, API updating (to be the same with
Java version) and Java-C++ inter-operation testing. I was writing the first
draft (properties, metadata, writer, reader) and keeping them updated when
crypto package change.

We have had a great time to cooperate. Thank Gidon and Revital for all
guide and experience I have received from them, too.



Here are the links of pull requests:

1) encryption module (properties, metadata, writer, reader):
https://github.com/apache/arrow/pull/2555.

2) some merged pull requests for new thrift structure and crypto algorithm,
and one still open: https://github.com/apache/arrow/pull/3520



However, in order to make (1) buildable with current build scripts, we need
“adding openssl in C++ build toolchain” which is mentioned in this jira:
https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful if
someone could help fullfill this work.



About current pull requests, they has been currently using in our
development phase at Emotiv (https://www.emotiv.com/). We love using
parquet files to store EEG data. We are going to release a product with
encrypted parquet files soon and look forward to the official release of
parquet encryption feature. So it will be many thank and great honor to
have you review and merge them (if qualified).



Thank you very much!

Tham

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Wes,


*Public API changes -- if it's possible to make the encryption stuffas
unobtrusive to non-encrypted users, that would be preferred *

There are a number of parts here. You might have meant only the first one,
but since you are working with the PARQUET-1568, I'll outline my view of
the full picture.
Also, this discussion has started at the last Parquet sync, now might be
the right time to get back to it (I'll send a separate note).

1) I agree the changes to the current Parquet public API must be minimal.
In a sense, they already are. All we change in the existing APIs, is adding
a method "encryption" to the existing
parquet::WriterProperties::Builder . From the cpp samples:

parquet::WriterProperties::Builder builder;
builder.compression(parquet::Compression::SNAPPY);
builder.encryption(file_encryption_properties);
The file_encryption_properties is also simple - basically, it lists the
columns to be encrypted, along with keys (and key metadata) for each
encrypted column. A trivial example of a key metadata is the key name/id.
So I believe the low-level encryption APIs are simple to use, and
are unobtrusive to non-encrypted users - these just won't call the
.encryption method (and won't create encryption props).

2) There is a way to run encryption with *zero *changes in the current
Parquet API - at least in parquet-mr. For example, we can leverage the
existing Hadoop configuration properties object (or other types of string
"key-value" properties object) already passed to ParquetFileWriter and
ParquetFileReader. This approach, based on table properties, has been
proposed by Ryan; making explicit encryption params disappear is not it's
main goal (see item 3), but still, we can pass everything transparently, as
string parameters: the list of columns names to be encrypted, the keys
(base64 encoded) and key metadata (base64 encoded). Again, this is not the
main goal, since the low level API is already simple; and converting the
keys etc into strings, adding config parameters is not necessarily easier
for a user. Btw, is there a Hadoop-like string key-value params interface
in parquet-cpp?
In any case, switching to the main challenge addressed by the table
properties approach:

3) Encryption management in production systems, and its high level
interface
Per the spec, low level encryption API works with explicit keys (byte
arrays) and key metadata. For production environments, a layer must be
added on top that manages the keys and key metadata: via remote KMS
servers, with authentication/authorization systems, and table metadata
catalogs. There is a wide variety of those, no standard solutions, and the
implementation will always be complex and custom. But we can create a set
of dev tools; and a simple high level interface to that layer. This
interface will be based on (table) properties. The internal implementation
will handle the complexity; it will be comprised of coded parts, and
pluggable custom parts (for things like KMS/Auth/Metadata).
We had a brief discussion on that at the last Parquet sync. There are a
number of proposals on the table, I've volunteered to collect them and
create a draft doc, mainly to facilitate the community discussion. Sending
it out today.


*factoring code into helper methods makes it both easier to read andeasier
to maintain. *

Sound good.

*In the meantime, CI is broken .. so nothing can be merged until that is
taken care of. *

Sure, we'll work on fixing that.


Cheers, Gidon.


On Wed, Jun 5, 2019 at 5:54 AM Wes McKinney <we...@gmail.com> wrote:

> I'll try to spend an hour or two or so looking at the C++ encryption
> patch this week. I'm going to be looking at:
>
> * Public API changes -- it it's possible to make the encryption stuff
> as unobtrusive to non-encrypted users, that would be preferred
> * Stylistic problems: inconsistent naming style, etc.
> * Code duplication
>
> I notice at a glance that some very large methods have been written,
> or that 50-100 lines or more of code have been added to methods. This
> is sometimes a sign of a code factoring issue, and I find that
> factoring code into helper methods makes it both easier to read and
> easier to maintain.
>
> Follow on patches will be decidedly required in order to address some
> of these issues. In the meantime, CI is broken (the unit tests aren't
> passing, and maybe some other issues) so nothing can be merged until
> that is taken care of.
>
> On Thu, May 30, 2019 at 7:42 AM Gidon Gershinsky <gg...@gmail.com> wrote:
> >
> > Thinking of this, it occurred to me that the words "interop" and "test"
> > might be actually misleading in this context.
> > What we have, is a C++ sample, similar to the current reader-writer.cc or
> > the new encryption-reader-writer.cc (both located under the
> > cpp/examples/parquet/low-level-api).
> > The encryption-reader-writer.cc  writes one file using one encryption
> > config, and reads it with one decryption config.
> > Our sample is the same, but it writes N files with N different encryption
> > configs, and reads all these files using M different decryption configs.
> > All encryption/decryption configs are meaningful, we're adding detailed
> > comments to each, so this sample becomes an "encryption HelloWorld" on
> > steroids.
> > A good place to see the code usage of API for different
> > encryption/decryption modes.
> >
> > Now, since we have an identical sample in Java, we can test the
> > interoperability - when files are produced by Java sample and read by C++
> > sample, or vice versa.
> >
> > Given the first part, maybe we should place it together with other
> samples,
> > under cpp/examples/parquet/low-level-api in the apache/arrow repo.
> > This has an added benefit of getting built together with arrow/parquet,
> > without dependency on other repos.
> >
> > Here goes the current version:
> >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-reader-writer-all-crypto-options.cc
> >
> >
> > Cheers, Gidon.
> >
> > On Thu, May 30, 2019 at 12:45 AM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > If there are encryption integration testing utilities there's not much
> > > reason not to put them in
> > >
> > > https://github.com/apache/arrow/tree/master/cpp/tools/parquet
> > >
> > > (or similar)
> > >
> > > Then there is no additional build system to maintain
> > >
> > > On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <majeti.deepak@gmail.com
> >
> > > wrote:
> > > >
> > > > Hi Gidon,
> > > >
> > > > We should definitely add encrypted parquet files written by
> parquet-mr in
> > > > the parquet-testing repo data folder.
> > > > I also like the idea of keeping the source files in a separate "src"
> > > folder
> > > > in that repository. I see no downside in doing so.
> > > >
> > > > On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi Wes,
> > > > >
> > > > > Yep, I meant private as in not under apache/arrow. Will be more
> careful
> > > > > with that term.
> > > > >
> > > > > https://github.com/apache/parquet-testing  seems to currently
> contain
> > > only
> > > > > data (.parquet) files.
> > > > > Our samples are C++ and Java code files, that write and read
> .parquet
> > > files
> > > > > in different encryption configurations.
> > > > > We can add these encrypted .parquet files to the
> apache/parquet-testing
> > > > > repo.
> > > > > Let us know if we also should add the samples themselves there (by
> > > creating
> > > > > src folders in this repo), or add them to the apache/arrow/mr
> repos,
> > > or to
> > > > > keep them in our personal repos (the ones I've called private
> before).
> > > > > These samples are based on the standard parquet samples (such as
> > > > >
> > > > >
> > >
> https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> > > > > ),
> > > > > with a loop that creates different encryption configs, and
> > > writes/reads the
> > > > > same data with them.
> > > > >
> > > > > Cheers, Gidon.
> > > > >
> > > > >
> > > > > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com>
> > > wrote:
> > > > >
> > > > > > hi Gidon,
> > > > > >
> > > > > > Surely you don't mean "private" private -- this was always on
> GitHub
> > > > > > right (otherwise this runs against the Apache Way)?
> > > > > >
> > > > > > I think we should try to have all of the testing code in the
> upstream
> > > > > > repositories. If there are sample files you need to check in (and
> > > they
> > > > > > are not too large), please use the apache/parquet-testing
> repository
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <
> gg5070@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > Hi Deepak,
> > > > > > >
> > > > > > > Sounds good. Here goes the current cpp interop sample (not
> final
> > > yet).
> > > > > We
> > > > > > > have a sample with a similar code in Java.
> > > > > > >
> > > > > >
> > > > >
> > >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > > > > > >
> > > > > > >
> > > > > > > Please let me know if we should add this file to the cpp PR
> and its
> > > > > > staging
> > > > > > > branch (under parquet/low-level-api, or another folder?)
> > > > > > > - or leave this file in our private branch.
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <
> > > majeti.deepak@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Gidon,
> > > > > > > >
> > > > > > > > I agree with your point 2). I merged the PR for OpenSSL build
> > > > > support.
> > > > > > We
> > > > > > > > can do the same for PR-3520 after you push the changes this
> > > week. We
> > > > > > can
> > > > > > > > then simply rebase PR-2555 and make the final review.
> > > > > > > > For 1) I would like to take a look at the inter-op testing
> > > framework
> > > > > > first.
> > > > > > > > Can you share the inter-op sample? Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <
> > > gg5070@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Deepak,
> > > > > > > > >
> > > > > > > > > If it simplifies the code review, we can make this change.
> > > > > Actually,
> > > > > > > > we're
> > > > > > > > > already creating a single branch that has the parquet-cpp
> with
> > > > > > encryption
> > > > > > > > > in one place, ready for build and sample run. This is a
> private
> > > > > > branch -
> > > > > > > > > but we can do the same for the pull request (2555), making
> it
> > > the
> > > > > > staging
> > > > > > > > > branch. However, there are a number of technical details we
> > > need to
> > > > > > > > address
> > > > > > > > > first:
> > > > > > > > >
> > > > > > > > > 1) The private branch we're creating contains not only the
> cpp
> > > > > > encryption
> > > > > > > > > sample, but also an additional interop sample that checks
> > > > > > > > interoperability
> > > > > > > > > with parquet-mr encryption.
> > > > > > > > > Do we want to add this interop sample to the staging
> branch?
> > > Should
> > > > > > this
> > > > > > > > > sample be merged in arrow/parquet-cpp for future interop
> > > testing
> > > > > with
> > > > > > > > Java
> > > > > > > > > encryption? Or this sample can stay in the staging branch
> for
> > > the
> > > > > > > > duration
> > > > > > > > > of the review process, and be removed just before the
> staging
> > > PR is
> > > > > > > > merged
> > > > > > > > > in arrow/parquet-cpp? Or we dont add this interop sample
> to the
> > > > > > staging
> > > > > > > > > branch at all, and keep our private branch for interop
> > > testing? Let
> > > > > > us
> > > > > > > > know
> > > > > > > > > your preference.
> > > > > > > > >
> > > > > > > > > 2) PR 3520 has been reviewed. We'll make the requested
> changes
> > > this
> > > > > > week,
> > > > > > > > > and update the PR. It adds 2 new files, and doesnt touch
> the
> > > > > existing
> > > > > > > > ones
> > > > > > > > > - if you guys will be ok with the updated version, you can
> > > merge
> > > > > it,
> > > > > > and
> > > > > > > > > then the staging PR (2555) will automatically pick the 3520
> > > patch
> > > > > > upon
> > > > > > > > > rebasing to the then-current master state. But, if further
> > > changes
> > > > > > will
> > > > > > > > be
> > > > > > > > > required in the #3520 files, we'll add them to the staging
> PR.
> > > > > > > > >
> > > > > > > > > Cheers, Gidon.
> > > > > > > > >
> > > > > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > > > > > majeti.deepak@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Gidon and Tham,
> > > > > > > > > >
> > > > > > > > > > I opened a PR for OpenSSL build support here:
> > > > > > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > > > > > https://github.com/apache/arrow/pull/2555 the staging
> PR and
> > > > > > > > cherry-pick
> > > > > > > > > > https://github.com/apache/arrow/pull/3520/files and the
> > > above PR
> > > > > > onto
> > > > > > > > > it.
> > > > > > > > > > The later PRs are tiny anyway. This simplifies code
> review,
> > > > > syncing
> > > > > > > > with
> > > > > > > > > > the master, as well as building the encryption work while
> > > > > > reviewing.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > > > > > gg5070@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks guys. We'll create a doc with a description of
> the
> > > > > current
> > > > > > > > > interop
> > > > > > > > > > > tests, including details on how to build/run them.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > > > > > majeti.deepak@gmail.com
> > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I will make the first attempt to review the C++
> code. I
> > > > > > reviewed
> > > > > > > > part
> > > > > > > > > > of
> > > > > > > > > > > it
> > > > > > > > > > > > a while ago.
> > > > > > > > > > > > Java-C++ inter-operation testing is my main concern.
> Can
> > > you
> > > > > > share
> > > > > > > > > some
> > > > > > > > > > > > details on what has been tested?
> > > > > > > > > > > > I need to update myself with the key management part
> and
> > > see
> > > > > > if we
> > > > > > > > > can
> > > > > > > > > > > add
> > > > > > > > > > > > some test parquet files written in Java along with
> the
> > > keys
> > > > > to
> > > > > > the
> > > > > > > > > data
> > > > > > > > > > > > folder.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > > > > > wesmckinn@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > hi folks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm extraordinarily busy but it seems likely that
> the
> > > code
> > > > > > review
> > > > > > > > > for
> > > > > > > > > > > > > Parquet C++ is largely going to fall on my
> shoulders.
> > > Since
> > > > > > this
> > > > > > > > > is a
> > > > > > > > > > > > > large patch with many touch-points in the C++
> > > codebase, and
> > > > > > there
> > > > > > > > > > > > > hasn't been much feedback over the course of its
> > > > > > development, I
> > > > > > > > > would
> > > > > > > > > > > > > like to leave thoughtful code review so that we can
> > > avoid
> > > > > > issues
> > > > > > > > > that
> > > > > > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On the OpenSSL issue -- there are many CMake
> projects
> > > in
> > > > > the
> > > > > > wild
> > > > > > > > > > that
> > > > > > > > > > > > > use OpenSSL and there are many examples of how to
> > > > > incorporate
> > > > > > > > other
> > > > > > > > > > > > > dependencies into the Arrow build system. I don't
> have
> > > time
> > > > > > to
> > > > > > > > work
> > > > > > > > > > on
> > > > > > > > > > > > > this right now so someone else will have to sort
> out
> > > the
> > > > > > > > toolchain
> > > > > > > > > > > > > issue
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > > > > > gg5070@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > <re-sending from another account>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Tham, thank you for this! and for volunteering
> early
> > > for
> > > > > > the
> > > > > > > > C++
> > > > > > > > > > > > version
> > > > > > > > > > > > > > work, driving it forward and creating the bulk
> of the
> > > > > > > > parquet-cpp
> > > > > > > > > > > > > > encryption code along the way.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @All - this announcement means that two
> > > implementations
> > > > > of
> > > > > > > > > Parquet
> > > > > > > > > > > > > > encryption, fully conforming to the formal
> > > specification,
> > > > > > are
> > > > > > > > > > > available
> > > > > > > > > > > > > > today. Thanks to Revital for contributing to C++
> > > version
> > > > > > > > > compliance
> > > > > > > > > > > > with
> > > > > > > > > > > > > > the encryption spec, and for running a set of
> basic
> > > > > > Java-C++
> > > > > > > > > > > encryption
> > > > > > > > > > > > > > interoperability tests. We have tested plaintext
> and
> > > > > > encrypted
> > > > > > > > > > footer
> > > > > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> > > > > readers.
> > > > > > > > Files
> > > > > > > > > > > > written
> > > > > > > > > > > > > > with parquet-cpp are successfully parsed by
> > > parquet-mr,
> > > > > and
> > > > > > > > vice
> > > > > > > > > > > versa.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli
> for
> > > > > their
> > > > > > > > > support
> > > > > > > > > > > and
> > > > > > > > > > > > > vote
> > > > > > > > > > > > > > for the encryption specification - along with
> the PMC
> > > > > > folks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > All parquet-format pull requests are merged by
> now
> > > into
> > > > > the
> > > > > > > > > > > encryption
> > > > > > > > > > > > > > branch,
> > > > > > > > > > > > > >
> > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The community is welcome to review the parquet-mr
> > > pull
> > > > > > > > requests,
> > > > > > > > > in
> > > > > > > > > > > the
> > > > > > > > > > > > > > following order:
> > > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Currently, an end-to-end implementation of Java
> (mr)
> > > > > > Parquet
> > > > > > > > > > > encryption
> > > > > > > > > > > > > is
> > > > > > > > > > > > > > collected in this branch:
> > > > > > > > > > > > > >
> https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > > > > > Thanks to Xinli for working with this branch
> code,
> > > and
> > > > > > > > > contributing
> > > > > > > > > > > to
> > > > > > > > > > > > it
> > > > > > > > > > > > > > based on his field experience. Everybody is
> welcome
> > > to do
> > > > > > the
> > > > > > > > > same.
> > > > > > > > > > > > > > @All - it would be helpful to review & merge the
> > > above
> > > > > PRs
> > > > > > in
> > > > > > > > > > > > > > apache/parquet-mr/encryption, so that folks can
> work
> > > with
> > > > > > it
> > > > > > > > > > instead
> > > > > > > > > > > of
> > > > > > > > > > > > > my
> > > > > > > > > > > > > > private branch..
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > And I certainly second Tham's call to review &
> merge
> > > the
> > > > > > > > > > parquet-cpp
> > > > > > > > > > > > pull
> > > > > > > > > > > > > > requests. By now, we have a number of companies
> > > starting
> > > > > to
> > > > > > > > > utilize
> > > > > > > > > > > > > Parquet
> > > > > > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Cheers, Gidon.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <
> > > tham@emotiv.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi community,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > After a long time of development, I'm honor to
> > > announce
> > > > > > that
> > > > > > > > we
> > > > > > > > > > > have
> > > > > > > > > > > > > just
> > > > > > > > > > > > > > > completed C++ parquet encryption module which
> > > > > implements
> > > > > > > > > > encryption
> > > > > > > > > > > > in
> > > > > > > > > > > > > low
> > > > > > > > > > > > > > > level api and with examples included.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > To have this feature completed, I would like to
> > > thank
> > > > > > Gidon
> > > > > > > > and
> > > > > > > > > > > > > Revital for
> > > > > > > > > > > > > > > their contribution.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Gidon had a key role in encryption design and
> in
> > > > > writing
> > > > > > Java
> > > > > > > > > > > version
> > > > > > > > > > > > > code
> > > > > > > > > > > > > > > on which we based on to write C++ version. He
> also
> > > > > wrote
> > > > > > > > crypto
> > > > > > > > > > > > > package in
> > > > > > > > > > > > > > > C++ version.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Revital and me has been joining together in
> > > writing C++
> > > > > > > > > version.
> > > > > > > > > > > > > Revital
> > > > > > > > > > > > > > > was responsible for AAD calculations, API
> updating
> > > (to
> > > > > > be the
> > > > > > > > > > same
> > > > > > > > > > > > with
> > > > > > > > > > > > > > > Java version) and Java-C++ inter-operation
> > > testing. I
> > > > > was
> > > > > > > > > writing
> > > > > > > > > > > the
> > > > > > > > > > > > > first
> > > > > > > > > > > > > > > draft (properties, metadata, writer, reader)
> and
> > > > > keeping
> > > > > > them
> > > > > > > > > > > updated
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > crypto package change.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We have had a great time to cooperate. Thank
> Gidon
> > > and
> > > > > > > > Revital
> > > > > > > > > > for
> > > > > > > > > > > > all
> > > > > > > > > > > > > > > guide and experience I have received from them,
> > > too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1) encryption module (properties, metadata,
> writer,
> > > > > > reader):
> > > > > > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2) some merged pull requests for new thrift
> > > structure
> > > > > and
> > > > > > > > > crypto
> > > > > > > > > > > > > algorithm,
> > > > > > > > > > > > > > > and one still open:
> > > > > > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > However, in order to make (1) buildable with
> > > current
> > > > > > build
> > > > > > > > > > scripts,
> > > > > > > > > > > > we
> > > > > > > > > > > > > need
> > > > > > > > > > > > > > > “adding openssl in C++ build toolchain” which
> is
> > > > > > mentioned in
> > > > > > > > > > this
> > > > > > > > > > > > > jira:
> > > > > > > > > > > > > > >
> https://issues.apache.org/jira/browse/ARROW-4302.
> > > I
> > > > > > will be
> > > > > > > > > > > grateful
> > > > > > > > > > > > > if
> > > > > > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > About current pull requests, they has been
> > > currently
> > > > > > using in
> > > > > > > > > our
> > > > > > > > > > > > > > > development phase at Emotiv (
> > > https://www.emotiv.com/).
> > > > > > We
> > > > > > > > love
> > > > > > > > > > > using
> > > > > > > > > > > > > > > parquet files to store EEG data. We are going
> to
> > > > > release
> > > > > > a
> > > > > > > > > > product
> > > > > > > > > > > > with
> > > > > > > > > > > > > > > encrypted parquet files soon and look forward
> to
> > > the
> > > > > > official
> > > > > > > > > > > release
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > parquet encryption feature. So it will be many
> > > thank
> > > > > and
> > > > > > > > great
> > > > > > > > > > > honor
> > > > > > > > > > > > to
> > > > > > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you very much!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tham
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > regards,
> > > > > > > > > > > > Deepak Majeti
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > regards,
> > > > > > > > > > Deepak Majeti
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > regards,
> > > > > > > > Deepak Majeti
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Deepak Majeti
> > >
>

Re: Announcement for c++ parquet encryption

Posted by Wes McKinney <we...@gmail.com>.
I'll try to spend an hour or two or so looking at the C++ encryption
patch this week. I'm going to be looking at:

* Public API changes -- it it's possible to make the encryption stuff
as unobtrusive to non-encrypted users, that would be preferred
* Stylistic problems: inconsistent naming style, etc.
* Code duplication

I notice at a glance that some very large methods have been written,
or that 50-100 lines or more of code have been added to methods. This
is sometimes a sign of a code factoring issue, and I find that
factoring code into helper methods makes it both easier to read and
easier to maintain.

Follow on patches will be decidedly required in order to address some
of these issues. In the meantime, CI is broken (the unit tests aren't
passing, and maybe some other issues) so nothing can be merged until
that is taken care of.

On Thu, May 30, 2019 at 7:42 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> Thinking of this, it occurred to me that the words "interop" and "test"
> might be actually misleading in this context.
> What we have, is a C++ sample, similar to the current reader-writer.cc or
> the new encryption-reader-writer.cc (both located under the
> cpp/examples/parquet/low-level-api).
> The encryption-reader-writer.cc  writes one file using one encryption
> config, and reads it with one decryption config.
> Our sample is the same, but it writes N files with N different encryption
> configs, and reads all these files using M different decryption configs.
> All encryption/decryption configs are meaningful, we're adding detailed
> comments to each, so this sample becomes an "encryption HelloWorld" on
> steroids.
> A good place to see the code usage of API for different
> encryption/decryption modes.
>
> Now, since we have an identical sample in Java, we can test the
> interoperability - when files are produced by Java sample and read by C++
> sample, or vice versa.
>
> Given the first part, maybe we should place it together with other samples,
> under cpp/examples/parquet/low-level-api in the apache/arrow repo.
> This has an added benefit of getting built together with arrow/parquet,
> without dependency on other repos.
>
> Here goes the current version:
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-reader-writer-all-crypto-options.cc
>
>
> Cheers, Gidon.
>
> On Thu, May 30, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:
>
> > If there are encryption integration testing utilities there's not much
> > reason not to put them in
> >
> > https://github.com/apache/arrow/tree/master/cpp/tools/parquet
> >
> > (or similar)
> >
> > Then there is no additional build system to maintain
> >
> > On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <ma...@gmail.com>
> > wrote:
> > >
> > > Hi Gidon,
> > >
> > > We should definitely add encrypted parquet files written by parquet-mr in
> > > the parquet-testing repo data folder.
> > > I also like the idea of keeping the source files in a separate "src"
> > folder
> > > in that repository. I see no downside in doing so.
> > >
> > > On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > >
> > > > Hi Wes,
> > > >
> > > > Yep, I meant private as in not under apache/arrow. Will be more careful
> > > > with that term.
> > > >
> > > > https://github.com/apache/parquet-testing  seems to currently contain
> > only
> > > > data (.parquet) files.
> > > > Our samples are C++ and Java code files, that write and read .parquet
> > files
> > > > in different encryption configurations.
> > > > We can add these encrypted .parquet files to the apache/parquet-testing
> > > > repo.
> > > > Let us know if we also should add the samples themselves there (by
> > creating
> > > > src folders in this repo), or add them to the apache/arrow/mr repos,
> > or to
> > > > keep them in our personal repos (the ones I've called private before).
> > > > These samples are based on the standard parquet samples (such as
> > > >
> > > >
> > https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> > > > ),
> > > > with a loop that creates different encryption configs, and
> > writes/reads the
> > > > same data with them.
> > > >
> > > > Cheers, Gidon.
> > > >
> > > >
> > > > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com>
> > wrote:
> > > >
> > > > > hi Gidon,
> > > > >
> > > > > Surely you don't mean "private" private -- this was always on GitHub
> > > > > right (otherwise this runs against the Apache Way)?
> > > > >
> > > > > I think we should try to have all of the testing code in the upstream
> > > > > repositories. If there are sample files you need to check in (and
> > they
> > > > > are not too large), please use the apache/parquet-testing repository
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > Hi Deepak,
> > > > > >
> > > > > > Sounds good. Here goes the current cpp interop sample (not final
> > yet).
> > > > We
> > > > > > have a sample with a similar code in Java.
> > > > > >
> > > > >
> > > >
> > https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > > > > >
> > > > > >
> > > > > > Please let me know if we should add this file to the cpp PR and its
> > > > > staging
> > > > > > branch (under parquet/low-level-api, or another folder?)
> > > > > > - or leave this file in our private branch.
> > > > > >
> > > > > > Cheers, Gidon.
> > > > > >
> > > > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <
> > majeti.deepak@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Gidon,
> > > > > > >
> > > > > > > I agree with your point 2). I merged the PR for OpenSSL build
> > > > support.
> > > > > We
> > > > > > > can do the same for PR-3520 after you push the changes this
> > week. We
> > > > > can
> > > > > > > then simply rebase PR-2555 and make the final review.
> > > > > > > For 1) I would like to take a look at the inter-op testing
> > framework
> > > > > first.
> > > > > > > Can you share the inter-op sample? Thanks.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <
> > gg5070@gmail.com>
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Deepak,
> > > > > > > >
> > > > > > > > If it simplifies the code review, we can make this change.
> > > > Actually,
> > > > > > > we're
> > > > > > > > already creating a single branch that has the parquet-cpp with
> > > > > encryption
> > > > > > > > in one place, ready for build and sample run. This is a private
> > > > > branch -
> > > > > > > > but we can do the same for the pull request (2555), making it
> > the
> > > > > staging
> > > > > > > > branch. However, there are a number of technical details we
> > need to
> > > > > > > address
> > > > > > > > first:
> > > > > > > >
> > > > > > > > 1) The private branch we're creating contains not only the cpp
> > > > > encryption
> > > > > > > > sample, but also an additional interop sample that checks
> > > > > > > interoperability
> > > > > > > > with parquet-mr encryption.
> > > > > > > > Do we want to add this interop sample to the staging branch?
> > Should
> > > > > this
> > > > > > > > sample be merged in arrow/parquet-cpp for future interop
> > testing
> > > > with
> > > > > > > Java
> > > > > > > > encryption? Or this sample can stay in the staging branch for
> > the
> > > > > > > duration
> > > > > > > > of the review process, and be removed just before the staging
> > PR is
> > > > > > > merged
> > > > > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> > > > > staging
> > > > > > > > branch at all, and keep our private branch for interop
> > testing? Let
> > > > > us
> > > > > > > know
> > > > > > > > your preference.
> > > > > > > >
> > > > > > > > 2) PR 3520 has been reviewed. We'll make the requested changes
> > this
> > > > > week,
> > > > > > > > and update the PR. It adds 2 new files, and doesnt touch the
> > > > existing
> > > > > > > ones
> > > > > > > > - if you guys will be ok with the updated version, you can
> > merge
> > > > it,
> > > > > and
> > > > > > > > then the staging PR (2555) will automatically pick the 3520
> > patch
> > > > > upon
> > > > > > > > rebasing to the then-current master state. But, if further
> > changes
> > > > > will
> > > > > > > be
> > > > > > > > required in the #3520 files, we'll add them to the staging PR.
> > > > > > > >
> > > > > > > > Cheers, Gidon.
> > > > > > > >
> > > > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > > > > majeti.deepak@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Gidon and Tham,
> > > > > > > > >
> > > > > > > > > I opened a PR for OpenSSL build support here:
> > > > > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > > > > > cherry-pick
> > > > > > > > > https://github.com/apache/arrow/pull/3520/files and the
> > above PR
> > > > > onto
> > > > > > > > it.
> > > > > > > > > The later PRs are tiny anyway. This simplifies code review,
> > > > syncing
> > > > > > > with
> > > > > > > > > the master, as well as building the encryption work while
> > > > > reviewing.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > > > > gg5070@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks guys. We'll create a doc with a description of the
> > > > current
> > > > > > > > interop
> > > > > > > > > > tests, including details on how to build/run them.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > > > > majeti.deepak@gmail.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I will make the first attempt to review the C++ code. I
> > > > > reviewed
> > > > > > > part
> > > > > > > > > of
> > > > > > > > > > it
> > > > > > > > > > > a while ago.
> > > > > > > > > > > Java-C++ inter-operation testing is my main concern. Can
> > you
> > > > > share
> > > > > > > > some
> > > > > > > > > > > details on what has been tested?
> > > > > > > > > > > I need to update myself with the key management part and
> > see
> > > > > if we
> > > > > > > > can
> > > > > > > > > > add
> > > > > > > > > > > some test parquet files written in Java along with the
> > keys
> > > > to
> > > > > the
> > > > > > > > data
> > > > > > > > > > > folder.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > > > > wesmckinn@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > hi folks,
> > > > > > > > > > > >
> > > > > > > > > > > > I'm extraordinarily busy but it seems likely that the
> > code
> > > > > review
> > > > > > > > for
> > > > > > > > > > > > Parquet C++ is largely going to fall on my shoulders.
> > Since
> > > > > this
> > > > > > > > is a
> > > > > > > > > > > > large patch with many touch-points in the C++
> > codebase, and
> > > > > there
> > > > > > > > > > > > hasn't been much feedback over the course of its
> > > > > development, I
> > > > > > > > would
> > > > > > > > > > > > like to leave thoughtful code review so that we can
> > avoid
> > > > > issues
> > > > > > > > that
> > > > > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > > > > >
> > > > > > > > > > > > On the OpenSSL issue -- there are many CMake projects
> > in
> > > > the
> > > > > wild
> > > > > > > > > that
> > > > > > > > > > > > use OpenSSL and there are many examples of how to
> > > > incorporate
> > > > > > > other
> > > > > > > > > > > > dependencies into the Arrow build system. I don't have
> > time
> > > > > to
> > > > > > > work
> > > > > > > > > on
> > > > > > > > > > > > this right now so someone else will have to sort out
> > the
> > > > > > > toolchain
> > > > > > > > > > > > issue
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > > > > gg5070@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > <re-sending from another account>
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tham, thank you for this! and for volunteering early
> > for
> > > > > the
> > > > > > > C++
> > > > > > > > > > > version
> > > > > > > > > > > > > work, driving it forward and creating the bulk of the
> > > > > > > parquet-cpp
> > > > > > > > > > > > > encryption code along the way.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @All - this announcement means that two
> > implementations
> > > > of
> > > > > > > > Parquet
> > > > > > > > > > > > > encryption, fully conforming to the formal
> > specification,
> > > > > are
> > > > > > > > > > available
> > > > > > > > > > > > > today. Thanks to Revital for contributing to C++
> > version
> > > > > > > > compliance
> > > > > > > > > > > with
> > > > > > > > > > > > > the encryption spec, and for running a set of basic
> > > > > Java-C++
> > > > > > > > > > encryption
> > > > > > > > > > > > > interoperability tests. We have tested plaintext and
> > > > > encrypted
> > > > > > > > > footer
> > > > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> > > > readers.
> > > > > > > Files
> > > > > > > > > > > written
> > > > > > > > > > > > > with parquet-cpp are successfully parsed by
> > parquet-mr,
> > > > and
> > > > > > > vice
> > > > > > > > > > versa.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for
> > > > their
> > > > > > > > support
> > > > > > > > > > and
> > > > > > > > > > > > vote
> > > > > > > > > > > > > for the encryption specification - along with the PMC
> > > > > folks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > All parquet-format pull requests are merged by now
> > into
> > > > the
> > > > > > > > > > encryption
> > > > > > > > > > > > > branch,
> > > > > > > > > > > > >
> > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > > > > >
> > > > > > > > > > > > > The community is welcome to review the parquet-mr
> > pull
> > > > > > > requests,
> > > > > > > > in
> > > > > > > > > > the
> > > > > > > > > > > > > following order:
> > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> > > > > Parquet
> > > > > > > > > > encryption
> > > > > > > > > > > > is
> > > > > > > > > > > > > collected in this branch:
> > > > > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > > > > Thanks to Xinli for working with this branch code,
> > and
> > > > > > > > contributing
> > > > > > > > > > to
> > > > > > > > > > > it
> > > > > > > > > > > > > based on his field experience. Everybody is welcome
> > to do
> > > > > the
> > > > > > > > same.
> > > > > > > > > > > > > @All - it would be helpful to review & merge the
> > above
> > > > PRs
> > > > > in
> > > > > > > > > > > > > apache/parquet-mr/encryption, so that folks can work
> > with
> > > > > it
> > > > > > > > > instead
> > > > > > > > > > of
> > > > > > > > > > > > my
> > > > > > > > > > > > > private branch..
> > > > > > > > > > > > >
> > > > > > > > > > > > > And I certainly second Tham's call to review & merge
> > the
> > > > > > > > > parquet-cpp
> > > > > > > > > > > pull
> > > > > > > > > > > > > requests. By now, we have a number of companies
> > starting
> > > > to
> > > > > > > > utilize
> > > > > > > > > > > > Parquet
> > > > > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers, Gidon.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <
> > tham@emotiv.com
> > > > >
> > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi community,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > After a long time of development, I'm honor to
> > announce
> > > > > that
> > > > > > > we
> > > > > > > > > > have
> > > > > > > > > > > > just
> > > > > > > > > > > > > > completed C++ parquet encryption module which
> > > > implements
> > > > > > > > > encryption
> > > > > > > > > > > in
> > > > > > > > > > > > low
> > > > > > > > > > > > > > level api and with examples included.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > To have this feature completed, I would like to
> > thank
> > > > > Gidon
> > > > > > > and
> > > > > > > > > > > > Revital for
> > > > > > > > > > > > > > their contribution.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Gidon had a key role in encryption design and in
> > > > writing
> > > > > Java
> > > > > > > > > > version
> > > > > > > > > > > > code
> > > > > > > > > > > > > > on which we based on to write C++ version. He also
> > > > wrote
> > > > > > > crypto
> > > > > > > > > > > > package in
> > > > > > > > > > > > > > C++ version.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Revital and me has been joining together in
> > writing C++
> > > > > > > > version.
> > > > > > > > > > > > Revital
> > > > > > > > > > > > > > was responsible for AAD calculations, API updating
> > (to
> > > > > be the
> > > > > > > > > same
> > > > > > > > > > > with
> > > > > > > > > > > > > > Java version) and Java-C++ inter-operation
> > testing. I
> > > > was
> > > > > > > > writing
> > > > > > > > > > the
> > > > > > > > > > > > first
> > > > > > > > > > > > > > draft (properties, metadata, writer, reader) and
> > > > keeping
> > > > > them
> > > > > > > > > > updated
> > > > > > > > > > > > when
> > > > > > > > > > > > > > crypto package change.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon
> > and
> > > > > > > Revital
> > > > > > > > > for
> > > > > > > > > > > all
> > > > > > > > > > > > > > guide and experience I have received from them,
> > too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1) encryption module (properties, metadata, writer,
> > > > > reader):
> > > > > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 2) some merged pull requests for new thrift
> > structure
> > > > and
> > > > > > > > crypto
> > > > > > > > > > > > algorithm,
> > > > > > > > > > > > > > and one still open:
> > > > > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, in order to make (1) buildable with
> > current
> > > > > build
> > > > > > > > > scripts,
> > > > > > > > > > > we
> > > > > > > > > > > > need
> > > > > > > > > > > > > > “adding openssl in C++ build toolchain” which is
> > > > > mentioned in
> > > > > > > > > this
> > > > > > > > > > > > jira:
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302.
> > I
> > > > > will be
> > > > > > > > > > grateful
> > > > > > > > > > > > if
> > > > > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > About current pull requests, they has been
> > currently
> > > > > using in
> > > > > > > > our
> > > > > > > > > > > > > > development phase at Emotiv (
> > https://www.emotiv.com/).
> > > > > We
> > > > > > > love
> > > > > > > > > > using
> > > > > > > > > > > > > > parquet files to store EEG data. We are going to
> > > > release
> > > > > a
> > > > > > > > > product
> > > > > > > > > > > with
> > > > > > > > > > > > > > encrypted parquet files soon and look forward to
> > the
> > > > > official
> > > > > > > > > > release
> > > > > > > > > > > > of
> > > > > > > > > > > > > > parquet encryption feature. So it will be many
> > thank
> > > > and
> > > > > > > great
> > > > > > > > > > honor
> > > > > > > > > > > to
> > > > > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you very much!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Tham
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > regards,
> > > > > > > > > > > Deepak Majeti
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > regards,
> > > > > > > > > Deepak Majeti
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > regards,
> > > > > > > Deepak Majeti
> > > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> >

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Thinking of this, it occurred to me that the words "interop" and "test"
might be actually misleading in this context.
What we have, is a C++ sample, similar to the current reader-writer.cc or
the new encryption-reader-writer.cc (both located under the
cpp/examples/parquet/low-level-api).
The encryption-reader-writer.cc  writes one file using one encryption
config, and reads it with one decryption config.
Our sample is the same, but it writes N files with N different encryption
configs, and reads all these files using M different decryption configs.
All encryption/decryption configs are meaningful, we're adding detailed
comments to each, so this sample becomes an "encryption HelloWorld" on
steroids.
A good place to see the code usage of API for different
encryption/decryption modes.

Now, since we have an identical sample in Java, we can test the
interoperability - when files are produced by Java sample and read by C++
sample, or vice versa.

Given the first part, maybe we should place it together with other samples,
under cpp/examples/parquet/low-level-api in the apache/arrow repo.
This has an added benefit of getting built together with arrow/parquet,
without dependency on other repos.

Here goes the current version:
https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-reader-writer-all-crypto-options.cc


Cheers, Gidon.

On Thu, May 30, 2019 at 12:45 AM Wes McKinney <we...@gmail.com> wrote:

> If there are encryption integration testing utilities there's not much
> reason not to put them in
>
> https://github.com/apache/arrow/tree/master/cpp/tools/parquet
>
> (or similar)
>
> Then there is no additional build system to maintain
>
> On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <ma...@gmail.com>
> wrote:
> >
> > Hi Gidon,
> >
> > We should definitely add encrypted parquet files written by parquet-mr in
> > the parquet-testing repo data folder.
> > I also like the idea of keeping the source files in a separate "src"
> folder
> > in that repository. I see no downside in doing so.
> >
> > On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> >
> > > Hi Wes,
> > >
> > > Yep, I meant private as in not under apache/arrow. Will be more careful
> > > with that term.
> > >
> > > https://github.com/apache/parquet-testing  seems to currently contain
> only
> > > data (.parquet) files.
> > > Our samples are C++ and Java code files, that write and read .parquet
> files
> > > in different encryption configurations.
> > > We can add these encrypted .parquet files to the apache/parquet-testing
> > > repo.
> > > Let us know if we also should add the samples themselves there (by
> creating
> > > src folders in this repo), or add them to the apache/arrow/mr repos,
> or to
> > > keep them in our personal repos (the ones I've called private before).
> > > These samples are based on the standard parquet samples (such as
> > >
> > >
> https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> > > ),
> > > with a loop that creates different encryption configs, and
> writes/reads the
> > > same data with them.
> > >
> > > Cheers, Gidon.
> > >
> > >
> > > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com>
> wrote:
> > >
> > > > hi Gidon,
> > > >
> > > > Surely you don't mean "private" private -- this was always on GitHub
> > > > right (otherwise this runs against the Apache Way)?
> > > >
> > > > I think we should try to have all of the testing code in the upstream
> > > > repositories. If there are sample files you need to check in (and
> they
> > > > are not too large), please use the apache/parquet-testing repository
> > > >
> > > > - Wes
> > > >
> > > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hi Deepak,
> > > > >
> > > > > Sounds good. Here goes the current cpp interop sample (not final
> yet).
> > > We
> > > > > have a sample with a similar code in Java.
> > > > >
> > > >
> > >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > > > >
> > > > >
> > > > > Please let me know if we should add this file to the cpp PR and its
> > > > staging
> > > > > branch (under parquet/low-level-api, or another folder?)
> > > > > - or leave this file in our private branch.
> > > > >
> > > > > Cheers, Gidon.
> > > > >
> > > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <
> majeti.deepak@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Gidon,
> > > > > >
> > > > > > I agree with your point 2). I merged the PR for OpenSSL build
> > > support.
> > > > We
> > > > > > can do the same for PR-3520 after you push the changes this
> week. We
> > > > can
> > > > > > then simply rebase PR-2555 and make the final review.
> > > > > > For 1) I would like to take a look at the inter-op testing
> framework
> > > > first.
> > > > > > Can you share the inter-op sample? Thanks.
> > > > > >
> > > > > >
> > > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <
> gg5070@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Hi Deepak,
> > > > > > >
> > > > > > > If it simplifies the code review, we can make this change.
> > > Actually,
> > > > > > we're
> > > > > > > already creating a single branch that has the parquet-cpp with
> > > > encryption
> > > > > > > in one place, ready for build and sample run. This is a private
> > > > branch -
> > > > > > > but we can do the same for the pull request (2555), making it
> the
> > > > staging
> > > > > > > branch. However, there are a number of technical details we
> need to
> > > > > > address
> > > > > > > first:
> > > > > > >
> > > > > > > 1) The private branch we're creating contains not only the cpp
> > > > encryption
> > > > > > > sample, but also an additional interop sample that checks
> > > > > > interoperability
> > > > > > > with parquet-mr encryption.
> > > > > > > Do we want to add this interop sample to the staging branch?
> Should
> > > > this
> > > > > > > sample be merged in arrow/parquet-cpp for future interop
> testing
> > > with
> > > > > > Java
> > > > > > > encryption? Or this sample can stay in the staging branch for
> the
> > > > > > duration
> > > > > > > of the review process, and be removed just before the staging
> PR is
> > > > > > merged
> > > > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> > > > staging
> > > > > > > branch at all, and keep our private branch for interop
> testing? Let
> > > > us
> > > > > > know
> > > > > > > your preference.
> > > > > > >
> > > > > > > 2) PR 3520 has been reviewed. We'll make the requested changes
> this
> > > > week,
> > > > > > > and update the PR. It adds 2 new files, and doesnt touch the
> > > existing
> > > > > > ones
> > > > > > > - if you guys will be ok with the updated version, you can
> merge
> > > it,
> > > > and
> > > > > > > then the staging PR (2555) will automatically pick the 3520
> patch
> > > > upon
> > > > > > > rebasing to the then-current master state. But, if further
> changes
> > > > will
> > > > > > be
> > > > > > > required in the #3520 files, we'll add them to the staging PR.
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > > > majeti.deepak@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Gidon and Tham,
> > > > > > > >
> > > > > > > > I opened a PR for OpenSSL build support here:
> > > > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > > > > cherry-pick
> > > > > > > > https://github.com/apache/arrow/pull/3520/files and the
> above PR
> > > > onto
> > > > > > > it.
> > > > > > > > The later PRs are tiny anyway. This simplifies code review,
> > > syncing
> > > > > > with
> > > > > > > > the master, as well as building the encryption work while
> > > > reviewing.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > > > gg5070@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks guys. We'll create a doc with a description of the
> > > current
> > > > > > > interop
> > > > > > > > > tests, including details on how to build/run them.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > > > majeti.deepak@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I will make the first attempt to review the C++ code. I
> > > > reviewed
> > > > > > part
> > > > > > > > of
> > > > > > > > > it
> > > > > > > > > > a while ago.
> > > > > > > > > > Java-C++ inter-operation testing is my main concern. Can
> you
> > > > share
> > > > > > > some
> > > > > > > > > > details on what has been tested?
> > > > > > > > > > I need to update myself with the key management part and
> see
> > > > if we
> > > > > > > can
> > > > > > > > > add
> > > > > > > > > > some test parquet files written in Java along with the
> keys
> > > to
> > > > the
> > > > > > > data
> > > > > > > > > > folder.
> > > > > > > > > >
> > > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > > > wesmckinn@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > hi folks,
> > > > > > > > > > >
> > > > > > > > > > > I'm extraordinarily busy but it seems likely that the
> code
> > > > review
> > > > > > > for
> > > > > > > > > > > Parquet C++ is largely going to fall on my shoulders.
> Since
> > > > this
> > > > > > > is a
> > > > > > > > > > > large patch with many touch-points in the C++
> codebase, and
> > > > there
> > > > > > > > > > > hasn't been much feedback over the course of its
> > > > development, I
> > > > > > > would
> > > > > > > > > > > like to leave thoughtful code review so that we can
> avoid
> > > > issues
> > > > > > > that
> > > > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > > > >
> > > > > > > > > > > On the OpenSSL issue -- there are many CMake projects
> in
> > > the
> > > > wild
> > > > > > > > that
> > > > > > > > > > > use OpenSSL and there are many examples of how to
> > > incorporate
> > > > > > other
> > > > > > > > > > > dependencies into the Arrow build system. I don't have
> time
> > > > to
> > > > > > work
> > > > > > > > on
> > > > > > > > > > > this right now so someone else will have to sort out
> the
> > > > > > toolchain
> > > > > > > > > > > issue
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > > > gg5070@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > <re-sending from another account>
> > > > > > > > > > > >
> > > > > > > > > > > > Tham, thank you for this! and for volunteering early
> for
> > > > the
> > > > > > C++
> > > > > > > > > > version
> > > > > > > > > > > > work, driving it forward and creating the bulk of the
> > > > > > parquet-cpp
> > > > > > > > > > > > encryption code along the way.
> > > > > > > > > > > >
> > > > > > > > > > > > @All - this announcement means that two
> implementations
> > > of
> > > > > > > Parquet
> > > > > > > > > > > > encryption, fully conforming to the formal
> specification,
> > > > are
> > > > > > > > > available
> > > > > > > > > > > > today. Thanks to Revital for contributing to C++
> version
> > > > > > > compliance
> > > > > > > > > > with
> > > > > > > > > > > > the encryption spec, and for running a set of basic
> > > > Java-C++
> > > > > > > > > encryption
> > > > > > > > > > > > interoperability tests. We have tested plaintext and
> > > > encrypted
> > > > > > > > footer
> > > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> > > readers.
> > > > > > Files
> > > > > > > > > > written
> > > > > > > > > > > > with parquet-cpp are successfully parsed by
> parquet-mr,
> > > and
> > > > > > vice
> > > > > > > > > versa.
> > > > > > > > > > > >
> > > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for
> > > their
> > > > > > > support
> > > > > > > > > and
> > > > > > > > > > > vote
> > > > > > > > > > > > for the encryption specification - along with the PMC
> > > > folks.
> > > > > > > > > > > >
> > > > > > > > > > > > All parquet-format pull requests are merged by now
> into
> > > the
> > > > > > > > > encryption
> > > > > > > > > > > > branch,
> > > > > > > > > > > >
> https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > > > >
> > > > > > > > > > > > The community is welcome to review the parquet-mr
> pull
> > > > > > requests,
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > following order:
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> > > > Parquet
> > > > > > > > > encryption
> > > > > > > > > > > is
> > > > > > > > > > > > collected in this branch:
> > > > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > > > Thanks to Xinli for working with this branch code,
> and
> > > > > > > contributing
> > > > > > > > > to
> > > > > > > > > > it
> > > > > > > > > > > > based on his field experience. Everybody is welcome
> to do
> > > > the
> > > > > > > same.
> > > > > > > > > > > > @All - it would be helpful to review & merge the
> above
> > > PRs
> > > > in
> > > > > > > > > > > > apache/parquet-mr/encryption, so that folks can work
> with
> > > > it
> > > > > > > > instead
> > > > > > > > > of
> > > > > > > > > > > my
> > > > > > > > > > > > private branch..
> > > > > > > > > > > >
> > > > > > > > > > > > And I certainly second Tham's call to review & merge
> the
> > > > > > > > parquet-cpp
> > > > > > > > > > pull
> > > > > > > > > > > > requests. By now, we have a number of companies
> starting
> > > to
> > > > > > > utilize
> > > > > > > > > > > Parquet
> > > > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Cheers, Gidon.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <
> tham@emotiv.com
> > > >
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi community,
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > After a long time of development, I'm honor to
> announce
> > > > that
> > > > > > we
> > > > > > > > > have
> > > > > > > > > > > just
> > > > > > > > > > > > > completed C++ parquet encryption module which
> > > implements
> > > > > > > > encryption
> > > > > > > > > > in
> > > > > > > > > > > low
> > > > > > > > > > > > > level api and with examples included.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > To have this feature completed, I would like to
> thank
> > > > Gidon
> > > > > > and
> > > > > > > > > > > Revital for
> > > > > > > > > > > > > their contribution.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Gidon had a key role in encryption design and in
> > > writing
> > > > Java
> > > > > > > > > version
> > > > > > > > > > > code
> > > > > > > > > > > > > on which we based on to write C++ version. He also
> > > wrote
> > > > > > crypto
> > > > > > > > > > > package in
> > > > > > > > > > > > > C++ version.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Revital and me has been joining together in
> writing C++
> > > > > > > version.
> > > > > > > > > > > Revital
> > > > > > > > > > > > > was responsible for AAD calculations, API updating
> (to
> > > > be the
> > > > > > > > same
> > > > > > > > > > with
> > > > > > > > > > > > > Java version) and Java-C++ inter-operation
> testing. I
> > > was
> > > > > > > writing
> > > > > > > > > the
> > > > > > > > > > > first
> > > > > > > > > > > > > draft (properties, metadata, writer, reader) and
> > > keeping
> > > > them
> > > > > > > > > updated
> > > > > > > > > > > when
> > > > > > > > > > > > > crypto package change.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon
> and
> > > > > > Revital
> > > > > > > > for
> > > > > > > > > > all
> > > > > > > > > > > > > guide and experience I have received from them,
> too.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1) encryption module (properties, metadata, writer,
> > > > reader):
> > > > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2) some merged pull requests for new thrift
> structure
> > > and
> > > > > > > crypto
> > > > > > > > > > > algorithm,
> > > > > > > > > > > > > and one still open:
> > > > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > However, in order to make (1) buildable with
> current
> > > > build
> > > > > > > > scripts,
> > > > > > > > > > we
> > > > > > > > > > > need
> > > > > > > > > > > > > “adding openssl in C++ build toolchain” which is
> > > > mentioned in
> > > > > > > > this
> > > > > > > > > > > jira:
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302.
> I
> > > > will be
> > > > > > > > > grateful
> > > > > > > > > > > if
> > > > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > About current pull requests, they has been
> currently
> > > > using in
> > > > > > > our
> > > > > > > > > > > > > development phase at Emotiv (
> https://www.emotiv.com/).
> > > > We
> > > > > > love
> > > > > > > > > using
> > > > > > > > > > > > > parquet files to store EEG data. We are going to
> > > release
> > > > a
> > > > > > > > product
> > > > > > > > > > with
> > > > > > > > > > > > > encrypted parquet files soon and look forward to
> the
> > > > official
> > > > > > > > > release
> > > > > > > > > > > of
> > > > > > > > > > > > > parquet encryption feature. So it will be many
> thank
> > > and
> > > > > > great
> > > > > > > > > honor
> > > > > > > > > > to
> > > > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thank you very much!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tham
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > regards,
> > > > > > > > > > Deepak Majeti
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > regards,
> > > > > > > > Deepak Majeti
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > regards,
> > > > > > Deepak Majeti
> > > > > >
> > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
>

Re: Announcement for c++ parquet encryption

Posted by Wes McKinney <we...@gmail.com>.
If there are encryption integration testing utilities there's not much
reason not to put them in

https://github.com/apache/arrow/tree/master/cpp/tools/parquet

(or similar)

Then there is no additional build system to maintain

On Wed, May 29, 2019 at 4:41 PM Deepak Majeti <ma...@gmail.com> wrote:
>
> Hi Gidon,
>
> We should definitely add encrypted parquet files written by parquet-mr in
> the parquet-testing repo data folder.
> I also like the idea of keeping the source files in a separate "src" folder
> in that repository. I see no downside in doing so.
>
> On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi Wes,
> >
> > Yep, I meant private as in not under apache/arrow. Will be more careful
> > with that term.
> >
> > https://github.com/apache/parquet-testing  seems to currently contain only
> > data (.parquet) files.
> > Our samples are C++ and Java code files, that write and read .parquet files
> > in different encryption configurations.
> > We can add these encrypted .parquet files to the apache/parquet-testing
> > repo.
> > Let us know if we also should add the samples themselves there (by creating
> > src folders in this repo), or add them to the apache/arrow/mr repos, or to
> > keep them in our personal repos (the ones I've called private before).
> > These samples are based on the standard parquet samples (such as
> >
> > https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> > ),
> > with a loop that creates different encryption configs, and writes/reads the
> > same data with them.
> >
> > Cheers, Gidon.
> >
> >
> > On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi Gidon,
> > >
> > > Surely you don't mean "private" private -- this was always on GitHub
> > > right (otherwise this runs against the Apache Way)?
> > >
> > > I think we should try to have all of the testing code in the upstream
> > > repositories. If there are sample files you need to check in (and they
> > > are not too large), please use the apache/parquet-testing repository
> > >
> > > - Wes
> > >
> > > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > > >
> > > > Hi Deepak,
> > > >
> > > > Sounds good. Here goes the current cpp interop sample (not final yet).
> > We
> > > > have a sample with a similar code in Java.
> > > >
> > >
> > https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > > >
> > > >
> > > > Please let me know if we should add this file to the cpp PR and its
> > > staging
> > > > branch (under parquet/low-level-api, or another folder?)
> > > > - or leave this file in our private branch.
> > > >
> > > > Cheers, Gidon.
> > > >
> > > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <majeti.deepak@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Gidon,
> > > > >
> > > > > I agree with your point 2). I merged the PR for OpenSSL build
> > support.
> > > We
> > > > > can do the same for PR-3520 after you push the changes this week. We
> > > can
> > > > > then simply rebase PR-2555 and make the final review.
> > > > > For 1) I would like to take a look at the inter-op testing framework
> > > first.
> > > > > Can you share the inter-op sample? Thanks.
> > > > >
> > > > >
> > > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi Deepak,
> > > > > >
> > > > > > If it simplifies the code review, we can make this change.
> > Actually,
> > > > > we're
> > > > > > already creating a single branch that has the parquet-cpp with
> > > encryption
> > > > > > in one place, ready for build and sample run. This is a private
> > > branch -
> > > > > > but we can do the same for the pull request (2555), making it the
> > > staging
> > > > > > branch. However, there are a number of technical details we need to
> > > > > address
> > > > > > first:
> > > > > >
> > > > > > 1) The private branch we're creating contains not only the cpp
> > > encryption
> > > > > > sample, but also an additional interop sample that checks
> > > > > interoperability
> > > > > > with parquet-mr encryption.
> > > > > > Do we want to add this interop sample to the staging branch? Should
> > > this
> > > > > > sample be merged in arrow/parquet-cpp for future interop testing
> > with
> > > > > Java
> > > > > > encryption? Or this sample can stay in the staging branch for the
> > > > > duration
> > > > > > of the review process, and be removed just before the staging PR is
> > > > > merged
> > > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> > > staging
> > > > > > branch at all, and keep our private branch for interop testing? Let
> > > us
> > > > > know
> > > > > > your preference.
> > > > > >
> > > > > > 2) PR 3520 has been reviewed. We'll make the requested changes this
> > > week,
> > > > > > and update the PR. It adds 2 new files, and doesnt touch the
> > existing
> > > > > ones
> > > > > > - if you guys will be ok with the updated version, you can merge
> > it,
> > > and
> > > > > > then the staging PR (2555) will automatically pick the 3520 patch
> > > upon
> > > > > > rebasing to the then-current master state. But, if further changes
> > > will
> > > > > be
> > > > > > required in the #3520 files, we'll add them to the staging PR.
> > > > > >
> > > > > > Cheers, Gidon.
> > > > > >
> > > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > > majeti.deepak@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Gidon and Tham,
> > > > > > >
> > > > > > > I opened a PR for OpenSSL build support here:
> > > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > > > cherry-pick
> > > > > > > https://github.com/apache/arrow/pull/3520/files and the above PR
> > > onto
> > > > > > it.
> > > > > > > The later PRs are tiny anyway. This simplifies code review,
> > syncing
> > > > > with
> > > > > > > the master, as well as building the encryption work while
> > > reviewing.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > > gg5070@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks guys. We'll create a doc with a description of the
> > current
> > > > > > interop
> > > > > > > > tests, including details on how to build/run them.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > > majeti.deepak@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I will make the first attempt to review the C++ code. I
> > > reviewed
> > > > > part
> > > > > > > of
> > > > > > > > it
> > > > > > > > > a while ago.
> > > > > > > > > Java-C++ inter-operation testing is my main concern. Can you
> > > share
> > > > > > some
> > > > > > > > > details on what has been tested?
> > > > > > > > > I need to update myself with the key management part and see
> > > if we
> > > > > > can
> > > > > > > > add
> > > > > > > > > some test parquet files written in Java along with the keys
> > to
> > > the
> > > > > > data
> > > > > > > > > folder.
> > > > > > > > >
> > > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > > wesmckinn@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > hi folks,
> > > > > > > > > >
> > > > > > > > > > I'm extraordinarily busy but it seems likely that the code
> > > review
> > > > > > for
> > > > > > > > > > Parquet C++ is largely going to fall on my shoulders. Since
> > > this
> > > > > > is a
> > > > > > > > > > large patch with many touch-points in the C++ codebase, and
> > > there
> > > > > > > > > > hasn't been much feedback over the course of its
> > > development, I
> > > > > > would
> > > > > > > > > > like to leave thoughtful code review so that we can avoid
> > > issues
> > > > > > that
> > > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > > >
> > > > > > > > > > On the OpenSSL issue -- there are many CMake projects in
> > the
> > > wild
> > > > > > > that
> > > > > > > > > > use OpenSSL and there are many examples of how to
> > incorporate
> > > > > other
> > > > > > > > > > dependencies into the Arrow build system. I don't have time
> > > to
> > > > > work
> > > > > > > on
> > > > > > > > > > this right now so someone else will have to sort out the
> > > > > toolchain
> > > > > > > > > > issue
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > > gg5070@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > <re-sending from another account>
> > > > > > > > > > >
> > > > > > > > > > > Tham, thank you for this! and for volunteering early for
> > > the
> > > > > C++
> > > > > > > > > version
> > > > > > > > > > > work, driving it forward and creating the bulk of the
> > > > > parquet-cpp
> > > > > > > > > > > encryption code along the way.
> > > > > > > > > > >
> > > > > > > > > > > @All - this announcement means that two implementations
> > of
> > > > > > Parquet
> > > > > > > > > > > encryption, fully conforming to the formal specification,
> > > are
> > > > > > > > available
> > > > > > > > > > > today. Thanks to Revital for contributing to C++ version
> > > > > > compliance
> > > > > > > > > with
> > > > > > > > > > > the encryption spec, and for running a set of basic
> > > Java-C++
> > > > > > > > encryption
> > > > > > > > > > > interoperability tests. We have tested plaintext and
> > > encrypted
> > > > > > > footer
> > > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> > readers.
> > > > > Files
> > > > > > > > > written
> > > > > > > > > > > with parquet-cpp are successfully parsed by parquet-mr,
> > and
> > > > > vice
> > > > > > > > versa.
> > > > > > > > > > >
> > > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for
> > their
> > > > > > support
> > > > > > > > and
> > > > > > > > > > vote
> > > > > > > > > > > for the encryption specification - along with the PMC
> > > folks.
> > > > > > > > > > >
> > > > > > > > > > > All parquet-format pull requests are merged by now into
> > the
> > > > > > > > encryption
> > > > > > > > > > > branch,
> > > > > > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > > >
> > > > > > > > > > > The community is welcome to review the parquet-mr pull
> > > > > requests,
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > following order:
> > > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> > > Parquet
> > > > > > > > encryption
> > > > > > > > > > is
> > > > > > > > > > > collected in this branch:
> > > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > > Thanks to Xinli for working with this branch code, and
> > > > > > contributing
> > > > > > > > to
> > > > > > > > > it
> > > > > > > > > > > based on his field experience. Everybody is welcome to do
> > > the
> > > > > > same.
> > > > > > > > > > > @All - it would be helpful to review & merge the above
> > PRs
> > > in
> > > > > > > > > > > apache/parquet-mr/encryption, so that folks can work with
> > > it
> > > > > > > instead
> > > > > > > > of
> > > > > > > > > > my
> > > > > > > > > > > private branch..
> > > > > > > > > > >
> > > > > > > > > > > And I certainly second Tham's call to review & merge the
> > > > > > > parquet-cpp
> > > > > > > > > pull
> > > > > > > > > > > requests. By now, we have a number of companies starting
> > to
> > > > > > utilize
> > > > > > > > > > Parquet
> > > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Cheers, Gidon.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <tham@emotiv.com
> > >
> > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi community,
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > After a long time of development, I'm honor to announce
> > > that
> > > > > we
> > > > > > > > have
> > > > > > > > > > just
> > > > > > > > > > > > completed C++ parquet encryption module which
> > implements
> > > > > > > encryption
> > > > > > > > > in
> > > > > > > > > > low
> > > > > > > > > > > > level api and with examples included.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > To have this feature completed, I would like to thank
> > > Gidon
> > > > > and
> > > > > > > > > > Revital for
> > > > > > > > > > > > their contribution.
> > > > > > > > > > > >
> > > > > > > > > > > > Gidon had a key role in encryption design and in
> > writing
> > > Java
> > > > > > > > version
> > > > > > > > > > code
> > > > > > > > > > > > on which we based on to write C++ version. He also
> > wrote
> > > > > crypto
> > > > > > > > > > package in
> > > > > > > > > > > > C++ version.
> > > > > > > > > > > >
> > > > > > > > > > > > Revital and me has been joining together in writing C++
> > > > > > version.
> > > > > > > > > > Revital
> > > > > > > > > > > > was responsible for AAD calculations, API updating (to
> > > be the
> > > > > > > same
> > > > > > > > > with
> > > > > > > > > > > > Java version) and Java-C++ inter-operation testing. I
> > was
> > > > > > writing
> > > > > > > > the
> > > > > > > > > > first
> > > > > > > > > > > > draft (properties, metadata, writer, reader) and
> > keeping
> > > them
> > > > > > > > updated
> > > > > > > > > > when
> > > > > > > > > > > > crypto package change.
> > > > > > > > > > > >
> > > > > > > > > > > > We have had a great time to cooperate. Thank Gidon and
> > > > > Revital
> > > > > > > for
> > > > > > > > > all
> > > > > > > > > > > > guide and experience I have received from them, too.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > > >
> > > > > > > > > > > > 1) encryption module (properties, metadata, writer,
> > > reader):
> > > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > > >
> > > > > > > > > > > > 2) some merged pull requests for new thrift structure
> > and
> > > > > > crypto
> > > > > > > > > > algorithm,
> > > > > > > > > > > > and one still open:
> > > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > However, in order to make (1) buildable with current
> > > build
> > > > > > > scripts,
> > > > > > > > > we
> > > > > > > > > > need
> > > > > > > > > > > > “adding openssl in C++ build toolchain” which is
> > > mentioned in
> > > > > > > this
> > > > > > > > > > jira:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I
> > > will be
> > > > > > > > grateful
> > > > > > > > > > if
> > > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > About current pull requests, they has been currently
> > > using in
> > > > > > our
> > > > > > > > > > > > development phase at Emotiv (https://www.emotiv.com/).
> > > We
> > > > > love
> > > > > > > > using
> > > > > > > > > > > > parquet files to store EEG data. We are going to
> > release
> > > a
> > > > > > > product
> > > > > > > > > with
> > > > > > > > > > > > encrypted parquet files soon and look forward to the
> > > official
> > > > > > > > release
> > > > > > > > > > of
> > > > > > > > > > > > parquet encryption feature. So it will be many thank
> > and
> > > > > great
> > > > > > > > honor
> > > > > > > > > to
> > > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you very much!
> > > > > > > > > > > >
> > > > > > > > > > > > Tham
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > regards,
> > > > > > > > > Deepak Majeti
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > regards,
> > > > > > > Deepak Majeti
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > >
> >
>
>
> --
> regards,
> Deepak Majeti

Re: Announcement for c++ parquet encryption

Posted by Deepak Majeti <ma...@gmail.com>.
Hi Gidon,

We should definitely add encrypted parquet files written by parquet-mr in
the parquet-testing repo data folder.
I also like the idea of keeping the source files in a separate "src" folder
in that repository. I see no downside in doing so.

On Tue, May 28, 2019 at 9:23 PM Gidon Gershinsky <gg...@gmail.com> wrote:

> Hi Wes,
>
> Yep, I meant private as in not under apache/arrow. Will be more careful
> with that term.
>
> https://github.com/apache/parquet-testing  seems to currently contain only
> data (.parquet) files.
> Our samples are C++ and Java code files, that write and read .parquet files
> in different encryption configurations.
> We can add these encrypted .parquet files to the apache/parquet-testing
> repo.
> Let us know if we also should add the samples themselves there (by creating
> src folders in this repo), or add them to the apache/arrow/mr repos, or to
> keep them in our personal repos (the ones I've called private before).
> These samples are based on the standard parquet samples (such as
>
> https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc
> ),
> with a loop that creates different encryption configs, and writes/reads the
> same data with them.
>
> Cheers, Gidon.
>
>
> On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com> wrote:
>
> > hi Gidon,
> >
> > Surely you don't mean "private" private -- this was always on GitHub
> > right (otherwise this runs against the Apache Way)?
> >
> > I think we should try to have all of the testing code in the upstream
> > repositories. If there are sample files you need to check in (and they
> > are not too large), please use the apache/parquet-testing repository
> >
> > - Wes
> >
> > On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> > >
> > > Hi Deepak,
> > >
> > > Sounds good. Here goes the current cpp interop sample (not final yet).
> We
> > > have a sample with a similar code in Java.
> > >
> >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> > >
> > >
> > > Please let me know if we should add this file to the cpp PR and its
> > staging
> > > branch (under parquet/low-level-api, or another folder?)
> > > - or leave this file in our private branch.
> > >
> > > Cheers, Gidon.
> > >
> > > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <majeti.deepak@gmail.com
> >
> > > wrote:
> > >
> > > > Hi Gidon,
> > > >
> > > > I agree with your point 2). I merged the PR for OpenSSL build
> support.
> > We
> > > > can do the same for PR-3520 after you push the changes this week. We
> > can
> > > > then simply rebase PR-2555 and make the final review.
> > > > For 1) I would like to take a look at the inter-op testing framework
> > first.
> > > > Can you share the inter-op sample? Thanks.
> > > >
> > > >
> > > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > > >
> > > > > Hi Deepak,
> > > > >
> > > > > If it simplifies the code review, we can make this change.
> Actually,
> > > > we're
> > > > > already creating a single branch that has the parquet-cpp with
> > encryption
> > > > > in one place, ready for build and sample run. This is a private
> > branch -
> > > > > but we can do the same for the pull request (2555), making it the
> > staging
> > > > > branch. However, there are a number of technical details we need to
> > > > address
> > > > > first:
> > > > >
> > > > > 1) The private branch we're creating contains not only the cpp
> > encryption
> > > > > sample, but also an additional interop sample that checks
> > > > interoperability
> > > > > with parquet-mr encryption.
> > > > > Do we want to add this interop sample to the staging branch? Should
> > this
> > > > > sample be merged in arrow/parquet-cpp for future interop testing
> with
> > > > Java
> > > > > encryption? Or this sample can stay in the staging branch for the
> > > > duration
> > > > > of the review process, and be removed just before the staging PR is
> > > > merged
> > > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> > staging
> > > > > branch at all, and keep our private branch for interop testing? Let
> > us
> > > > know
> > > > > your preference.
> > > > >
> > > > > 2) PR 3520 has been reviewed. We'll make the requested changes this
> > week,
> > > > > and update the PR. It adds 2 new files, and doesnt touch the
> existing
> > > > ones
> > > > > - if you guys will be ok with the updated version, you can merge
> it,
> > and
> > > > > then the staging PR (2555) will automatically pick the 3520 patch
> > upon
> > > > > rebasing to the then-current master state. But, if further changes
> > will
> > > > be
> > > > > required in the #3520 files, we'll add them to the staging PR.
> > > > >
> > > > > Cheers, Gidon.
> > > > >
> > > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> > majeti.deepak@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Gidon and Tham,
> > > > > >
> > > > > > I opened a PR for OpenSSL build support here:
> > > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > > However, your PRs need to be rebased/merged with master.
> > > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > > cherry-pick
> > > > > > https://github.com/apache/arrow/pull/3520/files and the above PR
> > onto
> > > > > it.
> > > > > > The later PRs are tiny anyway. This simplifies code review,
> syncing
> > > > with
> > > > > > the master, as well as building the encryption work while
> > reviewing.
> > > > > >
> > > > > >
> > > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> > gg5070@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks guys. We'll create a doc with a description of the
> current
> > > > > interop
> > > > > > > tests, including details on how to build/run them.
> > > > > > >
> > > > > > >
> > > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > > majeti.deepak@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I will make the first attempt to review the C++ code. I
> > reviewed
> > > > part
> > > > > > of
> > > > > > > it
> > > > > > > > a while ago.
> > > > > > > > Java-C++ inter-operation testing is my main concern. Can you
> > share
> > > > > some
> > > > > > > > details on what has been tested?
> > > > > > > > I need to update myself with the key management part and see
> > if we
> > > > > can
> > > > > > > add
> > > > > > > > some test parquet files written in Java along with the keys
> to
> > the
> > > > > data
> > > > > > > > folder.
> > > > > > > >
> > > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> > wesmckinn@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > hi folks,
> > > > > > > > >
> > > > > > > > > I'm extraordinarily busy but it seems likely that the code
> > review
> > > > > for
> > > > > > > > > Parquet C++ is largely going to fall on my shoulders. Since
> > this
> > > > > is a
> > > > > > > > > large patch with many touch-points in the C++ codebase, and
> > there
> > > > > > > > > hasn't been much feedback over the course of its
> > development, I
> > > > > would
> > > > > > > > > like to leave thoughtful code review so that we can avoid
> > issues
> > > > > that
> > > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > > >
> > > > > > > > > On the OpenSSL issue -- there are many CMake projects in
> the
> > wild
> > > > > > that
> > > > > > > > > use OpenSSL and there are many examples of how to
> incorporate
> > > > other
> > > > > > > > > dependencies into the Arrow build system. I don't have time
> > to
> > > > work
> > > > > > on
> > > > > > > > > this right now so someone else will have to sort out the
> > > > toolchain
> > > > > > > > > issue
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > > gg5070@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > <re-sending from another account>
> > > > > > > > > >
> > > > > > > > > > Tham, thank you for this! and for volunteering early for
> > the
> > > > C++
> > > > > > > > version
> > > > > > > > > > work, driving it forward and creating the bulk of the
> > > > parquet-cpp
> > > > > > > > > > encryption code along the way.
> > > > > > > > > >
> > > > > > > > > > @All - this announcement means that two implementations
> of
> > > > > Parquet
> > > > > > > > > > encryption, fully conforming to the formal specification,
> > are
> > > > > > > available
> > > > > > > > > > today. Thanks to Revital for contributing to C++ version
> > > > > compliance
> > > > > > > > with
> > > > > > > > > > the encryption spec, and for running a set of basic
> > Java-C++
> > > > > > > encryption
> > > > > > > > > > interoperability tests. We have tested plaintext and
> > encrypted
> > > > > > footer
> > > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy
> readers.
> > > > Files
> > > > > > > > written
> > > > > > > > > > with parquet-cpp are successfully parsed by parquet-mr,
> and
> > > > vice
> > > > > > > versa.
> > > > > > > > > >
> > > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for
> their
> > > > > support
> > > > > > > and
> > > > > > > > > vote
> > > > > > > > > > for the encryption specification - along with the PMC
> > folks.
> > > > > > > > > >
> > > > > > > > > > All parquet-format pull requests are merged by now into
> the
> > > > > > > encryption
> > > > > > > > > > branch,
> > > > > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > > >
> > > > > > > > > > The community is welcome to review the parquet-mr pull
> > > > requests,
> > > > > in
> > > > > > > the
> > > > > > > > > > following order:
> > > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> > Parquet
> > > > > > > encryption
> > > > > > > > > is
> > > > > > > > > > collected in this branch:
> > > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > > Thanks to Xinli for working with this branch code, and
> > > > > contributing
> > > > > > > to
> > > > > > > > it
> > > > > > > > > > based on his field experience. Everybody is welcome to do
> > the
> > > > > same.
> > > > > > > > > > @All - it would be helpful to review & merge the above
> PRs
> > in
> > > > > > > > > > apache/parquet-mr/encryption, so that folks can work with
> > it
> > > > > > instead
> > > > > > > of
> > > > > > > > > my
> > > > > > > > > > private branch..
> > > > > > > > > >
> > > > > > > > > > And I certainly second Tham's call to review & merge the
> > > > > > parquet-cpp
> > > > > > > > pull
> > > > > > > > > > requests. By now, we have a number of companies starting
> to
> > > > > utilize
> > > > > > > > > Parquet
> > > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Cheers, Gidon.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <tham@emotiv.com
> >
> > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi community,
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > After a long time of development, I'm honor to announce
> > that
> > > > we
> > > > > > > have
> > > > > > > > > just
> > > > > > > > > > > completed C++ parquet encryption module which
> implements
> > > > > > encryption
> > > > > > > > in
> > > > > > > > > low
> > > > > > > > > > > level api and with examples included.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To have this feature completed, I would like to thank
> > Gidon
> > > > and
> > > > > > > > > Revital for
> > > > > > > > > > > their contribution.
> > > > > > > > > > >
> > > > > > > > > > > Gidon had a key role in encryption design and in
> writing
> > Java
> > > > > > > version
> > > > > > > > > code
> > > > > > > > > > > on which we based on to write C++ version. He also
> wrote
> > > > crypto
> > > > > > > > > package in
> > > > > > > > > > > C++ version.
> > > > > > > > > > >
> > > > > > > > > > > Revital and me has been joining together in writing C++
> > > > > version.
> > > > > > > > > Revital
> > > > > > > > > > > was responsible for AAD calculations, API updating (to
> > be the
> > > > > > same
> > > > > > > > with
> > > > > > > > > > > Java version) and Java-C++ inter-operation testing. I
> was
> > > > > writing
> > > > > > > the
> > > > > > > > > first
> > > > > > > > > > > draft (properties, metadata, writer, reader) and
> keeping
> > them
> > > > > > > updated
> > > > > > > > > when
> > > > > > > > > > > crypto package change.
> > > > > > > > > > >
> > > > > > > > > > > We have had a great time to cooperate. Thank Gidon and
> > > > Revital
> > > > > > for
> > > > > > > > all
> > > > > > > > > > > guide and experience I have received from them, too.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > > >
> > > > > > > > > > > 1) encryption module (properties, metadata, writer,
> > reader):
> > > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > > >
> > > > > > > > > > > 2) some merged pull requests for new thrift structure
> and
> > > > > crypto
> > > > > > > > > algorithm,
> > > > > > > > > > > and one still open:
> > > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > However, in order to make (1) buildable with current
> > build
> > > > > > scripts,
> > > > > > > > we
> > > > > > > > > need
> > > > > > > > > > > “adding openssl in C++ build toolchain” which is
> > mentioned in
> > > > > > this
> > > > > > > > > jira:
> > > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I
> > will be
> > > > > > > grateful
> > > > > > > > > if
> > > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > About current pull requests, they has been currently
> > using in
> > > > > our
> > > > > > > > > > > development phase at Emotiv (https://www.emotiv.com/).
> > We
> > > > love
> > > > > > > using
> > > > > > > > > > > parquet files to store EEG data. We are going to
> release
> > a
> > > > > > product
> > > > > > > > with
> > > > > > > > > > > encrypted parquet files soon and look forward to the
> > official
> > > > > > > release
> > > > > > > > > of
> > > > > > > > > > > parquet encryption feature. So it will be many thank
> and
> > > > great
> > > > > > > honor
> > > > > > > > to
> > > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thank you very much!
> > > > > > > > > > >
> > > > > > > > > > > Tham
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > regards,
> > > > > > > > Deepak Majeti
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > regards,
> > > > > > Deepak Majeti
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Deepak Majeti
> > > >
> >
>


-- 
regards,
Deepak Majeti

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Wes,

Yep, I meant private as in not under apache/arrow. Will be more careful
with that term.

https://github.com/apache/parquet-testing  seems to currently contain only
data (.parquet) files.
Our samples are C++ and Java code files, that write and read .parquet files
in different encryption configurations.
We can add these encrypted .parquet files to the apache/parquet-testing
repo.
Let us know if we also should add the samples themselves there (by creating
src folders in this repo), or add them to the apache/arrow/mr repos, or to
keep them in our personal repos (the ones I've called private before).
These samples are based on the standard parquet samples (such as
https://github.com/apache/arrow/blob/master/cpp/examples/parquet/low-level-api/reader-writer.cc),
with a loop that creates different encryption configs, and writes/reads the
same data with them.

Cheers, Gidon.


On Tue, May 28, 2019 at 5:51 PM Wes McKinney <we...@gmail.com> wrote:

> hi Gidon,
>
> Surely you don't mean "private" private -- this was always on GitHub
> right (otherwise this runs against the Apache Way)?
>
> I think we should try to have all of the testing code in the upstream
> repositories. If there are sample files you need to check in (and they
> are not too large), please use the apache/parquet-testing repository
>
> - Wes
>
> On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com> wrote:
> >
> > Hi Deepak,
> >
> > Sounds good. Here goes the current cpp interop sample (not final yet). We
> > have a sample with a similar code in Java.
> >
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
> >
> >
> > Please let me know if we should add this file to the cpp PR and its
> staging
> > branch (under parquet/low-level-api, or another folder?)
> > - or leave this file in our private branch.
> >
> > Cheers, Gidon.
> >
> > On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <ma...@gmail.com>
> > wrote:
> >
> > > Hi Gidon,
> > >
> > > I agree with your point 2). I merged the PR for OpenSSL build support.
> We
> > > can do the same for PR-3520 after you push the changes this week. We
> can
> > > then simply rebase PR-2555 and make the final review.
> > > For 1) I would like to take a look at the inter-op testing framework
> first.
> > > Can you share the inter-op sample? Thanks.
> > >
> > >
> > > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> > >
> > > > Hi Deepak,
> > > >
> > > > If it simplifies the code review, we can make this change. Actually,
> > > we're
> > > > already creating a single branch that has the parquet-cpp with
> encryption
> > > > in one place, ready for build and sample run. This is a private
> branch -
> > > > but we can do the same for the pull request (2555), making it the
> staging
> > > > branch. However, there are a number of technical details we need to
> > > address
> > > > first:
> > > >
> > > > 1) The private branch we're creating contains not only the cpp
> encryption
> > > > sample, but also an additional interop sample that checks
> > > interoperability
> > > > with parquet-mr encryption.
> > > > Do we want to add this interop sample to the staging branch? Should
> this
> > > > sample be merged in arrow/parquet-cpp for future interop testing with
> > > Java
> > > > encryption? Or this sample can stay in the staging branch for the
> > > duration
> > > > of the review process, and be removed just before the staging PR is
> > > merged
> > > > in arrow/parquet-cpp? Or we dont add this interop sample to the
> staging
> > > > branch at all, and keep our private branch for interop testing? Let
> us
> > > know
> > > > your preference.
> > > >
> > > > 2) PR 3520 has been reviewed. We'll make the requested changes this
> week,
> > > > and update the PR. It adds 2 new files, and doesnt touch the existing
> > > ones
> > > > - if you guys will be ok with the updated version, you can merge it,
> and
> > > > then the staging PR (2555) will automatically pick the 3520 patch
> upon
> > > > rebasing to the then-current master state. But, if further changes
> will
> > > be
> > > > required in the #3520 files, we'll add them to the staging PR.
> > > >
> > > > Cheers, Gidon.
> > > >
> > > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <
> majeti.deepak@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Gidon and Tham,
> > > > >
> > > > > I opened a PR for OpenSSL build support here:
> > > > > https://github.com/apache/arrow/pull/4384/files
> > > > > However, your PRs need to be rebased/merged with master.
> > > > > Why don't we keep one staging PR for C++? Probably make
> > > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > > cherry-pick
> > > > > https://github.com/apache/arrow/pull/3520/files and the above PR
> onto
> > > > it.
> > > > > The later PRs are tiny anyway. This simplifies code review, syncing
> > > with
> > > > > the master, as well as building the encryption work while
> reviewing.
> > > > >
> > > > >
> > > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <
> gg5070@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks guys. We'll create a doc with a description of the current
> > > > interop
> > > > > > tests, including details on how to build/run them.
> > > > > >
> > > > > >
> > > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > > majeti.deepak@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > I will make the first attempt to review the C++ code. I
> reviewed
> > > part
> > > > > of
> > > > > > it
> > > > > > > a while ago.
> > > > > > > Java-C++ inter-operation testing is my main concern. Can you
> share
> > > > some
> > > > > > > details on what has been tested?
> > > > > > > I need to update myself with the key management part and see
> if we
> > > > can
> > > > > > add
> > > > > > > some test parquet files written in Java along with the keys to
> the
> > > > data
> > > > > > > folder.
> > > > > > >
> > > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <
> wesmckinn@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > hi folks,
> > > > > > > >
> > > > > > > > I'm extraordinarily busy but it seems likely that the code
> review
> > > > for
> > > > > > > > Parquet C++ is largely going to fall on my shoulders. Since
> this
> > > > is a
> > > > > > > > large patch with many touch-points in the C++ codebase, and
> there
> > > > > > > > hasn't been much feedback over the course of its
> development, I
> > > > would
> > > > > > > > like to leave thoughtful code review so that we can avoid
> issues
> > > > that
> > > > > > > > may lead to burdensome maintenance issues.
> > > > > > > >
> > > > > > > > On the OpenSSL issue -- there are many CMake projects in the
> wild
> > > > > that
> > > > > > > > use OpenSSL and there are many examples of how to incorporate
> > > other
> > > > > > > > dependencies into the Arrow build system. I don't have time
> to
> > > work
> > > > > on
> > > > > > > > this right now so someone else will have to sort out the
> > > toolchain
> > > > > > > > issue
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > > gg5070@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > <re-sending from another account>
> > > > > > > > >
> > > > > > > > > Tham, thank you for this! and for volunteering early for
> the
> > > C++
> > > > > > > version
> > > > > > > > > work, driving it forward and creating the bulk of the
> > > parquet-cpp
> > > > > > > > > encryption code along the way.
> > > > > > > > >
> > > > > > > > > @All - this announcement means that two implementations of
> > > > Parquet
> > > > > > > > > encryption, fully conforming to the formal specification,
> are
> > > > > > available
> > > > > > > > > today. Thanks to Revital for contributing to C++ version
> > > > compliance
> > > > > > > with
> > > > > > > > > the encryption spec, and for running a set of basic
> Java-C++
> > > > > > encryption
> > > > > > > > > interoperability tests. We have tested plaintext and
> encrypted
> > > > > footer
> > > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> > > Files
> > > > > > > written
> > > > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> > > vice
> > > > > > versa.
> > > > > > > > >
> > > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > > > support
> > > > > > and
> > > > > > > > vote
> > > > > > > > > for the encryption specification - along with the PMC
> folks.
> > > > > > > > >
> > > > > > > > > All parquet-format pull requests are merged by now into the
> > > > > > encryption
> > > > > > > > > branch,
> > > > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > > >
> > > > > > > > > The community is welcome to review the parquet-mr pull
> > > requests,
> > > > in
> > > > > > the
> > > > > > > > > following order:
> > > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Currently, an end-to-end implementation of Java (mr)
> Parquet
> > > > > > encryption
> > > > > > > > is
> > > > > > > > > collected in this branch:
> > > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > > Thanks to Xinli for working with this branch code, and
> > > > contributing
> > > > > > to
> > > > > > > it
> > > > > > > > > based on his field experience. Everybody is welcome to do
> the
> > > > same.
> > > > > > > > > @All - it would be helpful to review & merge the above PRs
> in
> > > > > > > > > apache/parquet-mr/encryption, so that folks can work with
> it
> > > > > instead
> > > > > > of
> > > > > > > > my
> > > > > > > > > private branch..
> > > > > > > > >
> > > > > > > > > And I certainly second Tham's call to review & merge the
> > > > > parquet-cpp
> > > > > > > pull
> > > > > > > > > requests. By now, we have a number of companies starting to
> > > > utilize
> > > > > > > > Parquet
> > > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers, Gidon.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com>
> > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi community,
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > After a long time of development, I'm honor to announce
> that
> > > we
> > > > > > have
> > > > > > > > just
> > > > > > > > > > completed C++ parquet encryption module which implements
> > > > > encryption
> > > > > > > in
> > > > > > > > low
> > > > > > > > > > level api and with examples included.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > To have this feature completed, I would like to thank
> Gidon
> > > and
> > > > > > > > Revital for
> > > > > > > > > > their contribution.
> > > > > > > > > >
> > > > > > > > > > Gidon had a key role in encryption design and in writing
> Java
> > > > > > version
> > > > > > > > code
> > > > > > > > > > on which we based on to write C++ version. He also wrote
> > > crypto
> > > > > > > > package in
> > > > > > > > > > C++ version.
> > > > > > > > > >
> > > > > > > > > > Revital and me has been joining together in writing C++
> > > > version.
> > > > > > > > Revital
> > > > > > > > > > was responsible for AAD calculations, API updating (to
> be the
> > > > > same
> > > > > > > with
> > > > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > > > writing
> > > > > > the
> > > > > > > > first
> > > > > > > > > > draft (properties, metadata, writer, reader) and keeping
> them
> > > > > > updated
> > > > > > > > when
> > > > > > > > > > crypto package change.
> > > > > > > > > >
> > > > > > > > > > We have had a great time to cooperate. Thank Gidon and
> > > Revital
> > > > > for
> > > > > > > all
> > > > > > > > > > guide and experience I have received from them, too.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Here are the links of pull requests:
> > > > > > > > > >
> > > > > > > > > > 1) encryption module (properties, metadata, writer,
> reader):
> > > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > > >
> > > > > > > > > > 2) some merged pull requests for new thrift structure and
> > > > crypto
> > > > > > > > algorithm,
> > > > > > > > > > and one still open:
> > > https://github.com/apache/arrow/pull/3520
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > However, in order to make (1) buildable with current
> build
> > > > > scripts,
> > > > > > > we
> > > > > > > > need
> > > > > > > > > > “adding openssl in C++ build toolchain” which is
> mentioned in
> > > > > this
> > > > > > > > jira:
> > > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I
> will be
> > > > > > grateful
> > > > > > > > if
> > > > > > > > > > someone could help fullfill this work.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > About current pull requests, they has been currently
> using in
> > > > our
> > > > > > > > > > development phase at Emotiv (https://www.emotiv.com/).
> We
> > > love
> > > > > > using
> > > > > > > > > > parquet files to store EEG data. We are going to release
> a
> > > > > product
> > > > > > > with
> > > > > > > > > > encrypted parquet files soon and look forward to the
> official
> > > > > > release
> > > > > > > > of
> > > > > > > > > > parquet encryption feature. So it will be many thank and
> > > great
> > > > > > honor
> > > > > > > to
> > > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thank you very much!
> > > > > > > > > >
> > > > > > > > > > Tham
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > regards,
> > > > > > > Deepak Majeti
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
>

Re: Announcement for c++ parquet encryption

Posted by Wes McKinney <we...@gmail.com>.
hi Gidon,

Surely you don't mean "private" private -- this was always on GitHub
right (otherwise this runs against the Apache Way)?

I think we should try to have all of the testing code in the upstream
repositories. If there are sample files you need to check in (and they
are not too large), please use the apache/parquet-testing repository

- Wes

On Tue, May 28, 2019 at 8:00 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> Hi Deepak,
>
> Sounds good. Here goes the current cpp interop sample (not final yet). We
> have a sample with a similar code in Java.
> https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc
>
>
> Please let me know if we should add this file to the cpp PR and its staging
> branch (under parquet/low-level-api, or another folder?)
> - or leave this file in our private branch.
>
> Cheers, Gidon.
>
> On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <ma...@gmail.com>
> wrote:
>
> > Hi Gidon,
> >
> > I agree with your point 2). I merged the PR for OpenSSL build support. We
> > can do the same for PR-3520 after you push the changes this week. We can
> > then simply rebase PR-2555 and make the final review.
> > For 1) I would like to take a look at the inter-op testing framework first.
> > Can you share the inter-op sample? Thanks.
> >
> >
> > On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com> wrote:
> >
> > > Hi Deepak,
> > >
> > > If it simplifies the code review, we can make this change. Actually,
> > we're
> > > already creating a single branch that has the parquet-cpp with encryption
> > > in one place, ready for build and sample run. This is a private branch -
> > > but we can do the same for the pull request (2555), making it the staging
> > > branch. However, there are a number of technical details we need to
> > address
> > > first:
> > >
> > > 1) The private branch we're creating contains not only the cpp encryption
> > > sample, but also an additional interop sample that checks
> > interoperability
> > > with parquet-mr encryption.
> > > Do we want to add this interop sample to the staging branch? Should this
> > > sample be merged in arrow/parquet-cpp for future interop testing with
> > Java
> > > encryption? Or this sample can stay in the staging branch for the
> > duration
> > > of the review process, and be removed just before the staging PR is
> > merged
> > > in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> > > branch at all, and keep our private branch for interop testing? Let us
> > know
> > > your preference.
> > >
> > > 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> > > and update the PR. It adds 2 new files, and doesnt touch the existing
> > ones
> > > - if you guys will be ok with the updated version, you can merge it, and
> > > then the staging PR (2555) will automatically pick the 3520 patch upon
> > > rebasing to the then-current master state. But, if further changes will
> > be
> > > required in the #3520 files, we'll add them to the staging PR.
> > >
> > > Cheers, Gidon.
> > >
> > > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
> > > wrote:
> > >
> > > > Hi Gidon and Tham,
> > > >
> > > > I opened a PR for OpenSSL build support here:
> > > > https://github.com/apache/arrow/pull/4384/files
> > > > However, your PRs need to be rebased/merged with master.
> > > > Why don't we keep one staging PR for C++? Probably make
> > > > https://github.com/apache/arrow/pull/2555 the staging PR and
> > cherry-pick
> > > > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> > > it.
> > > > The later PRs are tiny anyway. This simplifies code review, syncing
> > with
> > > > the master, as well as building the encryption work while reviewing.
> > > >
> > > >
> > > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks guys. We'll create a doc with a description of the current
> > > interop
> > > > > tests, including details on how to build/run them.
> > > > >
> > > > >
> > > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> > majeti.deepak@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > I will make the first attempt to review the C++ code. I reviewed
> > part
> > > > of
> > > > > it
> > > > > > a while ago.
> > > > > > Java-C++ inter-operation testing is my main concern. Can you share
> > > some
> > > > > > details on what has been tested?
> > > > > > I need to update myself with the key management part and see if we
> > > can
> > > > > add
> > > > > > some test parquet files written in Java along with the keys to the
> > > data
> > > > > > folder.
> > > > > >
> > > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <wesmckinn@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > hi folks,
> > > > > > >
> > > > > > > I'm extraordinarily busy but it seems likely that the code review
> > > for
> > > > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> > > is a
> > > > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > > > hasn't been much feedback over the course of its development, I
> > > would
> > > > > > > like to leave thoughtful code review so that we can avoid issues
> > > that
> > > > > > > may lead to burdensome maintenance issues.
> > > > > > >
> > > > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > > > that
> > > > > > > use OpenSSL and there are many examples of how to incorporate
> > other
> > > > > > > dependencies into the Arrow build system. I don't have time to
> > work
> > > > on
> > > > > > > this right now so someone else will have to sort out the
> > toolchain
> > > > > > > issue
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> > gg5070@gmail.com
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > <re-sending from another account>
> > > > > > > >
> > > > > > > > Tham, thank you for this! and for volunteering early for the
> > C++
> > > > > > version
> > > > > > > > work, driving it forward and creating the bulk of the
> > parquet-cpp
> > > > > > > > encryption code along the way.
> > > > > > > >
> > > > > > > > @All - this announcement means that two implementations of
> > > Parquet
> > > > > > > > encryption, fully conforming to the formal specification, are
> > > > > available
> > > > > > > > today. Thanks to Revital for contributing to C++ version
> > > compliance
> > > > > > with
> > > > > > > > the encryption spec, and for running a set of basic Java-C++
> > > > > encryption
> > > > > > > > interoperability tests. We have tested plaintext and encrypted
> > > > footer
> > > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> > Files
> > > > > > written
> > > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> > vice
> > > > > versa.
> > > > > > > >
> > > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > > support
> > > > > and
> > > > > > > vote
> > > > > > > > for the encryption specification - along with the PMC folks.
> > > > > > > >
> > > > > > > > All parquet-format pull requests are merged by now into the
> > > > > encryption
> > > > > > > > branch,
> > > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > > >
> > > > > > > > The community is welcome to review the parquet-mr pull
> > requests,
> > > in
> > > > > the
> > > > > > > > following order:
> > > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > > >
> > > > > > > >
> > > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > > > encryption
> > > > > > > is
> > > > > > > > collected in this branch:
> > > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > > Thanks to Xinli for working with this branch code, and
> > > contributing
> > > > > to
> > > > > > it
> > > > > > > > based on his field experience. Everybody is welcome to do the
> > > same.
> > > > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > > > instead
> > > > > of
> > > > > > > my
> > > > > > > > private branch..
> > > > > > > >
> > > > > > > > And I certainly second Tham's call to review & merge the
> > > > parquet-cpp
> > > > > > pull
> > > > > > > > requests. By now, we have a number of companies starting to
> > > utilize
> > > > > > > Parquet
> > > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > > >
> > > > > > > >
> > > > > > > > Cheers, Gidon.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com>
> > wrote:
> > > > > > > >
> > > > > > > > > Hi community,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > After a long time of development, I'm honor to announce that
> > we
> > > > > have
> > > > > > > just
> > > > > > > > > completed C++ parquet encryption module which implements
> > > > encryption
> > > > > > in
> > > > > > > low
> > > > > > > > > level api and with examples included.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > To have this feature completed, I would like to thank Gidon
> > and
> > > > > > > Revital for
> > > > > > > > > their contribution.
> > > > > > > > >
> > > > > > > > > Gidon had a key role in encryption design and in writing Java
> > > > > version
> > > > > > > code
> > > > > > > > > on which we based on to write C++ version. He also wrote
> > crypto
> > > > > > > package in
> > > > > > > > > C++ version.
> > > > > > > > >
> > > > > > > > > Revital and me has been joining together in writing C++
> > > version.
> > > > > > > Revital
> > > > > > > > > was responsible for AAD calculations, API updating (to be the
> > > > same
> > > > > > with
> > > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > > writing
> > > > > the
> > > > > > > first
> > > > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > > > updated
> > > > > > > when
> > > > > > > > > crypto package change.
> > > > > > > > >
> > > > > > > > > We have had a great time to cooperate. Thank Gidon and
> > Revital
> > > > for
> > > > > > all
> > > > > > > > > guide and experience I have received from them, too.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Here are the links of pull requests:
> > > > > > > > >
> > > > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > > >
> > > > > > > > > 2) some merged pull requests for new thrift structure and
> > > crypto
> > > > > > > algorithm,
> > > > > > > > > and one still open:
> > https://github.com/apache/arrow/pull/3520
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > However, in order to make (1) buildable with current build
> > > > scripts,
> > > > > > we
> > > > > > > need
> > > > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > > > this
> > > > > > > jira:
> > > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > > > grateful
> > > > > > > if
> > > > > > > > > someone could help fullfill this work.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > About current pull requests, they has been currently using in
> > > our
> > > > > > > > > development phase at Emotiv (https://www.emotiv.com/). We
> > love
> > > > > using
> > > > > > > > > parquet files to store EEG data. We are going to release a
> > > > product
> > > > > > with
> > > > > > > > > encrypted parquet files soon and look forward to the official
> > > > > release
> > > > > > > of
> > > > > > > > > parquet encryption feature. So it will be many thank and
> > great
> > > > > honor
> > > > > > to
> > > > > > > > > have you review and merge them (if qualified).
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thank you very much!
> > > > > > > > >
> > > > > > > > > Tham
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > regards,
> > > > > > Deepak Majeti
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Deepak Majeti
> > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
> >

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Deepak,

Sounds good. Here goes the current cpp interop sample (not final yet). We
have a sample with a similar code in Java.
https://github.com/newrevit13/arrow/blob/interop_tests/cpp/examples/parquet/low-level-api/encryption-interop-tests.cc


Please let me know if we should add this file to the cpp PR and its staging
branch (under parquet/low-level-api, or another folder?)
- or leave this file in our private branch.

Cheers, Gidon.

On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <ma...@gmail.com>
wrote:

> Hi Gidon,
>
> I agree with your point 2). I merged the PR for OpenSSL build support. We
> can do the same for PR-3520 after you push the changes this week. We can
> then simply rebase PR-2555 and make the final review.
> For 1) I would like to take a look at the inter-op testing framework first.
> Can you share the inter-op sample? Thanks.
>
>
> On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi Deepak,
> >
> > If it simplifies the code review, we can make this change. Actually,
> we're
> > already creating a single branch that has the parquet-cpp with encryption
> > in one place, ready for build and sample run. This is a private branch -
> > but we can do the same for the pull request (2555), making it the staging
> > branch. However, there are a number of technical details we need to
> address
> > first:
> >
> > 1) The private branch we're creating contains not only the cpp encryption
> > sample, but also an additional interop sample that checks
> interoperability
> > with parquet-mr encryption.
> > Do we want to add this interop sample to the staging branch? Should this
> > sample be merged in arrow/parquet-cpp for future interop testing with
> Java
> > encryption? Or this sample can stay in the staging branch for the
> duration
> > of the review process, and be removed just before the staging PR is
> merged
> > in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> > branch at all, and keep our private branch for interop testing? Let us
> know
> > your preference.
> >
> > 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> > and update the PR. It adds 2 new files, and doesnt touch the existing
> ones
> > - if you guys will be ok with the updated version, you can merge it, and
> > then the staging PR (2555) will automatically pick the 3520 patch upon
> > rebasing to the then-current master state. But, if further changes will
> be
> > required in the #3520 files, we'll add them to the staging PR.
> >
> > Cheers, Gidon.
> >
> > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
> > wrote:
> >
> > > Hi Gidon and Tham,
> > >
> > > I opened a PR for OpenSSL build support here:
> > > https://github.com/apache/arrow/pull/4384/files
> > > However, your PRs need to be rebased/merged with master.
> > > Why don't we keep one staging PR for C++? Probably make
> > > https://github.com/apache/arrow/pull/2555 the staging PR and
> cherry-pick
> > > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> > it.
> > > The later PRs are tiny anyway. This simplifies code review, syncing
> with
> > > the master, as well as building the encryption work while reviewing.
> > >
> > >
> > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks guys. We'll create a doc with a description of the current
> > interop
> > > > tests, including details on how to build/run them.
> > > >
> > > >
> > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> majeti.deepak@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > I will make the first attempt to review the C++ code. I reviewed
> part
> > > of
> > > > it
> > > > > a while ago.
> > > > > Java-C++ inter-operation testing is my main concern. Can you share
> > some
> > > > > details on what has been tested?
> > > > > I need to update myself with the key management part and see if we
> > can
> > > > add
> > > > > some test parquet files written in Java along with the keys to the
> > data
> > > > > folder.
> > > > >
> > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <wesmckinn@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > hi folks,
> > > > > >
> > > > > > I'm extraordinarily busy but it seems likely that the code review
> > for
> > > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> > is a
> > > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > > hasn't been much feedback over the course of its development, I
> > would
> > > > > > like to leave thoughtful code review so that we can avoid issues
> > that
> > > > > > may lead to burdensome maintenance issues.
> > > > > >
> > > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > > that
> > > > > > use OpenSSL and there are many examples of how to incorporate
> other
> > > > > > dependencies into the Arrow build system. I don't have time to
> work
> > > on
> > > > > > this right now so someone else will have to sort out the
> toolchain
> > > > > > issue
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> gg5070@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > <re-sending from another account>
> > > > > > >
> > > > > > > Tham, thank you for this! and for volunteering early for the
> C++
> > > > > version
> > > > > > > work, driving it forward and creating the bulk of the
> parquet-cpp
> > > > > > > encryption code along the way.
> > > > > > >
> > > > > > > @All - this announcement means that two implementations of
> > Parquet
> > > > > > > encryption, fully conforming to the formal specification, are
> > > > available
> > > > > > > today. Thanks to Revital for contributing to C++ version
> > compliance
> > > > > with
> > > > > > > the encryption spec, and for running a set of basic Java-C++
> > > > encryption
> > > > > > > interoperability tests. We have tested plaintext and encrypted
> > > footer
> > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> Files
> > > > > written
> > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> vice
> > > > versa.
> > > > > > >
> > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > support
> > > > and
> > > > > > vote
> > > > > > > for the encryption specification - along with the PMC folks.
> > > > > > >
> > > > > > > All parquet-format pull requests are merged by now into the
> > > > encryption
> > > > > > > branch,
> > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > >
> > > > > > > The community is welcome to review the parquet-mr pull
> requests,
> > in
> > > > the
> > > > > > > following order:
> > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > >
> > > > > > >
> > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > > encryption
> > > > > > is
> > > > > > > collected in this branch:
> > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > Thanks to Xinli for working with this branch code, and
> > contributing
> > > > to
> > > > > it
> > > > > > > based on his field experience. Everybody is welcome to do the
> > same.
> > > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > > instead
> > > > of
> > > > > > my
> > > > > > > private branch..
> > > > > > >
> > > > > > > And I certainly second Tham's call to review & merge the
> > > parquet-cpp
> > > > > pull
> > > > > > > requests. By now, we have a number of companies starting to
> > utilize
> > > > > > Parquet
> > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > >
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com>
> wrote:
> > > > > > >
> > > > > > > > Hi community,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > After a long time of development, I'm honor to announce that
> we
> > > > have
> > > > > > just
> > > > > > > > completed C++ parquet encryption module which implements
> > > encryption
> > > > > in
> > > > > > low
> > > > > > > > level api and with examples included.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > To have this feature completed, I would like to thank Gidon
> and
> > > > > > Revital for
> > > > > > > > their contribution.
> > > > > > > >
> > > > > > > > Gidon had a key role in encryption design and in writing Java
> > > > version
> > > > > > code
> > > > > > > > on which we based on to write C++ version. He also wrote
> crypto
> > > > > > package in
> > > > > > > > C++ version.
> > > > > > > >
> > > > > > > > Revital and me has been joining together in writing C++
> > version.
> > > > > > Revital
> > > > > > > > was responsible for AAD calculations, API updating (to be the
> > > same
> > > > > with
> > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > writing
> > > > the
> > > > > > first
> > > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > > updated
> > > > > > when
> > > > > > > > crypto package change.
> > > > > > > >
> > > > > > > > We have had a great time to cooperate. Thank Gidon and
> Revital
> > > for
> > > > > all
> > > > > > > > guide and experience I have received from them, too.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Here are the links of pull requests:
> > > > > > > >
> > > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > >
> > > > > > > > 2) some merged pull requests for new thrift structure and
> > crypto
> > > > > > algorithm,
> > > > > > > > and one still open:
> https://github.com/apache/arrow/pull/3520
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > However, in order to make (1) buildable with current build
> > > scripts,
> > > > > we
> > > > > > need
> > > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > > this
> > > > > > jira:
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > > grateful
> > > > > > if
> > > > > > > > someone could help fullfill this work.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > About current pull requests, they has been currently using in
> > our
> > > > > > > > development phase at Emotiv (https://www.emotiv.com/). We
> love
> > > > using
> > > > > > > > parquet files to store EEG data. We are going to release a
> > > product
> > > > > with
> > > > > > > > encrypted parquet files soon and look forward to the official
> > > > release
> > > > > > of
> > > > > > > > parquet encryption feature. So it will be many thank and
> great
> > > > honor
> > > > > to
> > > > > > > > have you review and merge them (if qualified).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > >
> > > > > > > > Tham
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
> >
>
>
> --
> regards,
> Deepak Majeti
>

RE: Announcement for c++ parquet encryption

Posted by tham <th...@emotiv.com>.
Hi Gidon,

Sure. I’m working on it.

Regards, Tham

From: Gidon Gershinsky
Sent: Thursday, May 30, 2019 4:39 AM
To: dev@parquet.apache.org
Subject: Re: Announcement for c++ parquet encryption

Wes, thanks for reviewing and merging the 3520. Tham, can you please rebase
2555 to take these and Deepak's changes in, so your PR becomes the staging
branch.
Revital will then send an additional input to your branch. Afterwards, with
your changes and additions, please let us know when PR 2555 is ready for a
review.

Cheers, Gidon.

On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <ma...@gmail.com>
wrote:

> Hi Gidon,
>
> I agree with your point 2). I merged the PR for OpenSSL build support. We
> can do the same for PR-3520 after you push the changes this week. We can
> then simply rebase PR-2555 and make the final review.
> For 1) I would like to take a look at the inter-op testing framework first.
> Can you share the inter-op sample? Thanks.
>
>
> On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi Deepak,
> >
> > If it simplifies the code review, we can make this change. Actually,
> we're
> > already creating a single branch that has the parquet-cpp with encryption
> > in one place, ready for build and sample run. This is a private branch -
> > but we can do the same for the pull request (2555), making it the staging
> > branch. However, there are a number of technical details we need to
> address
> > first:
> >
> > 1) The private branch we're creating contains not only the cpp encryption
> > sample, but also an additional interop sample that checks
> interoperability
> > with parquet-mr encryption.
> > Do we want to add this interop sample to the staging branch? Should this
> > sample be merged in arrow/parquet-cpp for future interop testing with
> Java
> > encryption? Or this sample can stay in the staging branch for the
> duration
> > of the review process, and be removed just before the staging PR is
> merged
> > in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> > branch at all, and keep our private branch for interop testing? Let us
> know
> > your preference.
> >
> > 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> > and update the PR. It adds 2 new files, and doesnt touch the existing
> ones
> > - if you guys will be ok with the updated version, you can merge it, and
> > then the staging PR (2555) will automatically pick the 3520 patch upon
> > rebasing to the then-current master state. But, if further changes will
> be
> > required in the #3520 files, we'll add them to the staging PR.
> >
> > Cheers, Gidon.
> >
> > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
> > wrote:
> >
> > > Hi Gidon and Tham,
> > >
> > > I opened a PR for OpenSSL build support here:
> > > https://github.com/apache/arrow/pull/4384/files
> > > However, your PRs need to be rebased/merged with master.
> > > Why don't we keep one staging PR for C++? Probably make
> > > https://github.com/apache/arrow/pull/2555 the staging PR and
> cherry-pick
> > > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> > it.
> > > The later PRs are tiny anyway. This simplifies code review, syncing
> with
> > > the master, as well as building the encryption work while reviewing.
> > >
> > >
> > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks guys. We'll create a doc with a description of the current
> > interop
> > > > tests, including details on how to build/run them.
> > > >
> > > >
> > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> majeti.deepak@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > I will make the first attempt to review the C++ code. I reviewed
> part
> > > of
> > > > it
> > > > > a while ago.
> > > > > Java-C++ inter-operation testing is my main concern. Can you share
> > some
> > > > > details on what has been tested?
> > > > > I need to update myself with the key management part and see if we
> > can
> > > > add
> > > > > some test parquet files written in Java along with the keys to the
> > data
> > > > > folder.
> > > > >
> > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <wesmckinn@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > hi folks,
> > > > > >
> > > > > > I'm extraordinarily busy but it seems likely that the code review
> > for
> > > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> > is a
> > > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > > hasn't been much feedback over the course of its development, I
> > would
> > > > > > like to leave thoughtful code review so that we can avoid issues
> > that
> > > > > > may lead to burdensome maintenance issues.
> > > > > >
> > > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > > that
> > > > > > use OpenSSL and there are many examples of how to incorporate
> other
> > > > > > dependencies into the Arrow build system. I don't have time to
> work
> > > on
> > > > > > this right now so someone else will have to sort out the
> toolchain
> > > > > > issue
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> gg5070@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > <re-sending from another account>
> > > > > > >
> > > > > > > Tham, thank you for this! and for volunteering early for the
> C++
> > > > > version
> > > > > > > work, driving it forward and creating the bulk of the
> parquet-cpp
> > > > > > > encryption code along the way.
> > > > > > >
> > > > > > > @All - this announcement means that two implementations of
> > Parquet
> > > > > > > encryption, fully conforming to the formal specification, are
> > > > available
> > > > > > > today. Thanks to Revital for contributing to C++ version
> > compliance
> > > > > with
> > > > > > > the encryption spec, and for running a set of basic Java-C++
> > > > encryption
> > > > > > > interoperability tests. We have tested plaintext and encrypted
> > > footer
> > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> Files
> > > > > written
> > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> vice
> > > > versa.
> > > > > > >
> > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > support
> > > > and
> > > > > > vote
> > > > > > > for the encryption specification - along with the PMC folks.
> > > > > > >
> > > > > > > All parquet-format pull requests are merged by now into the
> > > > encryption
> > > > > > > branch,
> > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > >
> > > > > > > The community is welcome to review the parquet-mr pull
> requests,
> > in
> > > > the
> > > > > > > following order:
> > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > >
> > > > > > >
> > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > > encryption
> > > > > > is
> > > > > > > collected in this branch:
> > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > Thanks to Xinli for working with this branch code, and
> > contributing
> > > > to
> > > > > it
> > > > > > > based on his field experience. Everybody is welcome to do the
> > same.
> > > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > > instead
> > > > of
> > > > > > my
> > > > > > > private branch..
> > > > > > >
> > > > > > > And I certainly second Tham's call to review & merge the
> > > parquet-cpp
> > > > > pull
> > > > > > > requests. By now, we have a number of companies starting to
> > utilize
> > > > > > Parquet
> > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > >
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com>
> wrote:
> > > > > > >
> > > > > > > > Hi community,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > After a long time of development, I'm honor to announce that
> we
> > > > have
> > > > > > just
> > > > > > > > completed C++ parquet encryption module which implements
> > > encryption
> > > > > in
> > > > > > low
> > > > > > > > level api and with examples included.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > To have this feature completed, I would like to thank Gidon
> and
> > > > > > Revital for
> > > > > > > > their contribution.
> > > > > > > >
> > > > > > > > Gidon had a key role in encryption design and in writing Java
> > > > version
> > > > > > code
> > > > > > > > on which we based on to write C++ version. He also wrote
> crypto
> > > > > > package in
> > > > > > > > C++ version.
> > > > > > > >
> > > > > > > > Revital and me has been joining together in writing C++
> > version.
> > > > > > Revital
> > > > > > > > was responsible for AAD calculations, API updating (to be the
> > > same
> > > > > with
> > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > writing
> > > > the
> > > > > > first
> > > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > > updated
> > > > > > when
> > > > > > > > crypto package change.
> > > > > > > >
> > > > > > > > We have had a great time to cooperate. Thank Gidon and
> Revital
> > > for
> > > > > all
> > > > > > > > guide and experience I have received from them, too.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Here are the links of pull requests:
> > > > > > > >
> > > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > >
> > > > > > > > 2) some merged pull requests for new thrift structure and
> > crypto
> > > > > > algorithm,
> > > > > > > > and one still open:
> https://github.com/apache/arrow/pull/3520
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > However, in order to make (1) buildable with current build
> > > scripts,
> > > > > we
> > > > > > need
> > > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > > this
> > > > > > jira:
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > > grateful
> > > > > > if
> > > > > > > > someone could help fullfill this work.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > About current pull requests, they has been currently using in
> > our
> > > > > > > > development phase at Emotiv (https://www.emotiv.com/). We
> love
> > > > using
> > > > > > > > parquet files to store EEG data. We are going to release a
> > > product
> > > > > with
> > > > > > > > encrypted parquet files soon and look forward to the official
> > > > release
> > > > > > of
> > > > > > > > parquet encryption feature. So it will be many thank and
> great
> > > > honor
> > > > > to
> > > > > > > > have you review and merge them (if qualified).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > >
> > > > > > > > Tham
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
> >
>
>
> --
> regards,
> Deepak Majeti
>


Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Wes, thanks for reviewing and merging the 3520. Tham, can you please rebase
2555 to take these and Deepak's changes in, so your PR becomes the staging
branch.
Revital will then send an additional input to your branch. Afterwards, with
your changes and additions, please let us know when PR 2555 is ready for a
review.

Cheers, Gidon.

On Tue, May 28, 2019 at 7:25 AM Deepak Majeti <ma...@gmail.com>
wrote:

> Hi Gidon,
>
> I agree with your point 2). I merged the PR for OpenSSL build support. We
> can do the same for PR-3520 after you push the changes this week. We can
> then simply rebase PR-2555 and make the final review.
> For 1) I would like to take a look at the inter-op testing framework first.
> Can you share the inter-op sample? Thanks.
>
>
> On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi Deepak,
> >
> > If it simplifies the code review, we can make this change. Actually,
> we're
> > already creating a single branch that has the parquet-cpp with encryption
> > in one place, ready for build and sample run. This is a private branch -
> > but we can do the same for the pull request (2555), making it the staging
> > branch. However, there are a number of technical details we need to
> address
> > first:
> >
> > 1) The private branch we're creating contains not only the cpp encryption
> > sample, but also an additional interop sample that checks
> interoperability
> > with parquet-mr encryption.
> > Do we want to add this interop sample to the staging branch? Should this
> > sample be merged in arrow/parquet-cpp for future interop testing with
> Java
> > encryption? Or this sample can stay in the staging branch for the
> duration
> > of the review process, and be removed just before the staging PR is
> merged
> > in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> > branch at all, and keep our private branch for interop testing? Let us
> know
> > your preference.
> >
> > 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> > and update the PR. It adds 2 new files, and doesnt touch the existing
> ones
> > - if you guys will be ok with the updated version, you can merge it, and
> > then the staging PR (2555) will automatically pick the 3520 patch upon
> > rebasing to the then-current master state. But, if further changes will
> be
> > required in the #3520 files, we'll add them to the staging PR.
> >
> > Cheers, Gidon.
> >
> > On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
> > wrote:
> >
> > > Hi Gidon and Tham,
> > >
> > > I opened a PR for OpenSSL build support here:
> > > https://github.com/apache/arrow/pull/4384/files
> > > However, your PRs need to be rebased/merged with master.
> > > Why don't we keep one staging PR for C++? Probably make
> > > https://github.com/apache/arrow/pull/2555 the staging PR and
> cherry-pick
> > > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> > it.
> > > The later PRs are tiny anyway. This simplifies code review, syncing
> with
> > > the master, as well as building the encryption work while reviewing.
> > >
> > >
> > > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks guys. We'll create a doc with a description of the current
> > interop
> > > > tests, including details on how to build/run them.
> > > >
> > > >
> > > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <
> majeti.deepak@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > I will make the first attempt to review the C++ code. I reviewed
> part
> > > of
> > > > it
> > > > > a while ago.
> > > > > Java-C++ inter-operation testing is my main concern. Can you share
> > some
> > > > > details on what has been tested?
> > > > > I need to update myself with the key management part and see if we
> > can
> > > > add
> > > > > some test parquet files written in Java along with the keys to the
> > data
> > > > > folder.
> > > > >
> > > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <wesmckinn@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > hi folks,
> > > > > >
> > > > > > I'm extraordinarily busy but it seems likely that the code review
> > for
> > > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> > is a
> > > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > > hasn't been much feedback over the course of its development, I
> > would
> > > > > > like to leave thoughtful code review so that we can avoid issues
> > that
> > > > > > may lead to burdensome maintenance issues.
> > > > > >
> > > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > > that
> > > > > > use OpenSSL and there are many examples of how to incorporate
> other
> > > > > > dependencies into the Arrow build system. I don't have time to
> work
> > > on
> > > > > > this right now so someone else will have to sort out the
> toolchain
> > > > > > issue
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <
> gg5070@gmail.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > <re-sending from another account>
> > > > > > >
> > > > > > > Tham, thank you for this! and for volunteering early for the
> C++
> > > > > version
> > > > > > > work, driving it forward and creating the bulk of the
> parquet-cpp
> > > > > > > encryption code along the way.
> > > > > > >
> > > > > > > @All - this announcement means that two implementations of
> > Parquet
> > > > > > > encryption, fully conforming to the formal specification, are
> > > > available
> > > > > > > today. Thanks to Revital for contributing to C++ version
> > compliance
> > > > > with
> > > > > > > the encryption spec, and for running a set of basic Java-C++
> > > > encryption
> > > > > > > interoperability tests. We have tested plaintext and encrypted
> > > footer
> > > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers.
> Files
> > > > > written
> > > > > > > with parquet-cpp are successfully parsed by parquet-mr, and
> vice
> > > > versa.
> > > > > > >
> > > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> > support
> > > > and
> > > > > > vote
> > > > > > > for the encryption specification - along with the PMC folks.
> > > > > > >
> > > > > > > All parquet-format pull requests are merged by now into the
> > > > encryption
> > > > > > > branch,
> > > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > > >
> > > > > > > The community is welcome to review the parquet-mr pull
> requests,
> > in
> > > > the
> > > > > > > following order:
> > > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > > >
> > > > > > >
> > > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > > encryption
> > > > > > is
> > > > > > > collected in this branch:
> > > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > > Thanks to Xinli for working with this branch code, and
> > contributing
> > > > to
> > > > > it
> > > > > > > based on his field experience. Everybody is welcome to do the
> > same.
> > > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > > instead
> > > > of
> > > > > > my
> > > > > > > private branch..
> > > > > > >
> > > > > > > And I certainly second Tham's call to review & merge the
> > > parquet-cpp
> > > > > pull
> > > > > > > requests. By now, we have a number of companies starting to
> > utilize
> > > > > > Parquet
> > > > > > > encryption (both C++ and Java), including IBM.
> > > > > > >
> > > > > > >
> > > > > > > Cheers, Gidon.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com>
> wrote:
> > > > > > >
> > > > > > > > Hi community,
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > After a long time of development, I'm honor to announce that
> we
> > > > have
> > > > > > just
> > > > > > > > completed C++ parquet encryption module which implements
> > > encryption
> > > > > in
> > > > > > low
> > > > > > > > level api and with examples included.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > To have this feature completed, I would like to thank Gidon
> and
> > > > > > Revital for
> > > > > > > > their contribution.
> > > > > > > >
> > > > > > > > Gidon had a key role in encryption design and in writing Java
> > > > version
> > > > > > code
> > > > > > > > on which we based on to write C++ version. He also wrote
> crypto
> > > > > > package in
> > > > > > > > C++ version.
> > > > > > > >
> > > > > > > > Revital and me has been joining together in writing C++
> > version.
> > > > > > Revital
> > > > > > > > was responsible for AAD calculations, API updating (to be the
> > > same
> > > > > with
> > > > > > > > Java version) and Java-C++ inter-operation testing. I was
> > writing
> > > > the
> > > > > > first
> > > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > > updated
> > > > > > when
> > > > > > > > crypto package change.
> > > > > > > >
> > > > > > > > We have had a great time to cooperate. Thank Gidon and
> Revital
> > > for
> > > > > all
> > > > > > > > guide and experience I have received from them, too.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Here are the links of pull requests:
> > > > > > > >
> > > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > > >
> > > > > > > > 2) some merged pull requests for new thrift structure and
> > crypto
> > > > > > algorithm,
> > > > > > > > and one still open:
> https://github.com/apache/arrow/pull/3520
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > However, in order to make (1) buildable with current build
> > > scripts,
> > > > > we
> > > > > > need
> > > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > > this
> > > > > > jira:
> > > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > > grateful
> > > > > > if
> > > > > > > > someone could help fullfill this work.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > About current pull requests, they has been currently using in
> > our
> > > > > > > > development phase at Emotiv (https://www.emotiv.com/). We
> love
> > > > using
> > > > > > > > parquet files to store EEG data. We are going to release a
> > > product
> > > > > with
> > > > > > > > encrypted parquet files soon and look forward to the official
> > > > release
> > > > > > of
> > > > > > > > parquet encryption feature. So it will be many thank and
> great
> > > > honor
> > > > > to
> > > > > > > > have you review and merge them (if qualified).
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > >
> > > > > > > > Tham
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > regards,
> > > > > Deepak Majeti
> > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
> >
>
>
> --
> regards,
> Deepak Majeti
>

Re: Announcement for c++ parquet encryption

Posted by Deepak Majeti <ma...@gmail.com>.
Hi Gidon,

I agree with your point 2). I merged the PR for OpenSSL build support. We
can do the same for PR-3520 after you push the changes this week. We can
then simply rebase PR-2555 and make the final review.
For 1) I would like to take a look at the inter-op testing framework first.
Can you share the inter-op sample? Thanks.


On Mon, May 27, 2019 at 7:21 PM Gidon Gershinsky <gg...@gmail.com> wrote:

> Hi Deepak,
>
> If it simplifies the code review, we can make this change. Actually, we're
> already creating a single branch that has the parquet-cpp with encryption
> in one place, ready for build and sample run. This is a private branch -
> but we can do the same for the pull request (2555), making it the staging
> branch. However, there are a number of technical details we need to address
> first:
>
> 1) The private branch we're creating contains not only the cpp encryption
> sample, but also an additional interop sample that checks interoperability
> with parquet-mr encryption.
> Do we want to add this interop sample to the staging branch? Should this
> sample be merged in arrow/parquet-cpp for future interop testing with Java
> encryption? Or this sample can stay in the staging branch for the duration
> of the review process, and be removed just before the staging PR is merged
> in arrow/parquet-cpp? Or we dont add this interop sample to the staging
> branch at all, and keep our private branch for interop testing? Let us know
> your preference.
>
> 2) PR 3520 has been reviewed. We'll make the requested changes this week,
> and update the PR. It adds 2 new files, and doesnt touch the existing ones
> - if you guys will be ok with the updated version, you can merge it, and
> then the staging PR (2555) will automatically pick the 3520 patch upon
> rebasing to the then-current master state. But, if further changes will be
> required in the #3520 files, we'll add them to the staging PR.
>
> Cheers, Gidon.
>
> On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
> wrote:
>
> > Hi Gidon and Tham,
> >
> > I opened a PR for OpenSSL build support here:
> > https://github.com/apache/arrow/pull/4384/files
> > However, your PRs need to be rebased/merged with master.
> > Why don't we keep one staging PR for C++? Probably make
> > https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick
> > https://github.com/apache/arrow/pull/3520/files and the above PR onto
> it.
> > The later PRs are tiny anyway. This simplifies code review, syncing with
> > the master, as well as building the encryption work while reviewing.
> >
> >
> > On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> >
> > > Thanks guys. We'll create a doc with a description of the current
> interop
> > > tests, including details on how to build/run them.
> > >
> > >
> > > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <majeti.deepak@gmail.com
> >
> > > wrote:
> > >
> > > > I will make the first attempt to review the C++ code. I reviewed part
> > of
> > > it
> > > > a while ago.
> > > > Java-C++ inter-operation testing is my main concern. Can you share
> some
> > > > details on what has been tested?
> > > > I need to update myself with the key management part and see if we
> can
> > > add
> > > > some test parquet files written in Java along with the keys to the
> data
> > > > folder.
> > > >
> > > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <we...@gmail.com>
> > > wrote:
> > > >
> > > > > hi folks,
> > > > >
> > > > > I'm extraordinarily busy but it seems likely that the code review
> for
> > > > > Parquet C++ is largely going to fall on my shoulders. Since this
> is a
> > > > > large patch with many touch-points in the C++ codebase, and there
> > > > > hasn't been much feedback over the course of its development, I
> would
> > > > > like to leave thoughtful code review so that we can avoid issues
> that
> > > > > may lead to burdensome maintenance issues.
> > > > >
> > > > > On the OpenSSL issue -- there are many CMake projects in the wild
> > that
> > > > > use OpenSSL and there are many examples of how to incorporate other
> > > > > dependencies into the Arrow build system. I don't have time to work
> > on
> > > > > this right now so someone else will have to sort out the toolchain
> > > > > issue
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg5070@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > <re-sending from another account>
> > > > > >
> > > > > > Tham, thank you for this! and for volunteering early for the C++
> > > > version
> > > > > > work, driving it forward and creating the bulk of the parquet-cpp
> > > > > > encryption code along the way.
> > > > > >
> > > > > > @All - this announcement means that two implementations of
> Parquet
> > > > > > encryption, fully conforming to the formal specification, are
> > > available
> > > > > > today. Thanks to Revital for contributing to C++ version
> compliance
> > > > with
> > > > > > the encryption spec, and for running a set of basic Java-C++
> > > encryption
> > > > > > interoperability tests. We have tested plaintext and encrypted
> > footer
> > > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> > > > written
> > > > > > with parquet-cpp are successfully parsed by parquet-mr, and vice
> > > versa.
> > > > > >
> > > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their
> support
> > > and
> > > > > vote
> > > > > > for the encryption specification - along with the PMC folks.
> > > > > >
> > > > > > All parquet-format pull requests are merged by now into the
> > > encryption
> > > > > > branch,
> > > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > > >
> > > > > > The community is welcome to review the parquet-mr pull requests,
> in
> > > the
> > > > > > following order:
> > > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > > https://github.com/apache/parquet-mr/pull/643
> > > > > >
> > > > > >
> > > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > > encryption
> > > > > is
> > > > > > collected in this branch:
> > > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > > Thanks to Xinli for working with this branch code, and
> contributing
> > > to
> > > > it
> > > > > > based on his field experience. Everybody is welcome to do the
> same.
> > > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > > apache/parquet-mr/encryption, so that folks can work with it
> > instead
> > > of
> > > > > my
> > > > > > private branch..
> > > > > >
> > > > > > And I certainly second Tham's call to review & merge the
> > parquet-cpp
> > > > pull
> > > > > > requests. By now, we have a number of companies starting to
> utilize
> > > > > Parquet
> > > > > > encryption (both C++ and Java), including IBM.
> > > > > >
> > > > > >
> > > > > > Cheers, Gidon.
> > > > > >
> > > > > >
> > > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
> > > > > >
> > > > > > > Hi community,
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > After a long time of development, I'm honor to announce that we
> > > have
> > > > > just
> > > > > > > completed C++ parquet encryption module which implements
> > encryption
> > > > in
> > > > > low
> > > > > > > level api and with examples included.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > To have this feature completed, I would like to thank Gidon and
> > > > > Revital for
> > > > > > > their contribution.
> > > > > > >
> > > > > > > Gidon had a key role in encryption design and in writing Java
> > > version
> > > > > code
> > > > > > > on which we based on to write C++ version. He also wrote crypto
> > > > > package in
> > > > > > > C++ version.
> > > > > > >
> > > > > > > Revital and me has been joining together in writing C++
> version.
> > > > > Revital
> > > > > > > was responsible for AAD calculations, API updating (to be the
> > same
> > > > with
> > > > > > > Java version) and Java-C++ inter-operation testing. I was
> writing
> > > the
> > > > > first
> > > > > > > draft (properties, metadata, writer, reader) and keeping them
> > > updated
> > > > > when
> > > > > > > crypto package change.
> > > > > > >
> > > > > > > We have had a great time to cooperate. Thank Gidon and Revital
> > for
> > > > all
> > > > > > > guide and experience I have received from them, too.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Here are the links of pull requests:
> > > > > > >
> > > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > > >
> > > > > > > 2) some merged pull requests for new thrift structure and
> crypto
> > > > > algorithm,
> > > > > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > However, in order to make (1) buildable with current build
> > scripts,
> > > > we
> > > > > need
> > > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> > this
> > > > > jira:
> > > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > > grateful
> > > > > if
> > > > > > > someone could help fullfill this work.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > About current pull requests, they has been currently using in
> our
> > > > > > > development phase at Emotiv (https://www.emotiv.com/). We love
> > > using
> > > > > > > parquet files to store EEG data. We are going to release a
> > product
> > > > with
> > > > > > > encrypted parquet files soon and look forward to the official
> > > release
> > > > > of
> > > > > > > parquet encryption feature. So it will be many thank and great
> > > honor
> > > > to
> > > > > > > have you review and merge them (if qualified).
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thank you very much!
> > > > > > >
> > > > > > > Tham
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > regards,
> > > > Deepak Majeti
> > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
> >
>


-- 
regards,
Deepak Majeti

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi Deepak,

If it simplifies the code review, we can make this change. Actually, we're
already creating a single branch that has the parquet-cpp with encryption
in one place, ready for build and sample run. This is a private branch -
but we can do the same for the pull request (2555), making it the staging
branch. However, there are a number of technical details we need to address
first:

1) The private branch we're creating contains not only the cpp encryption
sample, but also an additional interop sample that checks interoperability
with parquet-mr encryption.
Do we want to add this interop sample to the staging branch? Should this
sample be merged in arrow/parquet-cpp for future interop testing with Java
encryption? Or this sample can stay in the staging branch for the duration
of the review process, and be removed just before the staging PR is merged
in arrow/parquet-cpp? Or we dont add this interop sample to the staging
branch at all, and keep our private branch for interop testing? Let us know
your preference.

2) PR 3520 has been reviewed. We'll make the requested changes this week,
and update the PR. It adds 2 new files, and doesnt touch the existing ones
- if you guys will be ok with the updated version, you can merge it, and
then the staging PR (2555) will automatically pick the 3520 patch upon
rebasing to the then-current master state. But, if further changes will be
required in the #3520 files, we'll add them to the staging PR.

Cheers, Gidon.

On Fri, May 24, 2019 at 9:02 PM Deepak Majeti <ma...@gmail.com>
wrote:

> Hi Gidon and Tham,
>
> I opened a PR for OpenSSL build support here:
> https://github.com/apache/arrow/pull/4384/files
> However, your PRs need to be rebased/merged with master.
> Why don't we keep one staging PR for C++? Probably make
> https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick
> https://github.com/apache/arrow/pull/3520/files and the above PR onto it.
> The later PRs are tiny anyway. This simplifies code review, syncing with
> the master, as well as building the encryption work while reviewing.
>
>
> On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com>
> wrote:
>
> > Thanks guys. We'll create a doc with a description of the current interop
> > tests, including details on how to build/run them.
> >
> >
> > On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <ma...@gmail.com>
> > wrote:
> >
> > > I will make the first attempt to review the C++ code. I reviewed part
> of
> > it
> > > a while ago.
> > > Java-C++ inter-operation testing is my main concern. Can you share some
> > > details on what has been tested?
> > > I need to update myself with the key management part and see if we can
> > add
> > > some test parquet files written in Java along with the keys to the data
> > > folder.
> > >
> > > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > >
> > > > hi folks,
> > > >
> > > > I'm extraordinarily busy but it seems likely that the code review for
> > > > Parquet C++ is largely going to fall on my shoulders. Since this is a
> > > > large patch with many touch-points in the C++ codebase, and there
> > > > hasn't been much feedback over the course of its development, I would
> > > > like to leave thoughtful code review so that we can avoid issues that
> > > > may lead to burdensome maintenance issues.
> > > >
> > > > On the OpenSSL issue -- there are many CMake projects in the wild
> that
> > > > use OpenSSL and there are many examples of how to incorporate other
> > > > dependencies into the Arrow build system. I don't have time to work
> on
> > > > this right now so someone else will have to sort out the toolchain
> > > > issue
> > > >
> > > > Thanks
> > > >
> > > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > > > >
> > > > > <re-sending from another account>
> > > > >
> > > > > Tham, thank you for this! and for volunteering early for the C++
> > > version
> > > > > work, driving it forward and creating the bulk of the parquet-cpp
> > > > > encryption code along the way.
> > > > >
> > > > > @All - this announcement means that two implementations of Parquet
> > > > > encryption, fully conforming to the formal specification, are
> > available
> > > > > today. Thanks to Revital for contributing to C++ version compliance
> > > with
> > > > > the encryption spec, and for running a set of basic Java-C++
> > encryption
> > > > > interoperability tests. We have tested plaintext and encrypted
> footer
> > > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> > > written
> > > > > with parquet-cpp are successfully parsed by parquet-mr, and vice
> > versa.
> > > > >
> > > > > Let me also thank Junjie, Nandor, Anna and Xinli for their support
> > and
> > > > vote
> > > > > for the encryption specification - along with the PMC folks.
> > > > >
> > > > > All parquet-format pull requests are merged by now into the
> > encryption
> > > > > branch,
> > > > > https://github.com/apache/parquet-format/tree/encryption
> > > > >
> > > > > The community is welcome to review the parquet-mr pull requests, in
> > the
> > > > > following order:
> > > > > https://github.com/apache/parquet-mr/pull/613
> > > > > https://github.com/apache/parquet-mr/pull/614
> > > > > https://github.com/apache/parquet-mr/pull/643
> > > > >
> > > > >
> > > > > Currently, an end-to-end implementation of Java (mr) Parquet
> > encryption
> > > > is
> > > > > collected in this branch:
> > > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > > Thanks to Xinli for working with this branch code, and contributing
> > to
> > > it
> > > > > based on his field experience. Everybody is welcome to do the same.
> > > > > @All - it would be helpful to review & merge the above PRs in
> > > > > apache/parquet-mr/encryption, so that folks can work with it
> instead
> > of
> > > > my
> > > > > private branch..
> > > > >
> > > > > And I certainly second Tham's call to review & merge the
> parquet-cpp
> > > pull
> > > > > requests. By now, we have a number of companies starting to utilize
> > > > Parquet
> > > > > encryption (both C++ and Java), including IBM.
> > > > >
> > > > >
> > > > > Cheers, Gidon.
> > > > >
> > > > >
> > > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
> > > > >
> > > > > > Hi community,
> > > > > >
> > > > > >
> > > > > >
> > > > > > After a long time of development, I'm honor to announce that we
> > have
> > > > just
> > > > > > completed C++ parquet encryption module which implements
> encryption
> > > in
> > > > low
> > > > > > level api and with examples included.
> > > > > >
> > > > > >
> > > > > >
> > > > > > To have this feature completed, I would like to thank Gidon and
> > > > Revital for
> > > > > > their contribution.
> > > > > >
> > > > > > Gidon had a key role in encryption design and in writing Java
> > version
> > > > code
> > > > > > on which we based on to write C++ version. He also wrote crypto
> > > > package in
> > > > > > C++ version.
> > > > > >
> > > > > > Revital and me has been joining together in writing C++ version.
> > > > Revital
> > > > > > was responsible for AAD calculations, API updating (to be the
> same
> > > with
> > > > > > Java version) and Java-C++ inter-operation testing. I was writing
> > the
> > > > first
> > > > > > draft (properties, metadata, writer, reader) and keeping them
> > updated
> > > > when
> > > > > > crypto package change.
> > > > > >
> > > > > > We have had a great time to cooperate. Thank Gidon and Revital
> for
> > > all
> > > > > > guide and experience I have received from them, too.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here are the links of pull requests:
> > > > > >
> > > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > > https://github.com/apache/arrow/pull/2555.
> > > > > >
> > > > > > 2) some merged pull requests for new thrift structure and crypto
> > > > algorithm,
> > > > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > > > >
> > > > > >
> > > > > >
> > > > > > However, in order to make (1) buildable with current build
> scripts,
> > > we
> > > > need
> > > > > > “adding openssl in C++ build toolchain” which is mentioned in
> this
> > > > jira:
> > > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> > grateful
> > > > if
> > > > > > someone could help fullfill this work.
> > > > > >
> > > > > >
> > > > > >
> > > > > > About current pull requests, they has been currently using in our
> > > > > > development phase at Emotiv (https://www.emotiv.com/). We love
> > using
> > > > > > parquet files to store EEG data. We are going to release a
> product
> > > with
> > > > > > encrypted parquet files soon and look forward to the official
> > release
> > > > of
> > > > > > parquet encryption feature. So it will be many thank and great
> > honor
> > > to
> > > > > > have you review and merge them (if qualified).
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thank you very much!
> > > > > >
> > > > > > Tham
> > > > > >
> > > >
> > >
> > >
> > > --
> > > regards,
> > > Deepak Majeti
> > >
> >
>
>
> --
> regards,
> Deepak Majeti
>

Re: Announcement for c++ parquet encryption

Posted by Deepak Majeti <ma...@gmail.com>.
Hi Gidon and Tham,

I opened a PR for OpenSSL build support here:
https://github.com/apache/arrow/pull/4384/files
However, your PRs need to be rebased/merged with master.
Why don't we keep one staging PR for C++? Probably make
https://github.com/apache/arrow/pull/2555 the staging PR and cherry-pick
https://github.com/apache/arrow/pull/3520/files and the above PR onto it.
The later PRs are tiny anyway. This simplifies code review, syncing with
the master, as well as building the encryption work while reviewing.


On Wed, May 22, 2019 at 12:43 PM Gidon Gershinsky <gg...@gmail.com> wrote:

> Thanks guys. We'll create a doc with a description of the current interop
> tests, including details on how to build/run them.
>
>
> On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <ma...@gmail.com>
> wrote:
>
> > I will make the first attempt to review the C++ code. I reviewed part of
> it
> > a while ago.
> > Java-C++ inter-operation testing is my main concern. Can you share some
> > details on what has been tested?
> > I need to update myself with the key management part and see if we can
> add
> > some test parquet files written in Java along with the keys to the data
> > folder.
> >
> > On Wed, May 22, 2019 at 10:18 AM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > hi folks,
> > >
> > > I'm extraordinarily busy but it seems likely that the code review for
> > > Parquet C++ is largely going to fall on my shoulders. Since this is a
> > > large patch with many touch-points in the C++ codebase, and there
> > > hasn't been much feedback over the course of its development, I would
> > > like to leave thoughtful code review so that we can avoid issues that
> > > may lead to burdensome maintenance issues.
> > >
> > > On the OpenSSL issue -- there are many CMake projects in the wild that
> > > use OpenSSL and there are many examples of how to incorporate other
> > > dependencies into the Arrow build system. I don't have time to work on
> > > this right now so someone else will have to sort out the toolchain
> > > issue
> > >
> > > Thanks
> > >
> > > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > > >
> > > > <re-sending from another account>
> > > >
> > > > Tham, thank you for this! and for volunteering early for the C++
> > version
> > > > work, driving it forward and creating the bulk of the parquet-cpp
> > > > encryption code along the way.
> > > >
> > > > @All - this announcement means that two implementations of Parquet
> > > > encryption, fully conforming to the formal specification, are
> available
> > > > today. Thanks to Revital for contributing to C++ version compliance
> > with
> > > > the encryption spec, and for running a set of basic Java-C++
> encryption
> > > > interoperability tests. We have tested plaintext and encrypted footer
> > > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> > written
> > > > with parquet-cpp are successfully parsed by parquet-mr, and vice
> versa.
> > > >
> > > > Let me also thank Junjie, Nandor, Anna and Xinli for their support
> and
> > > vote
> > > > for the encryption specification - along with the PMC folks.
> > > >
> > > > All parquet-format pull requests are merged by now into the
> encryption
> > > > branch,
> > > > https://github.com/apache/parquet-format/tree/encryption
> > > >
> > > > The community is welcome to review the parquet-mr pull requests, in
> the
> > > > following order:
> > > > https://github.com/apache/parquet-mr/pull/613
> > > > https://github.com/apache/parquet-mr/pull/614
> > > > https://github.com/apache/parquet-mr/pull/643
> > > >
> > > >
> > > > Currently, an end-to-end implementation of Java (mr) Parquet
> encryption
> > > is
> > > > collected in this branch:
> > > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > > Thanks to Xinli for working with this branch code, and contributing
> to
> > it
> > > > based on his field experience. Everybody is welcome to do the same.
> > > > @All - it would be helpful to review & merge the above PRs in
> > > > apache/parquet-mr/encryption, so that folks can work with it instead
> of
> > > my
> > > > private branch..
> > > >
> > > > And I certainly second Tham's call to review & merge the parquet-cpp
> > pull
> > > > requests. By now, we have a number of companies starting to utilize
> > > Parquet
> > > > encryption (both C++ and Java), including IBM.
> > > >
> > > >
> > > > Cheers, Gidon.
> > > >
> > > >
> > > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
> > > >
> > > > > Hi community,
> > > > >
> > > > >
> > > > >
> > > > > After a long time of development, I'm honor to announce that we
> have
> > > just
> > > > > completed C++ parquet encryption module which implements encryption
> > in
> > > low
> > > > > level api and with examples included.
> > > > >
> > > > >
> > > > >
> > > > > To have this feature completed, I would like to thank Gidon and
> > > Revital for
> > > > > their contribution.
> > > > >
> > > > > Gidon had a key role in encryption design and in writing Java
> version
> > > code
> > > > > on which we based on to write C++ version. He also wrote crypto
> > > package in
> > > > > C++ version.
> > > > >
> > > > > Revital and me has been joining together in writing C++ version.
> > > Revital
> > > > > was responsible for AAD calculations, API updating (to be the same
> > with
> > > > > Java version) and Java-C++ inter-operation testing. I was writing
> the
> > > first
> > > > > draft (properties, metadata, writer, reader) and keeping them
> updated
> > > when
> > > > > crypto package change.
> > > > >
> > > > > We have had a great time to cooperate. Thank Gidon and Revital for
> > all
> > > > > guide and experience I have received from them, too.
> > > > >
> > > > >
> > > > >
> > > > > Here are the links of pull requests:
> > > > >
> > > > > 1) encryption module (properties, metadata, writer, reader):
> > > > > https://github.com/apache/arrow/pull/2555.
> > > > >
> > > > > 2) some merged pull requests for new thrift structure and crypto
> > > algorithm,
> > > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > > >
> > > > >
> > > > >
> > > > > However, in order to make (1) buildable with current build scripts,
> > we
> > > need
> > > > > “adding openssl in C++ build toolchain” which is mentioned in this
> > > jira:
> > > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be
> grateful
> > > if
> > > > > someone could help fullfill this work.
> > > > >
> > > > >
> > > > >
> > > > > About current pull requests, they has been currently using in our
> > > > > development phase at Emotiv (https://www.emotiv.com/). We love
> using
> > > > > parquet files to store EEG data. We are going to release a product
> > with
> > > > > encrypted parquet files soon and look forward to the official
> release
> > > of
> > > > > parquet encryption feature. So it will be many thank and great
> honor
> > to
> > > > > have you review and merge them (if qualified).
> > > > >
> > > > >
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > > Tham
> > > > >
> > >
> >
> >
> > --
> > regards,
> > Deepak Majeti
> >
>


-- 
regards,
Deepak Majeti

Re: Announcement for c++ parquet encryption

Posted by Wes McKinney <we...@gmail.com>.
I've just left code review on

https://github.com/apache/arrow/pull/2555

The issues with it are not too serious, but it would have benefited
from more frequently feedback throughout the 6+ month development
process.

There are still quite a few test failures and it doesn't build at all
on Windows, so I think getting a green checkmark is higher precedence
than addressing my review feedback

On Mon, Jun 10, 2019 at 6:47 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> As promised, a doc that describes how to get & build Parquet encryption in
> C++ and Java, and how to run the samples:
> https://docs.google.com/document/d/10HygpT6e9UNCn0uy9dkFkpDigNgYggccAwsS5Eqej68/edit?usp=sharing
>
> All questions are welcome.
>
> Cheers, Gidon.
>
> ---------- Forwarded message ---------
> From: Gidon Gershinsky <gg...@gmail.com>
> Date: Wed, May 22, 2019 at 7:43 PM
> Subject: Re: Announcement for c++ parquet encryption
> To: <de...@parquet.apache.org>
>
>
> Thanks guys. We'll create a doc with a description of the current interop
> tests, including details on how to build/run them.

Fwd: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
As promised, a doc that describes how to get & build Parquet encryption in
C++ and Java, and how to run the samples:
https://docs.google.com/document/d/10HygpT6e9UNCn0uy9dkFkpDigNgYggccAwsS5Eqej68/edit?usp=sharing

All questions are welcome.

Cheers, Gidon.

---------- Forwarded message ---------
From: Gidon Gershinsky <gg...@gmail.com>
Date: Wed, May 22, 2019 at 7:43 PM
Subject: Re: Announcement for c++ parquet encryption
To: <de...@parquet.apache.org>


Thanks guys. We'll create a doc with a description of the current interop
tests, including details on how to build/run them.

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
Thanks guys. We'll create a doc with a description of the current interop
tests, including details on how to build/run them.


On Wed, May 22, 2019 at 7:22 PM Deepak Majeti <ma...@gmail.com>
wrote:

> I will make the first attempt to review the C++ code. I reviewed part of it
> a while ago.
> Java-C++ inter-operation testing is my main concern. Can you share some
> details on what has been tested?
> I need to update myself with the key management part and see if we can add
> some test parquet files written in Java along with the keys to the data
> folder.
>
> On Wed, May 22, 2019 at 10:18 AM Wes McKinney <we...@gmail.com> wrote:
>
> > hi folks,
> >
> > I'm extraordinarily busy but it seems likely that the code review for
> > Parquet C++ is largely going to fall on my shoulders. Since this is a
> > large patch with many touch-points in the C++ codebase, and there
> > hasn't been much feedback over the course of its development, I would
> > like to leave thoughtful code review so that we can avoid issues that
> > may lead to burdensome maintenance issues.
> >
> > On the OpenSSL issue -- there are many CMake projects in the wild that
> > use OpenSSL and there are many examples of how to incorporate other
> > dependencies into the Arrow build system. I don't have time to work on
> > this right now so someone else will have to sort out the toolchain
> > issue
> >
> > Thanks
> >
> > On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> > >
> > > <re-sending from another account>
> > >
> > > Tham, thank you for this! and for volunteering early for the C++
> version
> > > work, driving it forward and creating the bulk of the parquet-cpp
> > > encryption code along the way.
> > >
> > > @All - this announcement means that two implementations of Parquet
> > > encryption, fully conforming to the formal specification, are available
> > > today. Thanks to Revital for contributing to C++ version compliance
> with
> > > the encryption spec, and for running a set of basic Java-C++ encryption
> > > interoperability tests. We have tested plaintext and encrypted footer
> > > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files
> written
> > > with parquet-cpp are successfully parsed by parquet-mr, and vice versa.
> > >
> > > Let me also thank Junjie, Nandor, Anna and Xinli for their support and
> > vote
> > > for the encryption specification - along with the PMC folks.
> > >
> > > All parquet-format pull requests are merged by now into the encryption
> > > branch,
> > > https://github.com/apache/parquet-format/tree/encryption
> > >
> > > The community is welcome to review the parquet-mr pull requests, in the
> > > following order:
> > > https://github.com/apache/parquet-mr/pull/613
> > > https://github.com/apache/parquet-mr/pull/614
> > > https://github.com/apache/parquet-mr/pull/643
> > >
> > >
> > > Currently, an end-to-end implementation of Java (mr) Parquet encryption
> > is
> > > collected in this branch:
> > > https://github.com/ggershinsky/parquet-mr/tree/encr
> > > Thanks to Xinli for working with this branch code, and contributing to
> it
> > > based on his field experience. Everybody is welcome to do the same.
> > > @All - it would be helpful to review & merge the above PRs in
> > > apache/parquet-mr/encryption, so that folks can work with it instead of
> > my
> > > private branch..
> > >
> > > And I certainly second Tham's call to review & merge the parquet-cpp
> pull
> > > requests. By now, we have a number of companies starting to utilize
> > Parquet
> > > encryption (both C++ and Java), including IBM.
> > >
> > >
> > > Cheers, Gidon.
> > >
> > >
> > > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
> > >
> > > > Hi community,
> > > >
> > > >
> > > >
> > > > After a long time of development, I'm honor to announce that we have
> > just
> > > > completed C++ parquet encryption module which implements encryption
> in
> > low
> > > > level api and with examples included.
> > > >
> > > >
> > > >
> > > > To have this feature completed, I would like to thank Gidon and
> > Revital for
> > > > their contribution.
> > > >
> > > > Gidon had a key role in encryption design and in writing Java version
> > code
> > > > on which we based on to write C++ version. He also wrote crypto
> > package in
> > > > C++ version.
> > > >
> > > > Revital and me has been joining together in writing C++ version.
> > Revital
> > > > was responsible for AAD calculations, API updating (to be the same
> with
> > > > Java version) and Java-C++ inter-operation testing. I was writing the
> > first
> > > > draft (properties, metadata, writer, reader) and keeping them updated
> > when
> > > > crypto package change.
> > > >
> > > > We have had a great time to cooperate. Thank Gidon and Revital for
> all
> > > > guide and experience I have received from them, too.
> > > >
> > > >
> > > >
> > > > Here are the links of pull requests:
> > > >
> > > > 1) encryption module (properties, metadata, writer, reader):
> > > > https://github.com/apache/arrow/pull/2555.
> > > >
> > > > 2) some merged pull requests for new thrift structure and crypto
> > algorithm,
> > > > and one still open: https://github.com/apache/arrow/pull/3520
> > > >
> > > >
> > > >
> > > > However, in order to make (1) buildable with current build scripts,
> we
> > need
> > > > “adding openssl in C++ build toolchain” which is mentioned in this
> > jira:
> > > > https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful
> > if
> > > > someone could help fullfill this work.
> > > >
> > > >
> > > >
> > > > About current pull requests, they has been currently using in our
> > > > development phase at Emotiv (https://www.emotiv.com/). We love using
> > > > parquet files to store EEG data. We are going to release a product
> with
> > > > encrypted parquet files soon and look forward to the official release
> > of
> > > > parquet encryption feature. So it will be many thank and great honor
> to
> > > > have you review and merge them (if qualified).
> > > >
> > > >
> > > >
> > > > Thank you very much!
> > > >
> > > > Tham
> > > >
> >
>
>
> --
> regards,
> Deepak Majeti
>

Re: Announcement for c++ parquet encryption

Posted by Deepak Majeti <ma...@gmail.com>.
I will make the first attempt to review the C++ code. I reviewed part of it
a while ago.
Java-C++ inter-operation testing is my main concern. Can you share some
details on what has been tested?
I need to update myself with the key management part and see if we can add
some test parquet files written in Java along with the keys to the data
folder.

On Wed, May 22, 2019 at 10:18 AM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> I'm extraordinarily busy but it seems likely that the code review for
> Parquet C++ is largely going to fall on my shoulders. Since this is a
> large patch with many touch-points in the C++ codebase, and there
> hasn't been much feedback over the course of its development, I would
> like to leave thoughtful code review so that we can avoid issues that
> may lead to burdensome maintenance issues.
>
> On the OpenSSL issue -- there are many CMake projects in the wild that
> use OpenSSL and there are many examples of how to incorporate other
> dependencies into the Arrow build system. I don't have time to work on
> this right now so someone else will have to sort out the toolchain
> issue
>
> Thanks
>
> On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg...@gmail.com> wrote:
> >
> > <re-sending from another account>
> >
> > Tham, thank you for this! and for volunteering early for the C++ version
> > work, driving it forward and creating the bulk of the parquet-cpp
> > encryption code along the way.
> >
> > @All - this announcement means that two implementations of Parquet
> > encryption, fully conforming to the formal specification, are available
> > today. Thanks to Revital for contributing to C++ version compliance with
> > the encryption spec, and for running a set of basic Java-C++ encryption
> > interoperability tests. We have tested plaintext and encrypted footer
> > modes, GCM and GCM_CTR algorithms, new and legacy readers. Files written
> > with parquet-cpp are successfully parsed by parquet-mr, and vice versa.
> >
> > Let me also thank Junjie, Nandor, Anna and Xinli for their support and
> vote
> > for the encryption specification - along with the PMC folks.
> >
> > All parquet-format pull requests are merged by now into the encryption
> > branch,
> > https://github.com/apache/parquet-format/tree/encryption
> >
> > The community is welcome to review the parquet-mr pull requests, in the
> > following order:
> > https://github.com/apache/parquet-mr/pull/613
> > https://github.com/apache/parquet-mr/pull/614
> > https://github.com/apache/parquet-mr/pull/643
> >
> >
> > Currently, an end-to-end implementation of Java (mr) Parquet encryption
> is
> > collected in this branch:
> > https://github.com/ggershinsky/parquet-mr/tree/encr
> > Thanks to Xinli for working with this branch code, and contributing to it
> > based on his field experience. Everybody is welcome to do the same.
> > @All - it would be helpful to review & merge the above PRs in
> > apache/parquet-mr/encryption, so that folks can work with it instead of
> my
> > private branch..
> >
> > And I certainly second Tham's call to review & merge the parquet-cpp pull
> > requests. By now, we have a number of companies starting to utilize
> Parquet
> > encryption (both C++ and Java), including IBM.
> >
> >
> > Cheers, Gidon.
> >
> >
> > On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
> >
> > > Hi community,
> > >
> > >
> > >
> > > After a long time of development, I'm honor to announce that we have
> just
> > > completed C++ parquet encryption module which implements encryption in
> low
> > > level api and with examples included.
> > >
> > >
> > >
> > > To have this feature completed, I would like to thank Gidon and
> Revital for
> > > their contribution.
> > >
> > > Gidon had a key role in encryption design and in writing Java version
> code
> > > on which we based on to write C++ version. He also wrote crypto
> package in
> > > C++ version.
> > >
> > > Revital and me has been joining together in writing C++ version.
> Revital
> > > was responsible for AAD calculations, API updating (to be the same with
> > > Java version) and Java-C++ inter-operation testing. I was writing the
> first
> > > draft (properties, metadata, writer, reader) and keeping them updated
> when
> > > crypto package change.
> > >
> > > We have had a great time to cooperate. Thank Gidon and Revital for all
> > > guide and experience I have received from them, too.
> > >
> > >
> > >
> > > Here are the links of pull requests:
> > >
> > > 1) encryption module (properties, metadata, writer, reader):
> > > https://github.com/apache/arrow/pull/2555.
> > >
> > > 2) some merged pull requests for new thrift structure and crypto
> algorithm,
> > > and one still open: https://github.com/apache/arrow/pull/3520
> > >
> > >
> > >
> > > However, in order to make (1) buildable with current build scripts, we
> need
> > > “adding openssl in C++ build toolchain” which is mentioned in this
> jira:
> > > https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful
> if
> > > someone could help fullfill this work.
> > >
> > >
> > >
> > > About current pull requests, they has been currently using in our
> > > development phase at Emotiv (https://www.emotiv.com/). We love using
> > > parquet files to store EEG data. We are going to release a product with
> > > encrypted parquet files soon and look forward to the official release
> of
> > > parquet encryption feature. So it will be many thank and great honor to
> > > have you review and merge them (if qualified).
> > >
> > >
> > >
> > > Thank you very much!
> > >
> > > Tham
> > >
>


-- 
regards,
Deepak Majeti

Re: Announcement for c++ parquet encryption

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

I'm extraordinarily busy but it seems likely that the code review for
Parquet C++ is largely going to fall on my shoulders. Since this is a
large patch with many touch-points in the C++ codebase, and there
hasn't been much feedback over the course of its development, I would
like to leave thoughtful code review so that we can avoid issues that
may lead to burdensome maintenance issues.

On the OpenSSL issue -- there are many CMake projects in the wild that
use OpenSSL and there are many examples of how to incorporate other
dependencies into the Arrow build system. I don't have time to work on
this right now so someone else will have to sort out the toolchain
issue

Thanks

On Mon, May 20, 2019 at 8:48 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> <re-sending from another account>
>
> Tham, thank you for this! and for volunteering early for the C++ version
> work, driving it forward and creating the bulk of the parquet-cpp
> encryption code along the way.
>
> @All - this announcement means that two implementations of Parquet
> encryption, fully conforming to the formal specification, are available
> today. Thanks to Revital for contributing to C++ version compliance with
> the encryption spec, and for running a set of basic Java-C++ encryption
> interoperability tests. We have tested plaintext and encrypted footer
> modes, GCM and GCM_CTR algorithms, new and legacy readers. Files written
> with parquet-cpp are successfully parsed by parquet-mr, and vice versa.
>
> Let me also thank Junjie, Nandor, Anna and Xinli for their support and vote
> for the encryption specification - along with the PMC folks.
>
> All parquet-format pull requests are merged by now into the encryption
> branch,
> https://github.com/apache/parquet-format/tree/encryption
>
> The community is welcome to review the parquet-mr pull requests, in the
> following order:
> https://github.com/apache/parquet-mr/pull/613
> https://github.com/apache/parquet-mr/pull/614
> https://github.com/apache/parquet-mr/pull/643
>
>
> Currently, an end-to-end implementation of Java (mr) Parquet encryption is
> collected in this branch:
> https://github.com/ggershinsky/parquet-mr/tree/encr
> Thanks to Xinli for working with this branch code, and contributing to it
> based on his field experience. Everybody is welcome to do the same.
> @All - it would be helpful to review & merge the above PRs in
> apache/parquet-mr/encryption, so that folks can work with it instead of my
> private branch..
>
> And I certainly second Tham's call to review & merge the parquet-cpp pull
> requests. By now, we have a number of companies starting to utilize Parquet
> encryption (both C++ and Java), including IBM.
>
>
> Cheers, Gidon.
>
>
> On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:
>
> > Hi community,
> >
> >
> >
> > After a long time of development, I'm honor to announce that we have just
> > completed C++ parquet encryption module which implements encryption in low
> > level api and with examples included.
> >
> >
> >
> > To have this feature completed, I would like to thank Gidon and Revital for
> > their contribution.
> >
> > Gidon had a key role in encryption design and in writing Java version code
> > on which we based on to write C++ version. He also wrote crypto package in
> > C++ version.
> >
> > Revital and me has been joining together in writing C++ version. Revital
> > was responsible for AAD calculations, API updating (to be the same with
> > Java version) and Java-C++ inter-operation testing. I was writing the first
> > draft (properties, metadata, writer, reader) and keeping them updated when
> > crypto package change.
> >
> > We have had a great time to cooperate. Thank Gidon and Revital for all
> > guide and experience I have received from them, too.
> >
> >
> >
> > Here are the links of pull requests:
> >
> > 1) encryption module (properties, metadata, writer, reader):
> > https://github.com/apache/arrow/pull/2555.
> >
> > 2) some merged pull requests for new thrift structure and crypto algorithm,
> > and one still open: https://github.com/apache/arrow/pull/3520
> >
> >
> >
> > However, in order to make (1) buildable with current build scripts, we need
> > “adding openssl in C++ build toolchain” which is mentioned in this jira:
> > https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful if
> > someone could help fullfill this work.
> >
> >
> >
> > About current pull requests, they has been currently using in our
> > development phase at Emotiv (https://www.emotiv.com/). We love using
> > parquet files to store EEG data. We are going to release a product with
> > encrypted parquet files soon and look forward to the official release of
> > parquet encryption feature. So it will be many thank and great honor to
> > have you review and merge them (if qualified).
> >
> >
> >
> > Thank you very much!
> >
> > Tham
> >

Re: Announcement for c++ parquet encryption

Posted by Gidon Gershinsky <gg...@gmail.com>.
<re-sending from another account>

Tham, thank you for this! and for volunteering early for the C++ version
work, driving it forward and creating the bulk of the parquet-cpp
encryption code along the way.

@All - this announcement means that two implementations of Parquet
encryption, fully conforming to the formal specification, are available
today. Thanks to Revital for contributing to C++ version compliance with
the encryption spec, and for running a set of basic Java-C++ encryption
interoperability tests. We have tested plaintext and encrypted footer
modes, GCM and GCM_CTR algorithms, new and legacy readers. Files written
with parquet-cpp are successfully parsed by parquet-mr, and vice versa.

Let me also thank Junjie, Nandor, Anna and Xinli for their support and vote
for the encryption specification - along with the PMC folks.

All parquet-format pull requests are merged by now into the encryption
branch,
https://github.com/apache/parquet-format/tree/encryption

The community is welcome to review the parquet-mr pull requests, in the
following order:
https://github.com/apache/parquet-mr/pull/613
https://github.com/apache/parquet-mr/pull/614
https://github.com/apache/parquet-mr/pull/643


Currently, an end-to-end implementation of Java (mr) Parquet encryption is
collected in this branch:
https://github.com/ggershinsky/parquet-mr/tree/encr
Thanks to Xinli for working with this branch code, and contributing to it
based on his field experience. Everybody is welcome to do the same.
@All - it would be helpful to review & merge the above PRs in
apache/parquet-mr/encryption, so that folks can work with it instead of my
private branch..

And I certainly second Tham's call to review & merge the parquet-cpp pull
requests. By now, we have a number of companies starting to utilize Parquet
encryption (both C++ and Java), including IBM.


Cheers, Gidon.


On Mon, May 20, 2019 at 1:40 PM Tham Ha <th...@emotiv.com> wrote:

> Hi community,
>
>
>
> After a long time of development, I'm honor to announce that we have just
> completed C++ parquet encryption module which implements encryption in low
> level api and with examples included.
>
>
>
> To have this feature completed, I would like to thank Gidon and Revital for
> their contribution.
>
> Gidon had a key role in encryption design and in writing Java version code
> on which we based on to write C++ version. He also wrote crypto package in
> C++ version.
>
> Revital and me has been joining together in writing C++ version. Revital
> was responsible for AAD calculations, API updating (to be the same with
> Java version) and Java-C++ inter-operation testing. I was writing the first
> draft (properties, metadata, writer, reader) and keeping them updated when
> crypto package change.
>
> We have had a great time to cooperate. Thank Gidon and Revital for all
> guide and experience I have received from them, too.
>
>
>
> Here are the links of pull requests:
>
> 1) encryption module (properties, metadata, writer, reader):
> https://github.com/apache/arrow/pull/2555.
>
> 2) some merged pull requests for new thrift structure and crypto algorithm,
> and one still open: https://github.com/apache/arrow/pull/3520
>
>
>
> However, in order to make (1) buildable with current build scripts, we need
> “adding openssl in C++ build toolchain” which is mentioned in this jira:
> https://issues.apache.org/jira/browse/ARROW-4302. I will be grateful if
> someone could help fullfill this work.
>
>
>
> About current pull requests, they has been currently using in our
> development phase at Emotiv (https://www.emotiv.com/). We love using
> parquet files to store EEG data. We are going to release a product with
> encrypted parquet files soon and look forward to the official release of
> parquet encryption feature. So it will be many thank and great honor to
> have you review and merge them (if qualified).
>
>
>
> Thank you very much!
>
> Tham
>