You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by Zoltan Ivanfi <zi...@cloudera.com.INVALID> on 2018/09/25 17:10:42 UTC

Parquet-cpp has already released unofficial thrift changes

Hi,

On the Parquet sync we discussed that the practice of maintaining a copy of
parquet.thrift in parquet-cpp is dangerous and that we must take care to
not release parquet-format changes in parquet-cpp before we officially
release them in parquet-format. As I got back to my computer and started to
create a JIRA about this, I noticed that unfortunately this has already
happened.

The encryption-releated parquet.thrift changes have not only been added to
only parquet-format, but to parquet-cpp as well, and these changes got
released in parquet-cpp 1.5.0. This is very unfortunate, because
PARQUET-1419 would change the encryption in a breaking way, which is only
acceptable as long as the original is not released. Additionally, it has
been discussed that a formal voting should take place before incorporating
the encryption features in the format.

Now that parquet-cpp has already shipped these changes, we must choose the
lesser evil of the following two options:

   - Release a parquet-cpp 1.6.0 with a breaking change and risk that
   encrypted data files already written with parquet-cpp 1.5.0 will not be
   readable any more.
   - Release the encryption in parquet-format as it is, regardless of
   voting results and discard PARQUET-1419.

Personally I have a hard time deciding which one I consider lesser evil.
What are your opinions?

Thanks,

Zoltan

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
Sounds good. I agree a feature branch is the right thing to do in
these cases.

Cheers, Gidon.

On Wed, Sep 26, 2018 at 2:42 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> If the encryption code release in parquet-cpp is unused at this moment then
> I think we are fine. It means that we are still free to decide any way
> about the data structures without the risk of incompatility issues. In this
> case I would suggest to proceed as we planned at the Parquet sync.
>
> Thanks,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > That's correct. A layer that runs the crypto classes to encrypt pages and
> > structures is not merged yet. Even if the parquet.thrift is out, there is
> > virtually no chance encrypted files are created with it, and certainly
> not
> > in production.
> >
> > Given a (hopefully mild) headache this last-minute feature/requirement
> has
> > caused, let me add details on where it is coming from. I received the
> > requirement from a company that got an internal pushback from teams
> working
> > with unencrypted columns only, and unwilling to update Parquet libs. They
> > need support for a transition period for the current readers.
> >
> > We have an option to say no, and stick to the original goal list. But I
> > think the requirement makes sense, even if coming very late - and will
> > significantly ease adoption of the Parquet encryption. Also, it turns out
> > to be easy to implement, with minor changes in the code.
> >
> > In any case, its certainly a good idea to keep a single copy of
> > parquet.thrift in the format repo.
> >
> > Cheers, Gidon
> >
> > On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> > > reading and writing files with encryption is not possible. Am I wrong
> > > about that?
> > >
> > > I'm wholly supportive of changing the project to use a released
> > > version of the Parquet format, so let's do that ASAP.
> > > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > > >
> > > > Yep! (sent in parallel :)
> > > >
> > > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> <zi@cloudera.com.invalid
> > >
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > As a short update, I just checked the PR for PARQUET-1419 and
> > although
> > > in
> > > > > its current form it is a breaking change, it can be easily
> rewritten
> > to
> > > > > become backwards-compatible so this part of the problem does not
> > apply
> > > any
> > > > > more.
> > > > >
> > > > > Br,
> > > > >
> > > > > Zoltan
> > > > >
> > > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com>
> > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On the Parquet sync we discussed that the practice of
> maintaining a
> > > copy
> > > > > > of parquet.thrift in parquet-cpp is dangerous and that we must
> take
> > > care
> > > > > to
> > > > > > not release parquet-format changes in parquet-cpp before we
> > > officially
> > > > > > release them in parquet-format. As I got back to my computer and
> > > started
> > > > > to
> > > > > > create a JIRA about this, I noticed that unfortunately this has
> > > already
> > > > > > happened.
> > > > > >
> > > > > > The encryption-releated parquet.thrift changes have not only been
> > > added
> > > > > to
> > > > > > only parquet-format, but to parquet-cpp as well, and these
> changes
> > > got
> > > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> > > > > > PARQUET-1419 would change the encryption in a breaking way, which
> > is
> > > only
> > > > > > acceptable as long as the original is not released. Additionally,
> > it
> > > has
> > > > > > been discussed that a formal voting should take place before
> > > > > incorporating
> > > > > > the encryption features in the format.
> > > > > >
> > > > > > Now that parquet-cpp has already shipped these changes, we must
> > > choose
> > > > > the
> > > > > > lesser evil of the following two options:
> > > > > >
> > > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk
> > that
> > > > > >    encrypted data files already written with parquet-cpp 1.5.0
> will
> > > not
> > > > > be
> > > > > >    readable any more.
> > > > > >    - Release the encryption in parquet-format as it is,
> regardless
> > of
> > > > > >    voting results and discard PARQUET-1419.
> > > > > >
> > > > > > Personally I have a hard time deciding which one I consider
> lesser
> > > evil.
> > > > > > What are your opinions?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Zoltan
> > > > > >
> > > > >
> > >
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
Oh, I see now. Moving the crypto structure from position 8 to 9. Let me
process that, and we can switch to a direct channel to continue this
discussion.

On Wed, Sep 26, 2018 at 3:30 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> I think it's safer if we skip id 8 altogether and use id 9 for the new
> crypto structure. This way we don't have to worry about remaining backwards
> compatible with the accidentally released structure.
>
> Br,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> > Hi,
> >
> > I think we should use this id for its current purpose. This field had
> been
> > defined and merged months ago, and should be there is any scenario.
> > Except for the last week's change, the encryption format had been stable
> > for a while now. The timing of this change was unfortunate; but the
> change
> > was minor and done for a good reason.
> > I hope there won't be new disturbances from now on.
> >
> > Cheers, Gidon.
> >
> >
> > On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > It seems that I spoke too early. I just noticed that a new field was
> > added
> > > to the ColumnChunk struct in
> > >
> > >
> >
> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> > > Although the field is optional, we can't pretend it was never released,
> > > because parquet-cpp already expects that field for the id 8. Therefore,
> > if
> > > we were to add a different field with id 8 to the ColumnChunk struct,
> the
> > > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> > > reading new files, because it would try to read the ColumnChunk struct
> > > including all of its fields (regardless of whether the code actually
> uses
> > > them or not) and the field with id 8 would not match its Thrift schema.
> > >
> > > We still can avoid breaking changes in the ColumnChunk struct by either
> > > using id 8 for its current purpose or not using it at all (skip id 8
> and
> > > use 9 for the next field we add).
> > >
> > > Br,
> > >
> > > Zoltan
> > >
> > > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > If the encryption code release in parquet-cpp is unused at this
> moment
> > > > then I think we are fine. It means that we are still free to decide
> any
> > > way
> > > > about the data structures without the risk of incompatility issues.
> In
> > > this
> > > > case I would suggest to proceed as we planned at the Parquet sync.
> > > >
> > > > Thanks,
> > > >
> > > > Zoltan
> > > >
> > > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
> > > wrote:
> > > >
> > > >> That's correct. A layer that runs the crypto classes to encrypt
> pages
> > > and
> > > >> structures is not merged yet. Even if the parquet.thrift is out,
> there
> > > is
> > > >> virtually no chance encrypted files are created with it, and
> certainly
> > > not
> > > >> in production.
> > > >>
> > > >> Given a (hopefully mild) headache this last-minute
> feature/requirement
> > > has
> > > >> caused, let me add details on where it is coming from. I received
> the
> > > >> requirement from a company that got an internal pushback from teams
> > > >> working
> > > >> with unencrypted columns only, and unwilling to update Parquet libs.
> > > They
> > > >> need support for a transition period for the current readers.
> > > >>
> > > >> We have an option to say no, and stick to the original goal list.
> But
> > I
> > > >> think the requirement makes sense, even if coming very late - and
> will
> > > >> significantly ease adoption of the Parquet encryption. Also, it
> turns
> > > out
> > > >> to be easy to implement, with minor changes in the code.
> > > >>
> > > >> In any case, its certainly a good idea to keep a single copy of
> > > >> parquet.thrift in the format repo.
> > > >>
> > > >> Cheers, Gidon
> > > >>
> > > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> > > >> > reading and writing files with encryption is not possible. Am I
> > wrong
> > > >> > about that?
> > > >> >
> > > >> > I'm wholly supportive of changing the project to use a released
> > > >> > version of the Parquet format, so let's do that ASAP.
> > > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <
> gg5070@gmail.com>
> > > >> wrote:
> > > >> > >
> > > >> > > Yep! (sent in parallel :)
> > > >> > >
> > > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> > > <zi@cloudera.com.invalid
> > > >> >
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> > > >> although
> > > >> > in
> > > >> > > > its current form it is a breaking change, it can be easily
> > > >> rewritten to
> > > >> > > > become backwards-compatible so this part of the problem does
> not
> > > >> apply
> > > >> > any
> > > >> > > > more.
> > > >> > > >
> > > >> > > > Br,
> > > >> > > >
> > > >> > > > Zoltan
> > > >> > > >
> > > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <
> zi@cloudera.com>
> > > >> wrote:
> > > >> > > >
> > > >> > > > > Hi,
> > > >> > > > >
> > > >> > > > > On the Parquet sync we discussed that the practice of
> > > maintaining
> > > >> a
> > > >> > copy
> > > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we
> must
> > > >> take
> > > >> > care
> > > >> > > > to
> > > >> > > > > not release parquet-format changes in parquet-cpp before we
> > > >> > officially
> > > >> > > > > release them in parquet-format. As I got back to my computer
> > and
> > > >> > started
> > > >> > > > to
> > > >> > > > > create a JIRA about this, I noticed that unfortunately this
> > has
> > > >> > already
> > > >> > > > > happened.
> > > >> > > > >
> > > >> > > > > The encryption-releated parquet.thrift changes have not only
> > > been
> > > >> > added
> > > >> > > > to
> > > >> > > > > only parquet-format, but to parquet-cpp as well, and these
> > > changes
> > > >> > got
> > > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
> > because
> > > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> > > which
> > > >> is
> > > >> > only
> > > >> > > > > acceptable as long as the original is not released.
> > > Additionally,
> > > >> it
> > > >> > has
> > > >> > > > > been discussed that a formal voting should take place before
> > > >> > > > incorporating
> > > >> > > > > the encryption features in the format.
> > > >> > > > >
> > > >> > > > > Now that parquet-cpp has already shipped these changes, we
> > must
> > > >> > choose
> > > >> > > > the
> > > >> > > > > lesser evil of the following two options:
> > > >> > > > >
> > > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
> > risk
> > > >> that
> > > >> > > > >    encrypted data files already written with parquet-cpp
> 1.5.0
> > > >> will
> > > >> > not
> > > >> > > > be
> > > >> > > > >    readable any more.
> > > >> > > > >    - Release the encryption in parquet-format as it is,
> > > >> regardless of
> > > >> > > > >    voting results and discard PARQUET-1419.
> > > >> > > > >
> > > >> > > > > Personally I have a hard time deciding which one I consider
> > > lesser
> > > >> > evil.
> > > >> > > > > What are your opinions?
> > > >> > > > >
> > > >> > > > > Thanks,
> > > >> > > > >
> > > >> > > > > Zoltan
> > > >> > > > >
> > > >> > > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Wes McKinney <we...@gmail.com>.
+1. The Thrift structures are private to parquet-cpp, so it should not
be an issue
On Wed, Sep 26, 2018 at 12:23 PM Zoltan Ivanfi <zi...@cloudera.com.invalid> wrote:
>
> Hi,
>
> I just had a conversation with Nandor and he pointed out to me that even if
> we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1
> version that fixes it. The important thing is that _writing_ is good and
> parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have to
> worry about breaking changes that would make existing data files unreadable.
>
> Based on this, I would [once again :)] suggest moving forward as planned.
>
> Thanks,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>
> > Hi,
> >
> > Please let me know your opinions as well. So far all concerns were only
> > raised by me, which may indicate that other community members do not
> > consider this issue serious and in this case my suggestions may be
> > excessive and unjustified.
> >
> > Just to clarify: The data structures for the encrypion feature are
> > unlikely to change. I solely suggest these measures to remedy the situation
> > of parquet.thrift having been released in parquet-cpp before officially
> > getting released in parquet-format.
> >
> > Thanks,
> >
> > Zoltan
> >
> > On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> >
> >> Hi,
> >>
> >> I think it's safer if we skip id 8 altogether and use id 9 for the new
> >> crypto structure. This way we don't have to worry about remaining backwards
> >> compatible with the accidentally released structure.
> >>
> >> Br,
> >>
> >> Zoltan
> >>
> >> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I think we should use this id for its current purpose. This field had
> >>> been
> >>> defined and merged months ago, and should be there is any scenario.
> >>> Except for the last week's change, the encryption format had been stable
> >>> for a while now. The timing of this change was unfortunate; but the
> >>> change
> >>> was minor and done for a good reason.
> >>> I hope there won't be new disturbances from now on.
> >>>
> >>> Cheers, Gidon.
> >>>
> >>>
> >>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > It seems that I spoke too early. I just noticed that a new field was
> >>> added
> >>> > to the ColumnChunk struct in
> >>> >
> >>> >
> >>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> >>> > Although the field is optional, we can't pretend it was never released,
> >>> > because parquet-cpp already expects that field for the id 8.
> >>> Therefore, if
> >>> > we were to add a different field with id 8 to the ColumnChunk struct,
> >>> the
> >>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> >>> > reading new files, because it would try to read the ColumnChunk struct
> >>> > including all of its fields (regardless of whether the code actually
> >>> uses
> >>> > them or not) and the field with id 8 would not match its Thrift schema.
> >>> >
> >>> > We still can avoid breaking changes in the ColumnChunk struct by either
> >>> > using id 8 for its current purpose or not using it at all (skip id 8
> >>> and
> >>> > use 9 for the next field we add).
> >>> >
> >>> > Br,
> >>> >
> >>> > Zoltan
> >>> >
> >>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> >>> >
> >>> > > Hi,
> >>> > >
> >>> > > If the encryption code release in parquet-cpp is unused at this
> >>> moment
> >>> > > then I think we are fine. It means that we are still free to decide
> >>> any
> >>> > way
> >>> > > about the data structures without the risk of incompatility issues.
> >>> In
> >>> > this
> >>> > > case I would suggest to proceed as we planned at the Parquet sync.
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Zoltan
> >>> > >
> >>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > >> That's correct. A layer that runs the crypto classes to encrypt
> >>> pages
> >>> > and
> >>> > >> structures is not merged yet. Even if the parquet.thrift is out,
> >>> there
> >>> > is
> >>> > >> virtually no chance encrypted files are created with it, and
> >>> certainly
> >>> > not
> >>> > >> in production.
> >>> > >>
> >>> > >> Given a (hopefully mild) headache this last-minute
> >>> feature/requirement
> >>> > has
> >>> > >> caused, let me add details on where it is coming from. I received
> >>> the
> >>> > >> requirement from a company that got an internal pushback from teams
> >>> > >> working
> >>> > >> with unencrypted columns only, and unwilling to update Parquet libs.
> >>> > They
> >>> > >> need support for a transition period for the current readers.
> >>> > >>
> >>> > >> We have an option to say no, and stick to the original goal list.
> >>> But I
> >>> > >> think the requirement makes sense, even if coming very late - and
> >>> will
> >>> > >> significantly ease adoption of the Parquet encryption. Also, it
> >>> turns
> >>> > out
> >>> > >> to be easy to implement, with minor changes in the code.
> >>> > >>
> >>> > >> In any case, its certainly a good idea to keep a single copy of
> >>> > >> parquet.thrift in the format repo.
> >>> > >>
> >>> > >> Cheers, Gidon
> >>> > >>
> >>> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
> >>> > >> wrote:
> >>> > >>
> >>> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> >>> > >> > reading and writing files with encryption is not possible. Am I
> >>> wrong
> >>> > >> > about that?
> >>> > >> >
> >>> > >> > I'm wholly supportive of changing the project to use a released
> >>> > >> > version of the Parquet format, so let's do that ASAP.
> >>> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <
> >>> gg5070@gmail.com>
> >>> > >> wrote:
> >>> > >> > >
> >>> > >> > > Yep! (sent in parallel :)
> >>> > >> > >
> >>> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> >>> > <zi@cloudera.com.invalid
> >>> > >> >
> >>> > >> > > wrote:
> >>> > >> > >
> >>> > >> > > > Hi,
> >>> > >> > > >
> >>> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> >>> > >> although
> >>> > >> > in
> >>> > >> > > > its current form it is a breaking change, it can be easily
> >>> > >> rewritten to
> >>> > >> > > > become backwards-compatible so this part of the problem does
> >>> not
> >>> > >> apply
> >>> > >> > any
> >>> > >> > > > more.
> >>> > >> > > >
> >>> > >> > > > Br,
> >>> > >> > > >
> >>> > >> > > > Zoltan
> >>> > >> > > >
> >>> > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <
> >>> zi@cloudera.com>
> >>> > >> wrote:
> >>> > >> > > >
> >>> > >> > > > > Hi,
> >>> > >> > > > >
> >>> > >> > > > > On the Parquet sync we discussed that the practice of
> >>> > maintaining
> >>> > >> a
> >>> > >> > copy
> >>> > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we
> >>> must
> >>> > >> take
> >>> > >> > care
> >>> > >> > > > to
> >>> > >> > > > > not release parquet-format changes in parquet-cpp before we
> >>> > >> > officially
> >>> > >> > > > > release them in parquet-format. As I got back to my
> >>> computer and
> >>> > >> > started
> >>> > >> > > > to
> >>> > >> > > > > create a JIRA about this, I noticed that unfortunately this
> >>> has
> >>> > >> > already
> >>> > >> > > > > happened.
> >>> > >> > > > >
> >>> > >> > > > > The encryption-releated parquet.thrift changes have not only
> >>> > been
> >>> > >> > added
> >>> > >> > > > to
> >>> > >> > > > > only parquet-format, but to parquet-cpp as well, and these
> >>> > changes
> >>> > >> > got
> >>> > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
> >>> because
> >>> > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> >>> > which
> >>> > >> is
> >>> > >> > only
> >>> > >> > > > > acceptable as long as the original is not released.
> >>> > Additionally,
> >>> > >> it
> >>> > >> > has
> >>> > >> > > > > been discussed that a formal voting should take place before
> >>> > >> > > > incorporating
> >>> > >> > > > > the encryption features in the format.
> >>> > >> > > > >
> >>> > >> > > > > Now that parquet-cpp has already shipped these changes, we
> >>> must
> >>> > >> > choose
> >>> > >> > > > the
> >>> > >> > > > > lesser evil of the following two options:
> >>> > >> > > > >
> >>> > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
> >>> risk
> >>> > >> that
> >>> > >> > > > >    encrypted data files already written with parquet-cpp
> >>> 1.5.0
> >>> > >> will
> >>> > >> > not
> >>> > >> > > > be
> >>> > >> > > > >    readable any more.
> >>> > >> > > > >    - Release the encryption in parquet-format as it is,
> >>> > >> regardless of
> >>> > >> > > > >    voting results and discard PARQUET-1419.
> >>> > >> > > > >
> >>> > >> > > > > Personally I have a hard time deciding which one I consider
> >>> > lesser
> >>> > >> > evil.
> >>> > >> > > > > What are your opinions?
> >>> > >> > > > >
> >>> > >> > > > > Thanks,
> >>> > >> > > > >
> >>> > >> > > > > Zoltan
> >>> > >> > > > >
> >>> > >> > > >
> >>> > >> >
> >>> > >>
> >>> > >
> >>> >
> >>>
> >>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

I just had a conversation with Nandor and he pointed out to me that even if
we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1
version that fixes it. The important thing is that _writing_ is good and
parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have to
worry about breaking changes that would make existing data files unreadable.

Based on this, I would [once again :)] suggest moving forward as planned.

Thanks,

Zoltan

On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:

> Hi,
>
> Please let me know your opinions as well. So far all concerns were only
> raised by me, which may indicate that other community members do not
> consider this issue serious and in this case my suggestions may be
> excessive and unjustified.
>
> Just to clarify: The data structures for the encrypion feature are
> unlikely to change. I solely suggest these measures to remedy the situation
> of parquet.thrift having been released in parquet-cpp before officially
> getting released in parquet-format.
>
> Thanks,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>
>> Hi,
>>
>> I think it's safer if we skip id 8 altogether and use id 9 for the new
>> crypto structure. This way we don't have to worry about remaining backwards
>> compatible with the accidentally released structure.
>>
>> Br,
>>
>> Zoltan
>>
>> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I think we should use this id for its current purpose. This field had
>>> been
>>> defined and merged months ago, and should be there is any scenario.
>>> Except for the last week's change, the encryption format had been stable
>>> for a while now. The timing of this change was unfortunate; but the
>>> change
>>> was minor and done for a good reason.
>>> I hope there won't be new disturbances from now on.
>>>
>>> Cheers, Gidon.
>>>
>>>
>>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > It seems that I spoke too early. I just noticed that a new field was
>>> added
>>> > to the ColumnChunk struct in
>>> >
>>> >
>>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
>>> > Although the field is optional, we can't pretend it was never released,
>>> > because parquet-cpp already expects that field for the id 8.
>>> Therefore, if
>>> > we were to add a different field with id 8 to the ColumnChunk struct,
>>> the
>>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
>>> > reading new files, because it would try to read the ColumnChunk struct
>>> > including all of its fields (regardless of whether the code actually
>>> uses
>>> > them or not) and the field with id 8 would not match its Thrift schema.
>>> >
>>> > We still can avoid breaking changes in the ColumnChunk struct by either
>>> > using id 8 for its current purpose or not using it at all (skip id 8
>>> and
>>> > use 9 for the next field we add).
>>> >
>>> > Br,
>>> >
>>> > Zoltan
>>> >
>>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>>> >
>>> > > Hi,
>>> > >
>>> > > If the encryption code release in parquet-cpp is unused at this
>>> moment
>>> > > then I think we are fine. It means that we are still free to decide
>>> any
>>> > way
>>> > > about the data structures without the risk of incompatility issues.
>>> In
>>> > this
>>> > > case I would suggest to proceed as we planned at the Parquet sync.
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Zoltan
>>> > >
>>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
>>> > wrote:
>>> > >
>>> > >> That's correct. A layer that runs the crypto classes to encrypt
>>> pages
>>> > and
>>> > >> structures is not merged yet. Even if the parquet.thrift is out,
>>> there
>>> > is
>>> > >> virtually no chance encrypted files are created with it, and
>>> certainly
>>> > not
>>> > >> in production.
>>> > >>
>>> > >> Given a (hopefully mild) headache this last-minute
>>> feature/requirement
>>> > has
>>> > >> caused, let me add details on where it is coming from. I received
>>> the
>>> > >> requirement from a company that got an internal pushback from teams
>>> > >> working
>>> > >> with unencrypted columns only, and unwilling to update Parquet libs.
>>> > They
>>> > >> need support for a transition period for the current readers.
>>> > >>
>>> > >> We have an option to say no, and stick to the original goal list.
>>> But I
>>> > >> think the requirement makes sense, even if coming very late - and
>>> will
>>> > >> significantly ease adoption of the Parquet encryption. Also, it
>>> turns
>>> > out
>>> > >> to be easy to implement, with minor changes in the code.
>>> > >>
>>> > >> In any case, its certainly a good idea to keep a single copy of
>>> > >> parquet.thrift in the format repo.
>>> > >>
>>> > >> Cheers, Gidon
>>> > >>
>>> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
>>> > >> > reading and writing files with encryption is not possible. Am I
>>> wrong
>>> > >> > about that?
>>> > >> >
>>> > >> > I'm wholly supportive of changing the project to use a released
>>> > >> > version of the Parquet format, so let's do that ASAP.
>>> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <
>>> gg5070@gmail.com>
>>> > >> wrote:
>>> > >> > >
>>> > >> > > Yep! (sent in parallel :)
>>> > >> > >
>>> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
>>> > <zi@cloudera.com.invalid
>>> > >> >
>>> > >> > > wrote:
>>> > >> > >
>>> > >> > > > Hi,
>>> > >> > > >
>>> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
>>> > >> although
>>> > >> > in
>>> > >> > > > its current form it is a breaking change, it can be easily
>>> > >> rewritten to
>>> > >> > > > become backwards-compatible so this part of the problem does
>>> not
>>> > >> apply
>>> > >> > any
>>> > >> > > > more.
>>> > >> > > >
>>> > >> > > > Br,
>>> > >> > > >
>>> > >> > > > Zoltan
>>> > >> > > >
>>> > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <
>>> zi@cloudera.com>
>>> > >> wrote:
>>> > >> > > >
>>> > >> > > > > Hi,
>>> > >> > > > >
>>> > >> > > > > On the Parquet sync we discussed that the practice of
>>> > maintaining
>>> > >> a
>>> > >> > copy
>>> > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we
>>> must
>>> > >> take
>>> > >> > care
>>> > >> > > > to
>>> > >> > > > > not release parquet-format changes in parquet-cpp before we
>>> > >> > officially
>>> > >> > > > > release them in parquet-format. As I got back to my
>>> computer and
>>> > >> > started
>>> > >> > > > to
>>> > >> > > > > create a JIRA about this, I noticed that unfortunately this
>>> has
>>> > >> > already
>>> > >> > > > > happened.
>>> > >> > > > >
>>> > >> > > > > The encryption-releated parquet.thrift changes have not only
>>> > been
>>> > >> > added
>>> > >> > > > to
>>> > >> > > > > only parquet-format, but to parquet-cpp as well, and these
>>> > changes
>>> > >> > got
>>> > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
>>> because
>>> > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
>>> > which
>>> > >> is
>>> > >> > only
>>> > >> > > > > acceptable as long as the original is not released.
>>> > Additionally,
>>> > >> it
>>> > >> > has
>>> > >> > > > > been discussed that a formal voting should take place before
>>> > >> > > > incorporating
>>> > >> > > > > the encryption features in the format.
>>> > >> > > > >
>>> > >> > > > > Now that parquet-cpp has already shipped these changes, we
>>> must
>>> > >> > choose
>>> > >> > > > the
>>> > >> > > > > lesser evil of the following two options:
>>> > >> > > > >
>>> > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
>>> risk
>>> > >> that
>>> > >> > > > >    encrypted data files already written with parquet-cpp
>>> 1.5.0
>>> > >> will
>>> > >> > not
>>> > >> > > > be
>>> > >> > > > >    readable any more.
>>> > >> > > > >    - Release the encryption in parquet-format as it is,
>>> > >> regardless of
>>> > >> > > > >    voting results and discard PARQUET-1419.
>>> > >> > > > >
>>> > >> > > > > Personally I have a hard time deciding which one I consider
>>> > lesser
>>> > >> > evil.
>>> > >> > > > > What are your opinions?
>>> > >> > > > >
>>> > >> > > > > Thanks,
>>> > >> > > > >
>>> > >> > > > > Zoltan
>>> > >> > > > >
>>> > >> > > >
>>> > >> >
>>> > >>
>>> > >
>>> >
>>>
>>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

Please let me know your opinions as well. So far all concerns were only
raised by me, which may indicate that other community members do not
consider this issue serious and in this case my suggestions may be
excessive and unjustified.

Just to clarify: The data structures for the encrypion feature are unlikely
to change. I solely suggest these measures to remedy the situation of
parquet.thrift having been released in parquet-cpp before officially
getting released in parquet-format.

Thanks,

Zoltan

On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:

> Hi,
>
> I think it's safer if we skip id 8 altogether and use id 9 for the new
> crypto structure. This way we don't have to worry about remaining backwards
> compatible with the accidentally released structure.
>
> Br,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
>> Hi,
>>
>> I think we should use this id for its current purpose. This field had been
>> defined and merged months ago, and should be there is any scenario.
>> Except for the last week's change, the encryption format had been stable
>> for a while now. The timing of this change was unfortunate; but the change
>> was minor and done for a good reason.
>> I hope there won't be new disturbances from now on.
>>
>> Cheers, Gidon.
>>
>>
>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
>> wrote:
>>
>> > Hi,
>> >
>> > It seems that I spoke too early. I just noticed that a new field was
>> added
>> > to the ColumnChunk struct in
>> >
>> >
>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
>> > Although the field is optional, we can't pretend it was never released,
>> > because parquet-cpp already expects that field for the id 8. Therefore,
>> if
>> > we were to add a different field with id 8 to the ColumnChunk struct,
>> the
>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
>> > reading new files, because it would try to read the ColumnChunk struct
>> > including all of its fields (regardless of whether the code actually
>> uses
>> > them or not) and the field with id 8 would not match its Thrift schema.
>> >
>> > We still can avoid breaking changes in the ColumnChunk struct by either
>> > using id 8 for its current purpose or not using it at all (skip id 8 and
>> > use 9 for the next field we add).
>> >
>> > Br,
>> >
>> > Zoltan
>> >
>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > If the encryption code release in parquet-cpp is unused at this moment
>> > > then I think we are fine. It means that we are still free to decide
>> any
>> > way
>> > > about the data structures without the risk of incompatility issues. In
>> > this
>> > > case I would suggest to proceed as we planned at the Parquet sync.
>> > >
>> > > Thanks,
>> > >
>> > > Zoltan
>> > >
>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
>> > wrote:
>> > >
>> > >> That's correct. A layer that runs the crypto classes to encrypt pages
>> > and
>> > >> structures is not merged yet. Even if the parquet.thrift is out,
>> there
>> > is
>> > >> virtually no chance encrypted files are created with it, and
>> certainly
>> > not
>> > >> in production.
>> > >>
>> > >> Given a (hopefully mild) headache this last-minute
>> feature/requirement
>> > has
>> > >> caused, let me add details on where it is coming from. I received the
>> > >> requirement from a company that got an internal pushback from teams
>> > >> working
>> > >> with unencrypted columns only, and unwilling to update Parquet libs.
>> > They
>> > >> need support for a transition period for the current readers.
>> > >>
>> > >> We have an option to say no, and stick to the original goal list.
>> But I
>> > >> think the requirement makes sense, even if coming very late - and
>> will
>> > >> significantly ease adoption of the Parquet encryption. Also, it turns
>> > out
>> > >> to be easy to implement, with minor changes in the code.
>> > >>
>> > >> In any case, its certainly a good idea to keep a single copy of
>> > >> parquet.thrift in the format repo.
>> > >>
>> > >> Cheers, Gidon
>> > >>
>> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
>> > >> wrote:
>> > >>
>> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
>> > >> > reading and writing files with encryption is not possible. Am I
>> wrong
>> > >> > about that?
>> > >> >
>> > >> > I'm wholly supportive of changing the project to use a released
>> > >> > version of the Parquet format, so let's do that ASAP.
>> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg5070@gmail.com
>> >
>> > >> wrote:
>> > >> > >
>> > >> > > Yep! (sent in parallel :)
>> > >> > >
>> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
>> > <zi@cloudera.com.invalid
>> > >> >
>> > >> > > wrote:
>> > >> > >
>> > >> > > > Hi,
>> > >> > > >
>> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
>> > >> although
>> > >> > in
>> > >> > > > its current form it is a breaking change, it can be easily
>> > >> rewritten to
>> > >> > > > become backwards-compatible so this part of the problem does
>> not
>> > >> apply
>> > >> > any
>> > >> > > > more.
>> > >> > > >
>> > >> > > > Br,
>> > >> > > >
>> > >> > > > Zoltan
>> > >> > > >
>> > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi@cloudera.com
>> >
>> > >> wrote:
>> > >> > > >
>> > >> > > > > Hi,
>> > >> > > > >
>> > >> > > > > On the Parquet sync we discussed that the practice of
>> > maintaining
>> > >> a
>> > >> > copy
>> > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we
>> must
>> > >> take
>> > >> > care
>> > >> > > > to
>> > >> > > > > not release parquet-format changes in parquet-cpp before we
>> > >> > officially
>> > >> > > > > release them in parquet-format. As I got back to my computer
>> and
>> > >> > started
>> > >> > > > to
>> > >> > > > > create a JIRA about this, I noticed that unfortunately this
>> has
>> > >> > already
>> > >> > > > > happened.
>> > >> > > > >
>> > >> > > > > The encryption-releated parquet.thrift changes have not only
>> > been
>> > >> > added
>> > >> > > > to
>> > >> > > > > only parquet-format, but to parquet-cpp as well, and these
>> > changes
>> > >> > got
>> > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
>> because
>> > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
>> > which
>> > >> is
>> > >> > only
>> > >> > > > > acceptable as long as the original is not released.
>> > Additionally,
>> > >> it
>> > >> > has
>> > >> > > > > been discussed that a formal voting should take place before
>> > >> > > > incorporating
>> > >> > > > > the encryption features in the format.
>> > >> > > > >
>> > >> > > > > Now that parquet-cpp has already shipped these changes, we
>> must
>> > >> > choose
>> > >> > > > the
>> > >> > > > > lesser evil of the following two options:
>> > >> > > > >
>> > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
>> risk
>> > >> that
>> > >> > > > >    encrypted data files already written with parquet-cpp
>> 1.5.0
>> > >> will
>> > >> > not
>> > >> > > > be
>> > >> > > > >    readable any more.
>> > >> > > > >    - Release the encryption in parquet-format as it is,
>> > >> regardless of
>> > >> > > > >    voting results and discard PARQUET-1419.
>> > >> > > > >
>> > >> > > > > Personally I have a hard time deciding which one I consider
>> > lesser
>> > >> > evil.
>> > >> > > > > What are your opinions?
>> > >> > > > >
>> > >> > > > > Thanks,
>> > >> > > > >
>> > >> > > > > Zoltan
>> > >> > > > >
>> > >> > > >
>> > >> >
>> > >>
>> > >
>> >
>>
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

I think it's safer if we skip id 8 altogether and use id 9 for the new
crypto structure. This way we don't have to worry about remaining backwards
compatible with the accidentally released structure.

Br,

Zoltan

On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky <gg...@gmail.com> wrote:

> Hi,
>
> I think we should use this id for its current purpose. This field had been
> defined and merged months ago, and should be there is any scenario.
> Except for the last week's change, the encryption format had been stable
> for a while now. The timing of this change was unfortunate; but the change
> was minor and done for a good reason.
> I hope there won't be new disturbances from now on.
>
> Cheers, Gidon.
>
>
> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
> wrote:
>
> > Hi,
> >
> > It seems that I spoke too early. I just noticed that a new field was
> added
> > to the ColumnChunk struct in
> >
> >
> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> > Although the field is optional, we can't pretend it was never released,
> > because parquet-cpp already expects that field for the id 8. Therefore,
> if
> > we were to add a different field with id 8 to the ColumnChunk struct, the
> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> > reading new files, because it would try to read the ColumnChunk struct
> > including all of its fields (regardless of whether the code actually uses
> > them or not) and the field with id 8 would not match its Thrift schema.
> >
> > We still can avoid breaking changes in the ColumnChunk struct by either
> > using id 8 for its current purpose or not using it at all (skip id 8 and
> > use 9 for the next field we add).
> >
> > Br,
> >
> > Zoltan
> >
> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> >
> > > Hi,
> > >
> > > If the encryption code release in parquet-cpp is unused at this moment
> > > then I think we are fine. It means that we are still free to decide any
> > way
> > > about the data structures without the risk of incompatility issues. In
> > this
> > > case I would suggest to proceed as we planned at the Parquet sync.
> > >
> > > Thanks,
> > >
> > > Zoltan
> > >
> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
> > wrote:
> > >
> > >> That's correct. A layer that runs the crypto classes to encrypt pages
> > and
> > >> structures is not merged yet. Even if the parquet.thrift is out, there
> > is
> > >> virtually no chance encrypted files are created with it, and certainly
> > not
> > >> in production.
> > >>
> > >> Given a (hopefully mild) headache this last-minute feature/requirement
> > has
> > >> caused, let me add details on where it is coming from. I received the
> > >> requirement from a company that got an internal pushback from teams
> > >> working
> > >> with unencrypted columns only, and unwilling to update Parquet libs.
> > They
> > >> need support for a transition period for the current readers.
> > >>
> > >> We have an option to say no, and stick to the original goal list. But
> I
> > >> think the requirement makes sense, even if coming very late - and will
> > >> significantly ease adoption of the Parquet encryption. Also, it turns
> > out
> > >> to be easy to implement, with minor changes in the code.
> > >>
> > >> In any case, its certainly a good idea to keep a single copy of
> > >> parquet.thrift in the format repo.
> > >>
> > >> Cheers, Gidon
> > >>
> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
> > >> wrote:
> > >>
> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> > >> > reading and writing files with encryption is not possible. Am I
> wrong
> > >> > about that?
> > >> >
> > >> > I'm wholly supportive of changing the project to use a released
> > >> > version of the Parquet format, so let's do that ASAP.
> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > Yep! (sent in parallel :)
> > >> > >
> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> > <zi@cloudera.com.invalid
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> > >> although
> > >> > in
> > >> > > > its current form it is a breaking change, it can be easily
> > >> rewritten to
> > >> > > > become backwards-compatible so this part of the problem does not
> > >> apply
> > >> > any
> > >> > > > more.
> > >> > > >
> > >> > > > Br,
> > >> > > >
> > >> > > > Zoltan
> > >> > > >
> > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com>
> > >> wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > On the Parquet sync we discussed that the practice of
> > maintaining
> > >> a
> > >> > copy
> > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must
> > >> take
> > >> > care
> > >> > > > to
> > >> > > > > not release parquet-format changes in parquet-cpp before we
> > >> > officially
> > >> > > > > release them in parquet-format. As I got back to my computer
> and
> > >> > started
> > >> > > > to
> > >> > > > > create a JIRA about this, I noticed that unfortunately this
> has
> > >> > already
> > >> > > > > happened.
> > >> > > > >
> > >> > > > > The encryption-releated parquet.thrift changes have not only
> > been
> > >> > added
> > >> > > > to
> > >> > > > > only parquet-format, but to parquet-cpp as well, and these
> > changes
> > >> > got
> > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate,
> because
> > >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> > which
> > >> is
> > >> > only
> > >> > > > > acceptable as long as the original is not released.
> > Additionally,
> > >> it
> > >> > has
> > >> > > > > been discussed that a formal voting should take place before
> > >> > > > incorporating
> > >> > > > > the encryption features in the format.
> > >> > > > >
> > >> > > > > Now that parquet-cpp has already shipped these changes, we
> must
> > >> > choose
> > >> > > > the
> > >> > > > > lesser evil of the following two options:
> > >> > > > >
> > >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and
> risk
> > >> that
> > >> > > > >    encrypted data files already written with parquet-cpp 1.5.0
> > >> will
> > >> > not
> > >> > > > be
> > >> > > > >    readable any more.
> > >> > > > >    - Release the encryption in parquet-format as it is,
> > >> regardless of
> > >> > > > >    voting results and discard PARQUET-1419.
> > >> > > > >
> > >> > > > > Personally I have a hard time deciding which one I consider
> > lesser
> > >> > evil.
> > >> > > > > What are your opinions?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > >
> > >> > > > > Zoltan
> > >> > > > >
> > >> > > >
> > >> >
> > >>
> > >
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi,

I think we should use this id for its current purpose. This field had been
defined and merged months ago, and should be there is any scenario.
Except for the last week's change, the encryption format had been stable
for a while now. The timing of this change was unfortunate; but the change
was minor and done for a good reason.
I hope there won't be new disturbances from now on.

Cheers, Gidon.


On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> It seems that I spoke too early. I just noticed that a new field was added
> to the ColumnChunk struct in
>
> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
> Although the field is optional, we can't pretend it was never released,
> because parquet-cpp already expects that field for the id 8. Therefore, if
> we were to add a different field with id 8 to the ColumnChunk struct, the
> thrift parser of parquet-cpp 1.5.0 would probably throw an error when
> reading new files, because it would try to read the ColumnChunk struct
> including all of its fields (regardless of whether the code actually uses
> them or not) and the field with id 8 would not match its Thrift schema.
>
> We still can avoid breaking changes in the ColumnChunk struct by either
> using id 8 for its current purpose or not using it at all (skip id 8 and
> use 9 for the next field we add).
>
> Br,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>
> > Hi,
> >
> > If the encryption code release in parquet-cpp is unused at this moment
> > then I think we are fine. It means that we are still free to decide any
> way
> > about the data structures without the risk of incompatility issues. In
> this
> > case I would suggest to proceed as we planned at the Parquet sync.
> >
> > Thanks,
> >
> > Zoltan
> >
> > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> >
> >> That's correct. A layer that runs the crypto classes to encrypt pages
> and
> >> structures is not merged yet. Even if the parquet.thrift is out, there
> is
> >> virtually no chance encrypted files are created with it, and certainly
> not
> >> in production.
> >>
> >> Given a (hopefully mild) headache this last-minute feature/requirement
> has
> >> caused, let me add details on where it is coming from. I received the
> >> requirement from a company that got an internal pushback from teams
> >> working
> >> with unencrypted columns only, and unwilling to update Parquet libs.
> They
> >> need support for a transition period for the current readers.
> >>
> >> We have an option to say no, and stick to the original goal list. But I
> >> think the requirement makes sense, even if coming very late - and will
> >> significantly ease adoption of the Parquet encryption. Also, it turns
> out
> >> to be easy to implement, with minor changes in the code.
> >>
> >> In any case, its certainly a good idea to keep a single copy of
> >> parquet.thrift in the format repo.
> >>
> >> Cheers, Gidon
> >>
> >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
> >> wrote:
> >>
> >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> >> > reading and writing files with encryption is not possible. Am I wrong
> >> > about that?
> >> >
> >> > I'm wholly supportive of changing the project to use a released
> >> > version of the Parquet format, so let's do that ASAP.
> >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com>
> >> wrote:
> >> > >
> >> > > Yep! (sent in parallel :)
> >> > >
> >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi
> <zi@cloudera.com.invalid
> >> >
> >> > > wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > As a short update, I just checked the PR for PARQUET-1419 and
> >> although
> >> > in
> >> > > > its current form it is a breaking change, it can be easily
> >> rewritten to
> >> > > > become backwards-compatible so this part of the problem does not
> >> apply
> >> > any
> >> > > > more.
> >> > > >
> >> > > > Br,
> >> > > >
> >> > > > Zoltan
> >> > > >
> >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com>
> >> wrote:
> >> > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > On the Parquet sync we discussed that the practice of
> maintaining
> >> a
> >> > copy
> >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must
> >> take
> >> > care
> >> > > > to
> >> > > > > not release parquet-format changes in parquet-cpp before we
> >> > officially
> >> > > > > release them in parquet-format. As I got back to my computer and
> >> > started
> >> > > > to
> >> > > > > create a JIRA about this, I noticed that unfortunately this has
> >> > already
> >> > > > > happened.
> >> > > > >
> >> > > > > The encryption-releated parquet.thrift changes have not only
> been
> >> > added
> >> > > > to
> >> > > > > only parquet-format, but to parquet-cpp as well, and these
> changes
> >> > got
> >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> >> > > > > PARQUET-1419 would change the encryption in a breaking way,
> which
> >> is
> >> > only
> >> > > > > acceptable as long as the original is not released.
> Additionally,
> >> it
> >> > has
> >> > > > > been discussed that a formal voting should take place before
> >> > > > incorporating
> >> > > > > the encryption features in the format.
> >> > > > >
> >> > > > > Now that parquet-cpp has already shipped these changes, we must
> >> > choose
> >> > > > the
> >> > > > > lesser evil of the following two options:
> >> > > > >
> >> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk
> >> that
> >> > > > >    encrypted data files already written with parquet-cpp 1.5.0
> >> will
> >> > not
> >> > > > be
> >> > > > >    readable any more.
> >> > > > >    - Release the encryption in parquet-format as it is,
> >> regardless of
> >> > > > >    voting results and discard PARQUET-1419.
> >> > > > >
> >> > > > > Personally I have a hard time deciding which one I consider
> lesser
> >> > evil.
> >> > > > > What are your opinions?
> >> > > > >
> >> > > > > Thanks,
> >> > > > >
> >> > > > > Zoltan
> >> > > > >
> >> > > >
> >> >
> >>
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

It seems that I spoke too early. I just noticed that a new field was added
to the ColumnChunk struct in
https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708
Although the field is optional, we can't pretend it was never released,
because parquet-cpp already expects that field for the id 8. Therefore, if
we were to add a different field with id 8 to the ColumnChunk struct, the
thrift parser of parquet-cpp 1.5.0 would probably throw an error when
reading new files, because it would try to read the ColumnChunk struct
including all of its fields (regardless of whether the code actually uses
them or not) and the field with id 8 would not match its Thrift schema.

We still can avoid breaking changes in the ColumnChunk struct by either
using id 8 for its current purpose or not using it at all (skip id 8 and
use 9 for the next field we add).

Br,

Zoltan

On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:

> Hi,
>
> If the encryption code release in parquet-cpp is unused at this moment
> then I think we are fine. It means that we are still free to decide any way
> about the data structures without the risk of incompatility issues. In this
> case I would suggest to proceed as we planned at the Parquet sync.
>
> Thanks,
>
> Zoltan
>
> On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com> wrote:
>
>> That's correct. A layer that runs the crypto classes to encrypt pages and
>> structures is not merged yet. Even if the parquet.thrift is out, there is
>> virtually no chance encrypted files are created with it, and certainly not
>> in production.
>>
>> Given a (hopefully mild) headache this last-minute feature/requirement has
>> caused, let me add details on where it is coming from. I received the
>> requirement from a company that got an internal pushback from teams
>> working
>> with unencrypted columns only, and unwilling to update Parquet libs. They
>> need support for a transition period for the current readers.
>>
>> We have an option to say no, and stick to the original goal list. But I
>> think the requirement makes sense, even if coming very late - and will
>> significantly ease adoption of the Parquet encryption. Also, it turns out
>> to be easy to implement, with minor changes in the code.
>>
>> In any case, its certainly a good idea to keep a single copy of
>> parquet.thrift in the format repo.
>>
>> Cheers, Gidon
>>
>> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com>
>> wrote:
>>
>> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
>> > reading and writing files with encryption is not possible. Am I wrong
>> > about that?
>> >
>> > I'm wholly supportive of changing the project to use a released
>> > version of the Parquet format, so let's do that ASAP.
>> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com>
>> wrote:
>> > >
>> > > Yep! (sent in parallel :)
>> > >
>> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi <zi@cloudera.com.invalid
>> >
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > As a short update, I just checked the PR for PARQUET-1419 and
>> although
>> > in
>> > > > its current form it is a breaking change, it can be easily
>> rewritten to
>> > > > become backwards-compatible so this part of the problem does not
>> apply
>> > any
>> > > > more.
>> > > >
>> > > > Br,
>> > > >
>> > > > Zoltan
>> > > >
>> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com>
>> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > On the Parquet sync we discussed that the practice of maintaining
>> a
>> > copy
>> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must
>> take
>> > care
>> > > > to
>> > > > > not release parquet-format changes in parquet-cpp before we
>> > officially
>> > > > > release them in parquet-format. As I got back to my computer and
>> > started
>> > > > to
>> > > > > create a JIRA about this, I noticed that unfortunately this has
>> > already
>> > > > > happened.
>> > > > >
>> > > > > The encryption-releated parquet.thrift changes have not only been
>> > added
>> > > > to
>> > > > > only parquet-format, but to parquet-cpp as well, and these changes
>> > got
>> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
>> > > > > PARQUET-1419 would change the encryption in a breaking way, which
>> is
>> > only
>> > > > > acceptable as long as the original is not released. Additionally,
>> it
>> > has
>> > > > > been discussed that a formal voting should take place before
>> > > > incorporating
>> > > > > the encryption features in the format.
>> > > > >
>> > > > > Now that parquet-cpp has already shipped these changes, we must
>> > choose
>> > > > the
>> > > > > lesser evil of the following two options:
>> > > > >
>> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk
>> that
>> > > > >    encrypted data files already written with parquet-cpp 1.5.0
>> will
>> > not
>> > > > be
>> > > > >    readable any more.
>> > > > >    - Release the encryption in parquet-format as it is,
>> regardless of
>> > > > >    voting results and discard PARQUET-1419.
>> > > > >
>> > > > > Personally I have a hard time deciding which one I consider lesser
>> > evil.
>> > > > > What are your opinions?
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > > > Zoltan
>> > > > >
>> > > >
>> >
>>
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

If the encryption code release in parquet-cpp is unused at this moment then
I think we are fine. It means that we are still free to decide any way
about the data structures without the risk of incompatility issues. In this
case I would suggest to proceed as we planned at the Parquet sync.

Thanks,

Zoltan

On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky <gg...@gmail.com> wrote:

> That's correct. A layer that runs the crypto classes to encrypt pages and
> structures is not merged yet. Even if the parquet.thrift is out, there is
> virtually no chance encrypted files are created with it, and certainly not
> in production.
>
> Given a (hopefully mild) headache this last-minute feature/requirement has
> caused, let me add details on where it is coming from. I received the
> requirement from a company that got an internal pushback from teams working
> with unencrypted columns only, and unwilling to update Parquet libs. They
> need support for a transition period for the current readers.
>
> We have an option to say no, and stick to the original goal list. But I
> think the requirement makes sense, even if coming very late - and will
> significantly ease adoption of the Parquet encryption. Also, it turns out
> to be easy to implement, with minor changes in the code.
>
> In any case, its certainly a good idea to keep a single copy of
> parquet.thrift in the format repo.
>
> Cheers, Gidon
>
> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com> wrote:
>
> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> > reading and writing files with encryption is not possible. Am I wrong
> > about that?
> >
> > I'm wholly supportive of changing the project to use a released
> > version of the Parquet format, so let's do that ASAP.
> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com>
> wrote:
> > >
> > > Yep! (sent in parallel :)
> > >
> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi <zi@cloudera.com.invalid
> >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > As a short update, I just checked the PR for PARQUET-1419 and
> although
> > in
> > > > its current form it is a breaking change, it can be easily rewritten
> to
> > > > become backwards-compatible so this part of the problem does not
> apply
> > any
> > > > more.
> > > >
> > > > Br,
> > > >
> > > > Zoltan
> > > >
> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com>
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On the Parquet sync we discussed that the practice of maintaining a
> > copy
> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must take
> > care
> > > > to
> > > > > not release parquet-format changes in parquet-cpp before we
> > officially
> > > > > release them in parquet-format. As I got back to my computer and
> > started
> > > > to
> > > > > create a JIRA about this, I noticed that unfortunately this has
> > already
> > > > > happened.
> > > > >
> > > > > The encryption-releated parquet.thrift changes have not only been
> > added
> > > > to
> > > > > only parquet-format, but to parquet-cpp as well, and these changes
> > got
> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> > > > > PARQUET-1419 would change the encryption in a breaking way, which
> is
> > only
> > > > > acceptable as long as the original is not released. Additionally,
> it
> > has
> > > > > been discussed that a formal voting should take place before
> > > > incorporating
> > > > > the encryption features in the format.
> > > > >
> > > > > Now that parquet-cpp has already shipped these changes, we must
> > choose
> > > > the
> > > > > lesser evil of the following two options:
> > > > >
> > > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk
> that
> > > > >    encrypted data files already written with parquet-cpp 1.5.0 will
> > not
> > > > be
> > > > >    readable any more.
> > > > >    - Release the encryption in parquet-format as it is, regardless
> of
> > > > >    voting results and discard PARQUET-1419.
> > > > >
> > > > > Personally I have a hard time deciding which one I consider lesser
> > evil.
> > > > > What are your opinions?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Zoltan
> > > > >
> > > >
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
That's correct. A layer that runs the crypto classes to encrypt pages and
structures is not merged yet. Even if the parquet.thrift is out, there is
virtually no chance encrypted files are created with it, and certainly not
in production.

Given a (hopefully mild) headache this last-minute feature/requirement has
caused, let me add details on where it is coming from. I received the
requirement from a company that got an internal pushback from teams working
with unencrypted columns only, and unwilling to update Parquet libs. They
need support for a transition period for the current readers.

We have an option to say no, and stick to the original goal list. But I
think the requirement makes sense, even if coming very late - and will
significantly ease adoption of the Parquet encryption. Also, it turns out
to be easy to implement, with minor changes in the code.

In any case, its certainly a good idea to keep a single copy of
parquet.thrift in the format repo.

Cheers, Gidon

On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney <we...@gmail.com> wrote:

> AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
> reading and writing files with encryption is not possible. Am I wrong
> about that?
>
> I'm wholly supportive of changing the project to use a released
> version of the Parquet format, so let's do that ASAP.
> On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com> wrote:
> >
> > Yep! (sent in parallel :)
> >
> > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > As a short update, I just checked the PR for PARQUET-1419 and although
> in
> > > its current form it is a breaking change, it can be easily rewritten to
> > > become backwards-compatible so this part of the problem does not apply
> any
> > > more.
> > >
> > > Br,
> > >
> > > Zoltan
> > >
> > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > On the Parquet sync we discussed that the practice of maintaining a
> copy
> > > > of parquet.thrift in parquet-cpp is dangerous and that we must take
> care
> > > to
> > > > not release parquet-format changes in parquet-cpp before we
> officially
> > > > release them in parquet-format. As I got back to my computer and
> started
> > > to
> > > > create a JIRA about this, I noticed that unfortunately this has
> already
> > > > happened.
> > > >
> > > > The encryption-releated parquet.thrift changes have not only been
> added
> > > to
> > > > only parquet-format, but to parquet-cpp as well, and these changes
> got
> > > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> > > > PARQUET-1419 would change the encryption in a breaking way, which is
> only
> > > > acceptable as long as the original is not released. Additionally, it
> has
> > > > been discussed that a formal voting should take place before
> > > incorporating
> > > > the encryption features in the format.
> > > >
> > > > Now that parquet-cpp has already shipped these changes, we must
> choose
> > > the
> > > > lesser evil of the following two options:
> > > >
> > > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk that
> > > >    encrypted data files already written with parquet-cpp 1.5.0 will
> not
> > > be
> > > >    readable any more.
> > > >    - Release the encryption in parquet-format as it is, regardless of
> > > >    voting results and discard PARQUET-1419.
> > > >
> > > > Personally I have a hard time deciding which one I consider lesser
> evil.
> > > > What are your opinions?
> > > >
> > > > Thanks,
> > > >
> > > > Zoltan
> > > >
> > >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Wes McKinney <we...@gmail.com>.
AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so
reading and writing files with encryption is not possible. Am I wrong
about that?

I'm wholly supportive of changing the project to use a released
version of the Parquet format, so let's do that ASAP.
On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky <gg...@gmail.com> wrote:
>
> Yep! (sent in parallel :)
>
> On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
> wrote:
>
> > Hi,
> >
> > As a short update, I just checked the PR for PARQUET-1419 and although in
> > its current form it is a breaking change, it can be easily rewritten to
> > become backwards-compatible so this part of the problem does not apply any
> > more.
> >
> > Br,
> >
> > Zoltan
> >
> > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
> >
> > > Hi,
> > >
> > > On the Parquet sync we discussed that the practice of maintaining a copy
> > > of parquet.thrift in parquet-cpp is dangerous and that we must take care
> > to
> > > not release parquet-format changes in parquet-cpp before we officially
> > > release them in parquet-format. As I got back to my computer and started
> > to
> > > create a JIRA about this, I noticed that unfortunately this has already
> > > happened.
> > >
> > > The encryption-releated parquet.thrift changes have not only been added
> > to
> > > only parquet-format, but to parquet-cpp as well, and these changes got
> > > released in parquet-cpp 1.5.0. This is very unfortunate, because
> > > PARQUET-1419 would change the encryption in a breaking way, which is only
> > > acceptable as long as the original is not released. Additionally, it has
> > > been discussed that a formal voting should take place before
> > incorporating
> > > the encryption features in the format.
> > >
> > > Now that parquet-cpp has already shipped these changes, we must choose
> > the
> > > lesser evil of the following two options:
> > >
> > >    - Release a parquet-cpp 1.6.0 with a breaking change and risk that
> > >    encrypted data files already written with parquet-cpp 1.5.0 will not
> > be
> > >    readable any more.
> > >    - Release the encryption in parquet-format as it is, regardless of
> > >    voting results and discard PARQUET-1419.
> > >
> > > Personally I have a hard time deciding which one I consider lesser evil.
> > > What are your opinions?
> > >
> > > Thanks,
> > >
> > > Zoltan
> > >
> >

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
Yep! (sent in parallel :)

On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> As a short update, I just checked the PR for PARQUET-1419 and although in
> its current form it is a breaking change, it can be easily rewritten to
> become backwards-compatible so this part of the problem does not apply any
> more.
>
> Br,
>
> Zoltan
>
> On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:
>
> > Hi,
> >
> > On the Parquet sync we discussed that the practice of maintaining a copy
> > of parquet.thrift in parquet-cpp is dangerous and that we must take care
> to
> > not release parquet-format changes in parquet-cpp before we officially
> > release them in parquet-format. As I got back to my computer and started
> to
> > create a JIRA about this, I noticed that unfortunately this has already
> > happened.
> >
> > The encryption-releated parquet.thrift changes have not only been added
> to
> > only parquet-format, but to parquet-cpp as well, and these changes got
> > released in parquet-cpp 1.5.0. This is very unfortunate, because
> > PARQUET-1419 would change the encryption in a breaking way, which is only
> > acceptable as long as the original is not released. Additionally, it has
> > been discussed that a formal voting should take place before
> incorporating
> > the encryption features in the format.
> >
> > Now that parquet-cpp has already shipped these changes, we must choose
> the
> > lesser evil of the following two options:
> >
> >    - Release a parquet-cpp 1.6.0 with a breaking change and risk that
> >    encrypted data files already written with parquet-cpp 1.5.0 will not
> be
> >    readable any more.
> >    - Release the encryption in parquet-format as it is, regardless of
> >    voting results and discard PARQUET-1419.
> >
> > Personally I have a hard time deciding which one I consider lesser evil.
> > What are your opinions?
> >
> > Thanks,
> >
> > Zoltan
> >
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

As a short update, I just checked the PR for PARQUET-1419 and although in
its current form it is a breaking change, it can be easily rewritten to
become backwards-compatible so this part of the problem does not apply any
more.

Br,

Zoltan

On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi <zi...@cloudera.com> wrote:

> Hi,
>
> On the Parquet sync we discussed that the practice of maintaining a copy
> of parquet.thrift in parquet-cpp is dangerous and that we must take care to
> not release parquet-format changes in parquet-cpp before we officially
> release them in parquet-format. As I got back to my computer and started to
> create a JIRA about this, I noticed that unfortunately this has already
> happened.
>
> The encryption-releated parquet.thrift changes have not only been added to
> only parquet-format, but to parquet-cpp as well, and these changes got
> released in parquet-cpp 1.5.0. This is very unfortunate, because
> PARQUET-1419 would change the encryption in a breaking way, which is only
> acceptable as long as the original is not released. Additionally, it has
> been discussed that a formal voting should take place before incorporating
> the encryption features in the format.
>
> Now that parquet-cpp has already shipped these changes, we must choose the
> lesser evil of the following two options:
>
>    - Release a parquet-cpp 1.6.0 with a breaking change and risk that
>    encrypted data files already written with parquet-cpp 1.5.0 will not be
>    readable any more.
>    - Release the encryption in parquet-format as it is, regardless of
>    voting results and discard PARQUET-1419.
>
> Personally I have a hard time deciding which one I consider lesser evil.
> What are your opinions?
>
> Thanks,
>
> Zoltan
>

Re: Parquet-cpp has already released unofficial thrift changes

Posted by Gidon Gershinsky <gg...@gmail.com>.
Hi,

There is a third options. PARQUET-1419 doesn't have to be breaking. Removal
of the 'required boolean encrypted_footer" is not a must (it can be set to
true, and checked in readers). I can modify PARQUET-1419 pr to keep this
field - then we have only one new field, an optional one, which wouldn't
introduce a breaking change. Release of parquet-format with it would be
compatible with parquet-cpp 1.5.0.

Cheers, Gidon.

On Tue, Sep 25, 2018 at 8:10 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> On the Parquet sync we discussed that the practice of maintaining a copy of
> parquet.thrift in parquet-cpp is dangerous and that we must take care to
> not release parquet-format changes in parquet-cpp before we officially
> release them in parquet-format. As I got back to my computer and started to
> create a JIRA about this, I noticed that unfortunately this has already
> happened.
>
> The encryption-releated parquet.thrift changes have not only been added to
> only parquet-format, but to parquet-cpp as well, and these changes got
> released in parquet-cpp 1.5.0. This is very unfortunate, because
> PARQUET-1419 would change the encryption in a breaking way, which is only
> acceptable as long as the original is not released. Additionally, it has
> been discussed that a formal voting should take place before incorporating
> the encryption features in the format.
>
> Now that parquet-cpp has already shipped these changes, we must choose the
> lesser evil of the following two options:
>
>    - Release a parquet-cpp 1.6.0 with a breaking change and risk that
>    encrypted data files already written with parquet-cpp 1.5.0 will not be
>    readable any more.
>    - Release the encryption in parquet-format as it is, regardless of
>    voting results and discard PARQUET-1419.
>
> Personally I have a hard time deciding which one I consider lesser evil.
> What are your opinions?
>
> Thanks,
>
> Zoltan
>