Posted to dev@parquet.apache.org by Micah Kornfield <em...@gmail.com> on 2021/03/05 18:26:57 UTC

[C++] Changing the versioning string for Parquet-CPP

There has not been an official release of the Parquet C++ library in quite
some time.  I don't think this is a huge issue as the parquet bits are
packaged into each Arrow release.

However, one practical concern is that when a bug crops up in a particular
version that writes Parquet files, readers cannot mitigate it, because the
version string recorded by the writer never changes.  One example is a
long-standing bug (with a fix recently merged) where the comparator for
ByteArray/FLBA-encoded Decimals was incorrectly implemented.  This means
min/max statistics for these Decimal values cannot be relied on.
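
To make the statistics concern concrete, here is a minimal sketch of one
way a comparator for byte-encoded decimals can go wrong.  It is an
illustration only and not necessarily the exact bug that was fixed.
Parquet stores such decimals as big-endian two's-complement bytes, so
comparing the raw bytes as unsigned values misorders negative numbers.
The helper names (encode, unsigned_less, signed_less) are made up for
this example.

```python
def encode(unscaled: int, width: int = 4) -> bytes:
    """Encode an unscaled decimal value as big-endian two's complement."""
    return unscaled.to_bytes(width, "big", signed=True)


def unsigned_less(a: bytes, b: bytes) -> bool:
    """Lexicographic comparison that treats every byte as unsigned."""
    return a < b


def signed_less(a: bytes, b: bytes) -> bool:
    """Compare the values the bytes represent as signed integers."""
    return (int.from_bytes(a, "big", signed=True)
            < int.from_bytes(b, "big", signed=True))


minus_one = encode(-1)  # high bit set, so it sorts last when unsigned
plus_one = encode(1)

# The two orderings disagree, so min/max computed with the unsigned
# comparison would be swapped for this pair.
orderings_disagree = (signed_less(minus_one, plus_one)
                      != unsigned_less(minus_one, plus_one))
```

A writer that computes min/max with the first comparison would record
statistics that a reader using the second would misinterpret, which is
why the reader needs to know which writer version produced the file.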

I'd like to propose that we change the default version string [1] for
parquet-cpp to reflect arrow releases (e.g. "parquet-cpp-arrow version
3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").
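
As a rough sketch of how a reader might use such a string, the snippet
below parses the two example formats above and decides whether to trust
decimal min/max statistics.  Everything here is an assumption made for
illustration: the function names are hypothetical, and the threshold
version is a placeholder, since the release that first contains the fix
is not stated in this thread.

```python
import re
from typing import NamedTuple, Optional, Tuple


class WriterVersion(NamedTuple):
    application: str
    major: int
    minor: int
    patch: int


# Matches "<application> version <major>.<minor>.<patch>"; any suffix
# such as "-snapshot" is ignored.
_CREATED_BY = re.compile(
    r"^(?P<app>\S+) version (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
)

# HYPOTHETICAL placeholder, not a confirmed fix version.
ASSUMED_FIRST_FIXED: Tuple[int, int, int] = (4, 0, 0)


def parse_created_by(created_by: str) -> Optional[WriterVersion]:
    """Return the parsed writer version, or None if the string is unrecognized."""
    m = _CREATED_BY.match(created_by.strip())
    if m is None:
        return None
    return WriterVersion(m["app"], int(m["major"]), int(m["minor"]), int(m["patch"]))


def cpp_decimal_stats_trusted(
    created_by: str,
    first_fixed: Tuple[int, int, int] = ASSUMED_FIRST_FIXED,
) -> bool:
    """Trust stats only for Arrow-versioned parquet-cpp writers at or above first_fixed.

    This sketch only reasons about parquet-cpp writers. Other or
    unparseable writers are treated conservatively (not trusted).
    """
    v = parse_created_by(created_by)
    if v is None or v.application != "parquet-cpp-arrow":
        return False
    return (v.major, v.minor, v.patch) >= first_fixed
```

With the old fixed string, "parquet-cpp version 1.5.1-snapshot", every
file looks the same to this check no matter which build wrote it, which
is the limitation the proposal is meant to remove.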

Any objections? An alternative would be to try to do releases of
parquet-cpp on the same timeline as Arrow releases.

Thanks,
Micah

[1]
https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/cpp/src/parquet/parquet_version.h.in

Re: [C++] Changing the versioning string for Parquet-CPP

Posted by Joris Van den Bossche <jo...@gmail.com>.
There is an issue about this:
https://issues.apache.org/jira/browse/ARROW-7830

+1 on changing this to follow the Arrow version number (the current
non-changing number is not particularly useful).

Joris

Re: [C++] Changing the versioning string for Parquet-CPP

Posted by Antoine Pitrou <an...@python.org>.
Hi,

As a first step, I went ahead and renamed the unreleased version
"cpp-1.6.0" to "cpp-4.0.0" on the Parquet JIRA.

Now we need to solve https://issues.apache.org/jira/browse/ARROW-7830.

Best regards

Antoine.


On Fri, 12 Mar 2021 22:09:27 +0100
"Uwe L. Korn" <uw...@xhochy.com> wrote:
> When we merged this into the Arrow repo, the intention, at least on my side, was to possibly revert that at some later stage. The thought behind eventually moving parquet-cpp out of the Arrow repo again was based on the idea that Parquet was one of the many interfaces Arrow provides access to, but not an outstanding one. Nowadays, I have the feeling that Parquet and Arrow are much more tightly bound together than I initially expected. Thus we should probably accept that parquet-cpp will stay in the Arrow repo for a very long time, and this should extend to the versioning.
> 
> Also, we had the assumption that from time to time the Parquet community would make separate releases. I no longer remember how or why we assumed these releases would happen, though.
> 
> Basically, we had some assumptions that supported keeping the version numbers separate. All of the assumptions I can think of turned out to be false, so keeping the version in line with Arrow (C++) makes total sense nowadays.
> 
> Uwe
> 
> On Tue, Mar 9, 2021, at 7:57 PM, Micah Kornfield wrote:
> > I think there might have been some old agreement on this when parquet-cpp
> > was moved into the Arrow repo.  I can't seem to find the thread, but it
> > would be nice for some PMC members to chime in to make sure this seems OK
> > to them.
> > 
> > On Sat, Mar 6, 2021 at 7:38 AM Antoine Pitrou <an...@python.org> wrote:
> >   
> > > On Fri, 5 Mar 2021 10:26:57 -0800
> > > Micah Kornfield <em...@gmail.com>
> > > wrote:  
> > > >
> > > > I'd like to propose that we change the default version string [1] for
> > > > parquet-cpp to reflect arrow releases (e.g. "parquet-cpp-arrow version
> > > > 3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").  
> > >
> > > +1.  This definitely makes the most sense.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > >  
> >  
> 




Re: [C++] Changing the versioning string for Parquet-CPP

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
When we merged this into the Arrow repo, the intention, at least on my side, was to possibly revert that at some later stage. The thought behind eventually moving parquet-cpp out of the Arrow repo again was based on the idea that Parquet was one of the many interfaces Arrow provides access to, but not an outstanding one. Nowadays, I have the feeling that Parquet and Arrow are much more tightly bound together than I initially expected. Thus we should probably accept that parquet-cpp will stay in the Arrow repo for a very long time, and this should extend to the versioning.

Also, we had the assumption that from time to time the Parquet community would make separate releases. I no longer remember how or why we assumed these releases would happen, though.

Basically, we had some assumptions that supported keeping the version numbers separate. All of the assumptions I can think of turned out to be false, so keeping the version in line with Arrow (C++) makes total sense nowadays.

Uwe

On Tue, Mar 9, 2021, at 7:57 PM, Micah Kornfield wrote:
> I think there might have been some old agreement on this when parquet-cpp
> was moved into the Arrow repo.  I can't seem to find the thread, but it
> would be nice for some PMC members to chime in to make sure this seems OK
> to them.
> 
> On Sat, Mar 6, 2021 at 7:38 AM Antoine Pitrou <an...@python.org> wrote:
> 
> > On Fri, 5 Mar 2021 10:26:57 -0800
> > Micah Kornfield <em...@gmail.com>
> > wrote:
> > >
> > > I'd like to propose that we change the default version string [1] for
> > > parquet-cpp to reflect arrow releases (e.g. "parquet-cpp-arrow version
> > > 3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").
> >
> > +1.  This definitely makes the most sense.
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
>

Re: [C++] Changing the versioning string for Parquet-CPP

Posted by Micah Kornfield <em...@gmail.com>.
I think there might have been some old agreement on this when parquet-cpp
was moved into the Arrow repo.  I can't seem to find the thread, but it
would be nice for some PMC members to chime in to make sure this seems OK
to them.

On Sat, Mar 6, 2021 at 7:38 AM Antoine Pitrou <an...@python.org> wrote:

> On Fri, 5 Mar 2021 10:26:57 -0800
> Micah Kornfield <em...@gmail.com>
> wrote:
> >
> > I'd like to propose that we change the default version string [1] for
> > parquet-cpp to reflect arrow releases (e.g. "parquet-cpp-arrow version
> > 3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").
>
> +1.  This definitely makes the most sense.
>
> Regards
>
> Antoine.
>
>
>

Re: [C++] Changing the versioning string for Parquet-CPP

Posted by Antoine Pitrou <an...@python.org>.
On Fri, 5 Mar 2021 10:26:57 -0800
Micah Kornfield <em...@gmail.com>
wrote:
> 
> I'd like to propose that we change the default version string [1] for
> parquet-cpp to reflect arrow releases (e.g. "parquet-cpp-arrow version
> 3.0.0" instead of "parquet-cpp version 1.5.1-snapshot").

+1.  This definitely makes the most sense.

Regards

Antoine.