Posted to dev@parquet.apache.org by Gang Wu <us...@gmail.com> on 2023/06/16 02:50:38 UTC

Re: [Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

+ dev@parquet

On Fri, Jun 16, 2023 at 7:43 AM Jacob Wujciak-Jens
<ja...@voltrondata.com.invalid> wrote:

> +1 on the update but also on properly communicating the change to avoid
> surprising issues :)
>
> On Thu, Jun 15, 2023 at 7:53 PM Joris Van den Bossche <
> jorisvandenbossche@gmail.com> wrote:
>
> > On Thu, 15 Jun 2023 at 19:08, Ian Cook <ia...@apache.org> wrote:
> > >
> > > It will still be possible to write files using Parquet 2.4 by
> > > explicitly specifying the 2.4 version to the Parquet writer, correct?
> > > If yes, that provides a simple workaround for users who encounter
> > > compatibility issues.
> >
> > Indeed. Using the pyarrow API, it would be something like
> > `pq.write_table(table, path, version="2.4")`
> >
> > >
> > > However we should take care to document this as a potentially breaking
> > > change, and document the workaround in release notes, release blog,
> > > etc.
> >
> > Certainly!
> >
> > >
> > > Ian
> > >
> > > On Thu, Jun 15, 2023 at 12:25 PM Joris Van den Bossche
> > > <jo...@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Bringing up https://github.com/apache/arrow/issues/35746 to the
> > > > mailing list: this issue proposes to bump the default Parquet version
> > > > we use for writing to Parquet files in the C++ library (and in the
> > > > various bindings including pyarrow and R arrow) from the current
> > > > default of "2.4" to "2.6".
> > > >
> > > > In practice, the only change is that the writer will, by default,
> > > > write the Timestamp LogicalType with NANOS unit
> > > > (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp)
> > > > if your data uses timestamp("ns") (currently, such data gets coerced
> > > > to microsecond resolution when writing to Parquet).
> > > >
> > > > In theory this could cause compatibility issues if the files you are
> > > > writing need to be read by other Parquet implementations which don't
> > > > yet support nanoseconds. But the Parquet format 2.6 was released in
> > > > Sept 2018, and parquet-mr added support for it in 2018 as well.
> > > >
> > > > Unless there is pushback on this, we are currently planning to make
> > > > this change for the upcoming Arrow 13.0.0 release.
> > > >
> > > > Best,
> > > > Joris
> >
>