Posted to dev@parquet.apache.org by Ivan Gozali <iv...@lecida.com> on 2017/11/15 02:06:35 UTC

Pros/cons of setting parquet.writer.version=v2

Hi Parquet maintainers,

I was wondering if there are any advantages (e.g. performance increases) or
disadvantages (e.g. any stability issues) for setting the configuration
parquet.writer.version=v2 in apache-parquet-1.8.2 (particularly curious
about this version since Spark 2.2.0 uses it) or above?

Thank you in advance!

--
Regards,


Ivan Gozali
Lecida
Email: ivan@lecida.com

Re: Pros/cons of setting parquet.writer.version=v2

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
I agree with what Zoltan said, and I would add that we may use different
encodings in the future. We haven't officially closed v2, so we could add
different encodings to the spec and not require support for the existing
ones. Parquet Java would still be able to read data in those encodings, but
there's no guarantee that other readers would add support for them.

I also ran a few tests with some of our company data and didn't find a huge
benefit to the existing v2 encodings. That's why I built and proposed
different ones. If I were you, I'd stick with v1.

rb




-- 
Ryan Blue
Software Engineer
Netflix

Re: Pros/cons of setting parquet.writer.version=v2

Posted by Zoltan Ivanfi <zi...@cloudera.com>.
Hi,

In my opinion, compatibility is the main thing to consider here. Some
applications (Impala being a notable example) only support v1 at the
moment. You should carefully consider what applications you might want to
use in the future to process the data and check whether they all support v2.
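
A minimal sketch of that compatibility check, in Python. The support table
and helper names below are hypothetical and only for illustration: the Impala
entry (v1 only) and the Parquet Java entry (reads v2) reflect what is said in
this thread as of 2017, and any real table would have to be checked against
the readers actually in use. Unknown readers are assumed to be v1-only.

```python
# Hypothetical support table: which writer versions each reader can read.
# The entries mirror statements in this thread (as of 2017) and are
# assumptions, not an authoritative compatibility matrix.
READER_SUPPORT = {
    "impala": {"v1"},
    "parquet-java": {"v1", "v2"},
}

WRITER_VERSIONS = ["v1", "v2"]  # ordered oldest to newest


def pick_writer_version(readers, support=READER_SUPPORT):
    """Return the newest writer version that every listed reader supports."""
    common = set(WRITER_VERSIONS)
    for reader in readers:
        # Readers missing from the table are treated conservatively as v1-only.
        common &= support.get(reader, {"v1"})
    if not common:
        raise ValueError("no writer version is supported by all readers")
    return max(common, key=WRITER_VERSIONS.index)


def writer_config(readers):
    """Build the property discussed in this thread for the chosen version."""
    return {"parquet.writer.version": pick_writer_version(readers)}


print(writer_config(["parquet-java"]))
print(writer_config(["parquet-java", "impala"]))
```

With only Parquet Java as a reader this picks v2; as soon as Impala is in the
list it falls back to v1, which is the case to watch for.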

Regards,

Zoltan
