Posted to dev@parquet.apache.org by Manik Singla <sm...@gmail.com> on 2019/10/15 09:52:08 UTC

custom CompressionCodec support

Hi

The current Java code is not open to plugging in a custom compressor.
I believe reads and writes are mostly done by the same team/company. In that
case, it would be beneficial to add support for users to plug in a new
compressor easily, instead of maintaining local changes that are fragile
across version upgrades.

Do you think this would be worth adding?

Regards
Manik Singla
+91-9996008893
+91-9665639677

"Life doesn't consist in holding good cards but playing those you hold
well."
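The kind of pluggability being proposed can be sketched outside of Parquet itself. Below is a minimal, hypothetical codec registry mapping a codec name to compress/decompress callables — all names here are illustrative, not actual parquet-java API, and zlib stands in for a custom compressor:

```python
import zlib

# Hypothetical codec registry: name -> (compress, decompress) pair.
# This illustrates the shape of the proposal; it is not parquet-java API.
_CODECS = {}

def register_codec(name, compress, decompress):
    """Plug in a custom codec under a symbolic name."""
    _CODECS[name.lower()] = (compress, decompress)

def get_codec(name):
    """Look up a registered codec, case-insensitively."""
    try:
        return _CODECS[name.lower()]
    except KeyError:
        raise ValueError(f"unknown codec: {name}") from None

# A user-supplied codec: plain zlib stands in for a custom compressor.
register_codec("myzlib", lambda data: zlib.compress(data, 6), zlib.decompress)

compress, decompress = get_codec("MYZLIB")
payload = b"repetitive payload " * 100
assert decompress(compress(payload)) == payload
```

In parquet-java today, codecs are resolved through Hadoop's CompressionCodec machinery, so an equivalent hook would presumably have to live in the codec factory rather than in a user-visible registry like this one.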

Re: custom CompressionCodec support

Posted by Manik Singla <sm...@gmail.com>.
We checked several levels, such as 3, 7, 10, and 19, and maybe one or two more.
I can retry the experiments.


Re: custom CompressionCodec support

Posted by "Radev, Martin" <ma...@tum.de>.
Hello Manik,


If the compression level is really propagated to the library, what compression levels did you check?


Regards,

Martin


Re: custom CompressionCodec support

Posted by Manik Singla <sm...@gmail.com>.
Yes, that's the flag we tried, and we ensured it is being read and propagated.


Re: custom CompressionCodec support

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Thanks Manik,

Did you try setting the Hadoop io.compression.codec.zstd.level config?

Cheers, Fokko
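For parquet-hadoop, this key would typically be set in the Hadoop configuration. A sketch, assuming Hadoop's ZStandardCodec reads the io.compression.codec.zstd.level key as named above (file and value are illustrative):

```xml
<!-- core-site.xml (illustrative): compression level for Hadoop's ZStandardCodec -->
<property>
  <name>io.compression.codec.zstd.level</name>
  <value>19</value>
</property>
```

The same key can also be set programmatically on the Hadoop Configuration object passed to the Parquet writer.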


Re: custom CompressionCodec support

Posted by Manik Singla <sm...@gmail.com>.
Hi Fokko and Martin

We are using parquet-hadoop, which supports the compression codecs from
parquet-format. In our case, we were getting the same compression ratio even
after changing the zstd compression level. We confirmed that the configured
level is being passed by ZStandardCompressor in init, which is a native call.

To confirm the issue, we tried the same thing with our own injected zstd
implementation, and that seems to work fine. We will look into why it works
for Spark but not for us.


Re: custom CompressionCodec support

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Falak,

I was able to set the compression level in Spark using
spark.io.compression.zstd.level.

Cheers, Fokko
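As a sketch, the flags might be passed at submit time like this; the flag name is as given above, while the spark.hadoop.* passthrough of the Hadoop-level key is an assumption, not something confirmed in this thread:

```shell
# Illustrative spark-submit invocation; spark.hadoop.* prefixed settings
# are forwarded into the Hadoop Configuration (assumed relevant here).
spark-submit \
  --conf spark.io.compression.zstd.level=19 \
  --conf spark.hadoop.io.compression.codec.zstd.level=19 \
  my-job.jar
```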


Re: custom CompressionCodec support

Posted by "Radev, Martin" <ma...@tum.de>.
Hi Falak,


I was one of the people who recently exposed this in Arrow, but it is not part of the Parquet specification.

In particular, any implementation that writes parquet files can decide whether to expose this or to select a reasonable value internally.

If you're using Arrow, you will have to read the documentation of the specified compressor. Arrow doesn't check whether the specified compression level is within the range supported by the codec; for ZSTD, the range should be [1, 22].

Let me know if you're using Arrow and I can check locally that there isn't by any chance a bug in propagating the value. At the moment there are only smoke tests verifying that nothing crashes.


Regards,

Martin


Re: custom CompressionCodec support

Posted by Falak Kansal <fa...@sumologic.com>.
Hi Fokko,

Thanks for replying; yes, sure.
The problem we are facing is that with Parquet's zstd we are not able to
control the compression level: we tried setting different compression levels,
but it makes no difference in the output size. We have verified that the level
we set in the configuration file is the one *ZStandardCompressor* actually
receives. Are we missing something? How can we set a different zstd
compression level? Any help would be appreciated.

Thanks
Falak
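A quick way to check whether a configured level actually reaches the compression library is to compress one buffer at several levels and compare the output sizes; if every level yields an identical size, the setting is not propagating. A minimal sketch of that check, using Python's zlib as a stand-in for zstd (the function name and test data are illustrative):

```python
import zlib

def sizes_by_level(data, levels):
    """Compress the same buffer at each level and record the output sizes."""
    return {level: len(zlib.compress(data, level)) for level in levels}

# Compressible but non-trivial input, so levels can actually differ.
data = str(list(range(3000))).encode()

sizes = sizes_by_level(data, [1, 6, 9])
# If the level propagates, the highest level should beat the lowest.
assert sizes[9] < sizes[1], "level does not change the output size"
```

Running the same comparison through whatever zstd binding is actually in use (rather than zlib) would confirm whether the configured level ever reaches the native call.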


Re: custom CompressionCodec support

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Manik,

The supported compression codecs that ship with Parquet are tested and
validated in the CI pipeline. There are sometimes issues with compressors,
which is why they are not easily pluggable. Feel free to open a PR against the
project if you believe compressors are missing; then we can have a discussion.

It is part of the Thrift definition:
https://github.com/apache/parquet-format/blob/37bdba0a18cff18da706a0d353c65e726c8edca6/src/main/thrift/parquet.thrift#L470-L478
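
For reference, the CompressionCodec enum at the linked location looks approximately like this (reproduced from memory as a sketch; the linked parquet.thrift is the normative text):

```thrift
enum CompressionCodec {
  UNCOMPRESSED = 0;
  SNAPPY = 1;
  GZIP = 2;
  LZO = 3;
  BROTLI = 4;
  LZ4 = 5;
  ZSTD = 6;
}
```

Because the codec is a fixed enum in the file format, a truly custom codec would also need a format-level entry, which is part of why pluggability is constrained.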

Hope this clarifies the design decision.

Cheers, Fokko

Op di 15 okt. 2019 om 11:52 schreef Manik Singla <sm...@gmail.com>:

> Hi
>
> Current java code is not open to use custom compressor.
> I believe mostly read/write is done by same team/company.  In that case, it
> would be beneficial to add this support that user can plug new compressor
> easily instead of doing local changes which will be prone to uses across
> version upgrades.
>
> Do you guys think it will be worth to add
>
> Regards
> Manik Singla
> +91-9996008893
> +91-9665639677
>
> "Life doesn't consist in holding good cards but playing those you hold
> well."
>