Posted to user@hive.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2017/01/25 21:37:11 UTC

Parquet tables with snappy compression

Hi,

Has there been any study of how much compressing Hive Parquet tables with
snappy reduces storage space or simply the table size in quantitative terms?

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Parquet tables with snappy compression

Posted by Gopal Vijayaraghavan <go...@apache.org>.
> Has there been any study of how much compressing Hive Parquet tables with snappy reduces storage space or simply the table size in quantitative terms?

http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20

Since Snappy is just an LZ77-style compressor (it has no entropy-coding stage), I would assume it is most useful when Parquet leaves contain text with large common sub-chunks (like URLs or log data).

If you want to experiment with that corner case, the L_COMMENT field from TPC-H lineitem is a good compression-thrasher.
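
To get a feel for why shared sub-chunks matter, here is a minimal sketch that compares the compression ratio of a URL-like column against a column of random text. It rests on several assumptions that are not from this thread: Snappy is not in the Python standard library, so zlib (DEFLATE, which adds Huffman coding on top of LZ77) stands in for it; the URL prefixes, row counts and helper names are made up for illustration; and the data is a plain byte string, not a Parquet file. Absolute ratios will therefore differ from what Snappy gives on real Parquet pages; only the gap between the two columns is the point.

```python
import random
import string
import zlib


def compression_ratio(data: bytes) -> float:
    """Return uncompressed size divided by compressed size."""
    # zlib is a stand-in here; Snappy itself is not in the standard library.
    return len(data) / len(zlib.compress(data))


def make_url_column(rows: int, seed: int = 0) -> bytes:
    """Build newline-separated URLs that share long common prefixes."""
    rng = random.Random(seed)
    # Hypothetical prefixes, chosen only to create repeated sub-chunks.
    prefixes = [
        "http://www.example.com/products/category/",
        "http://www.example.com/account/orders/",
    ]
    lines = [f"{rng.choice(prefixes)}{rng.randrange(10_000)}" for _ in range(rows)]
    return "\n".join(lines).encode("ascii")


def make_random_column(rows: int, width: int = 44, seed: int = 0) -> bytes:
    """Build newline-separated random strings with little shared structure."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    lines = ["".join(rng.choices(alphabet, k=width)) for _ in range(rows)]
    return "\n".join(lines).encode("ascii")


url_ratio = compression_ratio(make_url_column(10_000))
random_ratio = compression_ratio(make_random_column(10_000))
print(f"URL-like column:    {url_ratio:.2f}x")
print(f"random-text column: {random_ratio:.2f}x")
```

The URL-like column should come out with a much higher ratio than the random one. For numbers that answer the original question, the same comparison would have to be run on actual Parquet files written with and without Snappy.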

Cheers,
Gopal



Re: Parquet tables with snappy compression

Posted by Owen O'Malley <om...@apache.org>.
Mich,
   Here are the benchmarks that I did using three different types of data:

http://www.slideshare.net/HadoopSummit/file-format-benchmark-avro-json-orc-parquet

I assume you are comparing parquet-snappy vs parquet-none.

.. Owen

