Posted to user@hive.apache.org by Patrick Duin <pa...@gmail.com> on 2018/08/10 13:29:57 UTC

Enabling Snappy compression on Parquet

Hi,

I have some Hive tables in Parquet format and I am trying to find out how
best to enable compression.

I've done a bit of searching and the information is a bit scattered, but I
found I can use this Hive property to enable compression. It needs to be set
before doing an insert.

set parquet.compression=SNAPPY;
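
A minimal sketch of how the property might be used within one session, as described above. The table and column names (events_parquet, events_staging, id, payload) are hypothetical and only there for illustration.

```sql
-- Session-level setting, as in the message above: it is meant to apply to
-- Parquet files written by later statements in this session.
-- Files that already exist are not rewritten.
SET parquet.compression=SNAPPY;

-- Hypothetical target table, for illustration only.
CREATE TABLE IF NOT EXISTS events_parquet (id BIGINT, payload STRING)
STORED AS PARQUET;

-- The insert has to run after the SET so the new files pick up the codec.
-- events_staging is an assumed, already existing source table.
INSERT OVERWRITE TABLE events_parquet
SELECT id, payload FROM events_staging;
```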

What other formats are supported?
How can I verify a file is compressed and with what algorithm?

Thanks,
Patrick

Re: Enabling Snappy compression on Parquet

Posted by Patrick Duin <pa...@gmail.com>.
Thanks both for explaining!

Snappy is doing fine for me at the moment but I was curious about the other
options.

I'll have a look at parquet-tools and see if that can help me a bit as
well.



On Wed, 22 Aug 2018 at 08:05, Jörn Franke <jo...@gmail.com> wrote:

> No, Parquet and ORC have internal compression, which must be used instead
> of the external compression that you are referring to.
>
> Data with internal compression can be decompressed in parallel, which is
> significantly faster. Internally, Parquet supports only Snappy, gzip, LZO,
> Brotli (2.4), LZ4 (2.4), and zstd (2.4).

Re: Enabling Snappy compression on Parquet

Posted by Jörn Franke <jo...@gmail.com>.
No, Parquet and ORC have internal compression, which must be used instead of the external compression that you are referring to.

Data with internal compression can be decompressed in parallel, which is significantly faster. Internally, Parquet supports only Snappy, gzip, LZO, Brotli (2.4), LZ4 (2.4), and zstd (2.4).

> On 22. Aug 2018, at 07:33, Tanvi Thacker <ta...@gmail.com> wrote:
> 
> Hi Patrick,
> 
> What other formats are supported?
> - As far as I know, you can set any compression with any format (ORC or Text with Snappy, gzip, etc.). Are you looking for a specific format or compression?
>
> How can I verify a file is compressed and with what algorithm?
> - You could check whether parquet-tools provides any metadata about compression.
>
> On another note, if you already have uncompressed data and you are creating a table with Snappy compression, you need to use "CREATE TABLE new_compressed_table AS SELECT * FROM un_compressed_table" to actually compress the data.
> 
> Regards,
> Tanvi Thacker

Re: Enabling Snappy compression on Parquet

Posted by Tanvi Thacker <ta...@gmail.com>.
Hi Patrick,

*What other formats are supported?*
- As far as I know, you can set any compression with any format (ORC or Text
with Snappy, gzip, etc.). Are you looking for a specific format or
compression?

How can I verify a file is compressed and with what algorithm?
- You could check whether parquet-tools
<https://github.com/apache/parquet-mr/tree/master/parquet-tools> provides
any metadata about compression.

On another note, if you already have uncompressed data and you are
creating a table with Snappy compression, you need to use
"CREATE TABLE new_compressed_table AS SELECT * FROM un_compressed_table" to
actually compress the data.
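
The rewrite step above could look roughly like this. It is a sketch, assuming the codec is declared as a table property and using the placeholder table names from the sentence above; adjust it to your own schema.

```sql
-- Create a new Parquet table from the uncompressed one. The data is
-- rewritten by the SELECT, so the new files are written with the codec
-- declared in TBLPROPERTIES.
CREATE TABLE new_compressed_table
STORED AS PARQUET
TBLPROPERTIES ('parquet.compression'='SNAPPY')
AS SELECT * FROM un_compressed_table;

-- Shows the configured property only. It does not inspect the data files,
-- so it is not proof that the files on disk are actually compressed.
SHOW TBLPROPERTIES new_compressed_table('parquet.compression');
```

Setting the codec on the table rather than in the session keeps the choice with the table definition, so later inserts do not depend on someone remembering the SET statement.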

Regards,
Tanvi Thacker