Posted to user@hive.apache.org by cathy zhu <cz...@gmail.com> on 2018/06/04 23:39:25 UTC
snappy compression & set parquet size not working on hive
I created a Hive table and used INSERT ... SELECT to load existing Impala data into
it. I noticed two things.
1. The new data is more than twice the size of the old data, which had been
compressed by Impala.
2. No matter how large I set the Parquet block size, Hive always generates
Parquet files of roughly the same size.
I did this before inserting.
set hive.exec.dynamic.partition.mode=nonstrict;
SET parquet.column.index.access=true;
SET hive.merge.mapredfiles=true;
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET dfs.block.size=445644800;
SET parquet.block.size=445644800;
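For reference, the insert itself looked roughly like this (the table, column,
and partition names below are placeholders, not the real ones). I also wonder
whether I should be setting Parquet's own compression property instead of the
mapred.output.compression.* ones, since I'm not sure those apply to Parquet
output at all:

-- Placeholder names; the real target table is partitioned and STORED AS PARQUET.
-- parquet.compression is Parquet's own codec property in Hive; this is a guess
-- at the right knob, not something I have confirmed works on 1.1.0.
SET parquet.compression=SNAPPY;

INSERT OVERWRITE TABLE hive_table PARTITION (dt)
SELECT col1, col2, dt
FROM impala_table;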
Can anyone please point out what I did wrong? I'm using Hive 1.1.0.
Thank you!