Posted to user@hive.apache.org by Juraj jiv <fa...@gmail.com> on 2015/08/18 17:28:13 UTC
Hive 12 - CDH 5.0.1 - many small files when using ORC table
Hello all,
I have a question about the ORC table format. We use it for our datastore
tables, but during maintenance I noticed that there are many small files inside
the tables, which I presume don't contain any data. They are only 43 bytes in
size, and they make up around 70% of all files inside the table folder.
For example (counting the 43-byte files, then all other files, in one partition):
hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | grep -v "^43 " | wc -l
4712
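To double-check the share of 43-byte files in a partition without relying on the human-readable -h output, the counting above can be sketched in Python. This is a minimal sketch, assuming the output of hdfs dfs -du (run without -h) has been saved to a text file. It takes only the first column as the size in bytes and ignores the rest, because the exact column layout is assumed to differ between Hadoop versions. The script name count_small.py and the helper summarize_du are hypothetical.

```python
import sys

# File size (in bytes) reported in the post for the suspicious files.
SMALL_FILE_SIZE = 43


def summarize_du(lines, small_size=SMALL_FILE_SIZE):
    """Count entries of exactly small_size bytes versus all other entries.

    Assumes each line comes from `hdfs dfs -du` run WITHOUT -h, so the
    first column is an integer byte count. Any further columns (the path,
    and on some Hadoop versions the disk space consumed) are ignored.
    """
    small = other = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if int(fields[0]) == small_size:
            small += 1
        else:
            other += 1
    total = small + other
    share = small / total if total else 0.0
    return small, other, share


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_small.py du.txt
    with open(sys.argv[1]) as fh:
        small, other, share = summarize_du(fh)
    print(f"{small} entries of {SMALL_FILE_SIZE} bytes, {other} other entries "
          f"({share:.1%} small)")
```

For example: hdfs dfs -du /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 > du.txt, then python3 count_small.py du.txt. With the two counts above (7448 and 4712), the 43-byte files make up about 61% of the entries in that partition.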
Why is that? Why are there so many 43-byte files?
The ASCII content of these files is shown below; I guess it is just the ORC header:
0@▒▒▒"
▒▒ORC
Hive version:
0.12.0+cdh5.0.1+315 1.cdh5.0.1.p0.31 CDH 5
Thanks
JV