Posted to user@hive.apache.org by Juraj jiv <fa...@gmail.com> on 2015/08/18 17:28:13 UTC

Hive 12 - CDH 5.0.1 - many small files when using ORC table

Hello all,

I have a question about the ORC table format. We use it for our datastore
tables, but during maintenance I noticed that there are many small files inside
the tables, which I presume don't contain any data. They are only 43 bytes in
size and they make up around 70% of all files inside the table folder.

For example (counting the files that are 43 bytes in size, then all the others):

hadoop@hadoopnn:~$ hdfs dfs -du -h
/user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
grep "^43 " | wc -l
7448
hadoop@hadoopnn:~$ hdfs dfs -du -h
/user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 |
grep -v "^43 " | wc -l
4712
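
The two counts above can also be collected in one pass. This is only a rough sketch: count_tiny_files is a hypothetical helper name, and it assumes the listing is produced without -h, so that the first column is the raw size in bytes.

```shell
# Hypothetical helper: reads `hdfs dfs -du <dir>` output on stdin
# (no -h, so column 1 is the size in bytes) and prints two numbers:
# the count of 43-byte entries, then the count of all other entries.
count_tiny_files() {
  awk '$1 == 43 { tiny++ }
       $1 != 43 { other++ }
       END { printf "%d %d\n", tiny, other }'
}

# Usage (table name elided as in the listing above):
# hdfs dfs -du /user/hive/warehouse/dwh.db/<table>/date_report_start_part=2015-07-30 | count_tiny_files
```

Comparing the size numerically avoids matching human-readable sizes that merely start with "43".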

Why is that? Why are there so many 43-byte files?

The ASCII content of these files is shown below, which I guess is just the ORC header:
0@▒▒▒"
      ▒▒ORC

Hive version:
0.12.0+cdh5.0.1+315     1.cdh5.0.1.p0.31     CDH 5

Thanks
JV