Posted to user@hive.apache.org by Rajesh Balamohan <ra...@gmail.com> on 2013/02/05 01:17:04 UTC

RCFile performance

Hi Experts,

I have a large file with 300+ columns. To query only a few rows
efficiently, I am using the RCFile format in Hive.

I have tried increasing the RCFile row group size from the default up to
32 MB.

ex: set hive.io.rcfile.record.buffer.size = 134217728;

However, I do not see any major change in the amount of HDFS data scanned.
Moreover, the amount of data scanned with RCFile is not significantly
different from that of a row-based file.
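
For reference, the setup described above can be sketched roughly as
follows. This is only an illustration: the table names (wide_text,
wide_rc) and column names (col_17, col_42) are hypothetical, and it
assumes the row group size is a writer-side setting that has to be in
effect when the RCFile data is written, not only when it is queried.

```sql
-- Hypothetical names throughout: wide_text is the row-based source table,
-- wide_rc is its RCFile copy, col_17 and col_42 are two of the 300+ columns.

-- Value copied from the post; 134217728 bytes = 128 * 1024 * 1024.
-- Assumption: this must be set before the RCFile data is written.
set hive.io.rcfile.record.buffer.size=134217728;

-- Rewrite the row-based data into RCFile with the setting above in effect.
CREATE TABLE wide_rc STORED AS RCFILE AS
SELECT * FROM wide_text;

-- Name only the needed columns; SELECT * would touch every column.
SELECT col_17, col_42
FROM wide_rc
WHERE col_17 = 'some_value';
```

Comparing the HDFS bytes read for the last query against the same query
on wide_text is the measurement referred to above.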

Are there any other parameters that need to be set so that only the
relevant fields are scanned in RCFile? Is there anything obvious I am
missing?

Any pointers would be appreciated.


-- 
~Rajesh.B