Posted to user@hive.apache.org by Eric Chu <ec...@rocketfuel.com> on 2013/10/03 22:43:43 UTC

Insert into ORC partition from RCFile partition

Hi,

We're trying to convert our date-partitioned fact tables from RCFile to
ORCFile. Since the tables are very large and we retain only the last N days
(partitions) of data, we don't want to re-process existing partitions.
I'm comparing two approaches that use Hive ALTER and INSERT commands.

Approach A:

   1. Set fileformat of existing table to ORC (from RCFile)
   2. Add a new partition (default fileformat for that partition is ORC)
   3. Insert into new partition from source partition in another table (in
   RCFile)
   4. All partitions and table have ORC fileformat after N days
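
For concreteness, Approach A would look roughly like the following HiveQL.
This is only a sketch: the table names (fact_events, staging_events), the
partition column (dt), the date, and the selected columns are hypothetical
placeholders, not our real schema.

```sql
-- Hypothetical names: fact_events (target), staging_events (RCFile source).

-- Step 1: switch the table-level file format from RCFile to ORC.
ALTER TABLE fact_events SET FILEFORMAT ORC;

-- Step 2: add the new partition; it picks up the table's ORC format.
ALTER TABLE fact_events ADD PARTITION (dt='2013-10-03');

-- Step 3: populate the new partition from the RCFile source table.
INSERT INTO TABLE fact_events PARTITION (dt='2013-10-03')
SELECT user_id, event_type, amount   -- placeholder columns
FROM staging_events
WHERE dt='2013-10-03';
```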

Approach B:

   1. Keep existing table fileformat as RCFile.
   2. Add a new partition (default fileformat for that partition is
   RCFile). Set the file format of this new partition to ORC
   3. Insert into new partition from source partition in another table (in
   RCFile)
   4. All partitions have ORC fileformat after N days, but table fileformat
   is still RCFile
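
A sketch of Approach B, using the same hypothetical names as placeholders
(fact_events, staging_events, dt, and the column list are assumptions for
illustration only). The difference from A is that the format is changed on
the partition rather than on the table:

```sql
-- Step 1: no table-level change; fact_events stays RCFile.

-- Step 2: add the partition (it starts out as RCFile), then set the
-- file format on that partition only.
ALTER TABLE fact_events ADD PARTITION (dt='2013-10-03');
ALTER TABLE fact_events PARTITION (dt='2013-10-03') SET FILEFORMAT ORC;

-- Step 3: populate the partition from the RCFile source table.
INSERT INTO TABLE fact_events PARTITION (dt='2013-10-03')
SELECT user_id, event_type, amount   -- placeholder columns
FROM staging_events
WHERE dt='2013-10-03';
```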

Specifically, while A works as expected, when we tried B we noticed that
the fileformat of the new partition is ORC after step 2, but switches back
to RCFile after step 3. Is that a bug, or is it expected because the
fileformat differs between the table level and the partition level?
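
One way to observe this is to inspect the partition metadata before and
after the insert. The names below are the same hypothetical placeholders as
above; the statement itself is the standard DESCRIBE FORMATTED command.

```sql
-- Run once after step 2 and again after step 3, then compare the
-- InputFormat / OutputFormat / SerDe Library lines in the output.
-- An ORC partition reports org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
-- an RCFile partition reports org.apache.hadoop.hive.ql.io.RCFileInputFormat.
DESCRIBE FORMATTED fact_events PARTITION (dt='2013-10-03');
```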

Also, if we do A, what's the expected behavior for queries that span both
RCFile and ORC partitions (with the table fileformat being ORC)? Are there
any known issues? Initial testing seems fine, but I want to check whether
others have done this before.

Thanks a lot!

Eric