Posted to user@hive.apache.org by Gautam <ga...@gmail.com> on 2016/06/22 09:00:57 UTC

Hive Partition Restatement ..

Hello,

I'm trying to solve an ETL problem using Hive wherein a
partition in a Hive table needs to be restated on account of delayed data.
This means a new version of an already existing partition needs to be
introduced to the table. I need to do this while serving queries on that
table which could be reading the previous version of the partition. The
intended behaviour is to let currently running queries finish reading the
previous partition version, while new queries pick up the new partition
version. The data is in Parquet, which shouldn't really affect the
implementation. Moving directories causes MR/Tez jobs that are reading them
to fail.

Have folks had experience with such a use case? Are there things in Hive I
can leverage instead of having to implement the ETL myself?

One approach I'm looking at is to never move partition directories, and
instead only introduce new directories as new versions of the partition,
then point the table partition's location at the new directory. Any currently
running query would continue reading from the previous version's directory,
since that was not moved from its original location.
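A minimal sketch of this versioned-directory approach, written in Python purely for illustration. It only builds the new directory path and the HiveQL ALTER TABLE ... PARTITION (...) SET LOCATION statement that would repoint the partition; it does not write any data or talk to the metastore. The table name, the separate versions_root directory, the v<timestamp> naming, and all helper functions are hypothetical. It also relies on the assumption above that already-running queries keep reading the directory they started with.

```python
from __future__ import annotations

from datetime import datetime, timezone


def partition_path_fragment(partition_spec: dict[str, str]) -> str:
    # Hive-style "key=value" path segments, e.g. {"dt": "2016-06-21"} -> "dt=2016-06-21"
    return "/".join(f"{key}={value}" for key, value in partition_spec.items())


def new_version_dir(versions_root: str, partition_spec: dict[str, str], version: str) -> str:
    # Hypothetical layout: <versions_root>/<k>=<v>/.../v<version>
    # Every restatement gets a fresh directory; existing directories are never moved.
    return f"{versions_root.rstrip('/')}/{partition_path_fragment(partition_spec)}/v{version}"


def set_location_sql(table: str, partition_spec: dict[str, str], location: str) -> str:
    # HiveQL that repoints the partition's metastore entry at the new directory.
    spec = ", ".join(f"{key}='{value}'" for key, value in partition_spec.items())
    return f"ALTER TABLE {table} PARTITION ({spec}) SET LOCATION '{location}'"


def restatement_plan(
    table: str,
    versions_root: str,
    partition_spec: dict[str, str],
    version: str | None = None,
) -> tuple[str, str]:
    # Returns (directory to write the restated Parquet files into,
    #          statement to run once those files are completely written).
    if version is None:
        # Hypothetical version label: a UTC timestamp.
        version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    location = new_version_dir(versions_root, partition_spec, version)
    return location, set_location_sql(table, partition_spec, location)


if __name__ == "__main__":
    location, sql = restatement_plan(
        table="events",
        versions_root="hdfs:///warehouse/events_versions",
        partition_spec={"dt": "2016-06-21"},
        version="20160622T090000Z",
    )
    print(location)
    print(sql)
```

The statement would only be issued after the new version's files are fully written, so new queries never see a half-populated directory. The old directory is left in place; deleting it once no query can still be reading it would be a separate cleanup step, not shown here.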

thanks,
-Gautam.