Posted to user@hive.apache.org by Schubert Zhang <zs...@gmail.com> on 2009/09/28 18:47:02 UTC

Re: Dynamic Partitioning?

We have the following use case:

1. We have a periodic MapReduce job that pre-processes the source data (files)
and writes its output files into an HDFS directory. That HDFS directory
corresponds to a Hive table (the table should be partitioned). The MapReduce
job should write its output into different partitions based on the data it
analyzes.

2. We want Hive to recognize any new partitions that appear as HDFS
sub-directories under the table's root directory. The MapReduce job may add
new files either into newly created partitions or into existing ones.

3. We also need a compaction/merging process that periodically compacts or
merges the existing partitions to produce bigger files.
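A minimal sketch of step 1, assuming a Python post-processing step and hypothetical field and table names, could route each output record into a Hive-style partition directory (Hive lays partitions out as key=value sub-directories under the table root):

```python
import os

def partition_path(table_root, partition_spec):
    """Build a Hive-style partition directory path, e.g.
    /user/hive/warehouse/events/dt=2009-09-28/region=us"""
    parts = ["%s=%s" % (k, v) for k, v in partition_spec]
    return os.path.join(table_root, *parts)

def route_record(table_root, record):
    """Pick the output directory for one record (field names are illustrative)."""
    spec = [("dt", record["date"]), ("region", record["region"])]
    return partition_path(table_root, spec)

print(route_record("/user/hive/warehouse/events",
                   {"date": "2009-09-28", "region": "us"}))
# -> /user/hive/warehouse/events/dt=2009-09-28/region=us
```

The reducer (or a post-processing step) would write its part files under the returned directory; Hive still has to be told about the new partition, which is what points 2 and 3 are asking to automate.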



On Tue, Aug 18, 2009 at 9:46 AM, Prasad Chakka <pc...@facebook.com> wrote:

> Well, the code to infer partitions from an HDFS directory exists in an old
> version of Hive. You would need to bring that back (and possibly make some
> modifications to reflect the latest code). But the work involved here is to
> disallow such tables being marked as EXTERNAL and also to disallow setting
> Partition properties. There may be a couple of other things that need to be
> taken care of that I can’t think of right now.
>
> It doesn’t look like much.
>
> Prasad
>
> ------------------------------
> *From: *Chris Goffinet <go...@digg.com>
> *Reply-To: *<hi...@hadoop.apache.org>
> *Date: *Mon, 17 Aug 2009 18:38:40 -0700
> *To: *<hi...@hadoop.apache.org>
> *Subject: *Re: Dynamic Partitioning?
>
> How much work is involved for such a feature?
>
> -Chris
>
> On Aug 17, 2009, at 6:19 PM, Prasad Chakka wrote:
>
>  We could make this feature a per-table property for tables that don’t use
> the extended feature set...
>
>
> ------------------------------
> *From: *Frederick Oko <frederick.oko@gmail.com>
> *Reply-To: *<hive-user@hadoop.apache.org>
> *Date: *Thu, 13 Aug 2009 02:12:54 -0700
> *To: *<hive-user@hadoop.apache.org>
> *Subject: *Re: Dynamic Partitioning?
>
> Actually this is what Hive originally did -- it used to trust partitions it
> discovered via HDFS. This blind trust could be leveraged for just what you
> are requesting, since partitions do follow a simple directory scheme (and
> there is precedent for such out-of-band data loading). However, this blind
> trust became incompatible with the extended feature set of external tables
> and per-partition schemas introduced earlier this year. Re-enabling this
> behavior based on configuration is currently tracked as
> https://issues.apache.org/jira/browse/HIVE-493 'automatically infer
> existing partitions of table from HDFS files'.
>
> On Tue, Aug 11, 2009 at 11:15 AM, Chris Goffinet <goffinet@digg.com> wrote:
>
> Hi
>
> I was wondering if anyone has thought about the possibility of having
> dynamic partitioning in Hive? Right now you typically use LOAD DATA or ALTER
> TABLE to add new partitions. It would be great if applications like Scribe
> that can load data into HDFS could just place the data into the correct
> folder structure for your partitions on HDFS. Has anyone investigated this?
> What is everyone else doing in regards to things like this? It seems a
> little error prone to have a cron job run every day adding new partitions.
> It might not even be possible to do dynamic partitioning, since the
> partition list is read from metadata. But I'd love to hear thoughts?
>
> -Chris
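For reference, the inference Frederick describes relies on Hive's key=value directory convention. A minimal sketch (plain Python, hypothetical table and column names) of recovering a partition spec from a sub-directory name, which is roughly the discovery step HIVE-493 would automate:

```python
def parse_partition(subdir):
    """Parse one partition sub-path like 'dt=2009-08-11/hour=23'
    into an ordered list of (key, value) pairs."""
    pairs = []
    for component in subdir.strip("/").split("/"):
        key, sep, value = component.partition("=")
        if not sep:
            raise ValueError("not a partition directory: %r" % component)
        pairs.append((key, value))
    return pairs

# Each recovered spec could then be registered with something like:
#   ALTER TABLE events ADD PARTITION (dt='2009-08-11', hour='23');
print(parse_partition("dt=2009-08-11/hour=23"))
# -> [('dt', '2009-08-11'), ('hour', '23')]
```

Until that inference is re-enabled, a cron job doing exactly this scan followed by ALTER TABLE ... ADD PARTITION is the usual workaround the thread mentions.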