You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Jason Michael <jm...@videoegg.com> on 2009/07/28 21:47:40 UTC

Best way to duplicate a table?

I'd like to duplicate a very large, partitioned table in Hive, preserving all data and partitions.  What's the most efficient way to do this?

Re: Best way to duplicate a table?

Posted by Zheng Shao <zs...@gmail.com>.
If there are multiple partition keys:

INSERT OVERWRITE ABC_COPY PARTITION(partkey1 = "$partkey1", partkey2 =
"$partkey2")
SELECT `(partkey1|partkey2)?+.+` FROM ABC WHERE partkey = "$partkey";

See https://issues.apache.org/jira/browse/HIVE-420 for details on this.


If you already know the name of all columns, it does not hurt to list
all of them (except the partition columns).

Zheng

On Tue, Jul 28, 2009 at 1:36 PM, Zheng Shao<zs...@gmail.com> wrote:
> You can do something like this:
>
>  CREATE table ABC_COPY LIKE ABC;
>
>  SHOW PARTITIONS ABC;
>
>  for each partition:
>
>  INSERT OVERWRITE ABC_COPY PARTITION(partkey = "$partkey")
>  SELECT `(partkey)?+.+` FROM ABC WHERE partkey = "$partkey";
>
>  That back-quoted `(partkey)?+.+` is a special regular expression
> that matches every column except the partkey.
>
> Zheng
>
> On Tue, Jul 28, 2009 at 12:47 PM, Jason Michael<jm...@videoegg.com> wrote:
>> I’d like to duplicate a very large, partitioned table in Hive, preserving
>> all data and partitions.  What’s the most efficient way to do this?
>
>
>
> --
> Yours,
> Zheng
>



-- 
Yours,
Zheng

Re: Best way to duplicate a table?

Posted by Zheng Shao <zs...@gmail.com>.
You can do something like this:

  CREATE table ABC_COPY LIKE ABC;

  SHOW PARTITIONS ABC;

  for each partition:

  INSERT OVERWRITE ABC_COPY PARTITION(partkey = "$partkey")
  SELECT `(partkey)?+.+` FROM ABC WHERE partkey = "$partkey";

  That back-quoted `(partkey)?+.+` is a special regular expression
that matches every column except the partkey.

Zheng

On Tue, Jul 28, 2009 at 12:47 PM, Jason Michael<jm...@videoegg.com> wrote:
> I’d like to duplicate a very large, partitioned table in Hive, preserving
> all data and partitions.  What’s the most efficient way to do this?



-- 
Yours,
Zheng