You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Jason Michael <jm...@videoegg.com> on 2009/07/30 01:07:35 UTC

Create external table over top of existing, partitioned table

Hello,

I noticed the following behavior when trying to create an external table over top of an existing, partitioned table:

<snip>
hive> create table foo (id1 int, id2 int) partitioned by(p1 int)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as textfile;
OK
Time taken: 0.12 seconds
hive> load data local inpath '/temp/foo1' overwrite into table foo partition(p1=1);
Copying data from file:/temp/foo1
Loading data to table foo partition {p1=1}
OK
Time taken: 1.493 seconds
hive> load data local inpath '/temp/foo2' overwrite into table foo partition(p1=2);
Copying data from file:/temp/foo2
Loading data to table foo partition {p1=2}
OK
Time taken: 1.218 seconds
hive> select * from foo;
OK
1    1    1
2    1    1
3    2    2
4    3    2
Time taken: 0.781 seconds
hive> create external table foo2 (id1 int, id2 int) partitioned by(p1 int)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as textfile location '/user/hive/warehouse/foo/';
OK
Time taken: 0.161 seconds
hive> select * from foo2;
OK
Time taken: 0.398 seconds
</snip>

I had hoped that the last select query would return the same results as the first one since, I thought, both tables would be pointing at the same data.  What I'm attempting to do is create what amounts to a symlink or table alias in Hive.  In our warehouse, we have several large tables which get populated with different versions of an ETL tool.  We hoped to have, at any given time, a single table name that was just an alias for whichever of the large tables we considered 'live' at the time.  That way users of the system would only have to remember one set of table names to work with.  Is something like this possible in Hive?  We noted that the above approach worked fine with non-partitioned tables, presumably because the data files are located exactly where the location parameter has them rather than underneath a layer of partition subdirs.

Jason

Re: Create external table over top of existing, partitioned table

Posted by Prasad Chakka <pc...@facebook.com>.
Hive does not produce any results because the external table does not have any partitions defined. Partitions are not inferred from file system but are created when 'load data local...' or 'alter table ... add partition ..' commands are executed. So you need to execute the later cmd on the external table along with the load command on the regular table.

When Hive supports views or table aliases, this step will not needed but until then you need to do this extra step.

________________________________
From: Jason Michael <jm...@videoegg.com>
Reply-To: <hi...@hadoop.apache.org>
Date: Wed, 29 Jul 2009 16:07:35 -0700
To: hive mailing list <hi...@hadoop.apache.org>
Subject: Create external table over top of existing, partitioned table

Hello,

I noticed the following behavior when trying to create an external table over top of an existing, partitioned table:

<snip>
hive> create table foo (id1 int, id2 int) partitioned by(p1 int)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as textfile;
OK
Time taken: 0.12 seconds
hive> load data local inpath '/temp/foo1' overwrite into table foo partition(p1=1);
Copying data from file:/temp/foo1
Loading data to table foo partition {p1=1}
OK
Time taken: 1.493 seconds
hive> load data local inpath '/temp/foo2' overwrite into table foo partition(p1=2);
Copying data from file:/temp/foo2
Loading data to table foo partition {p1=2}
OK
Time taken: 1.218 seconds
hive> select * from foo;
OK
1    1    1
2    1    1
3    2    2
4    3    2
Time taken: 0.781 seconds
hive> create external table foo2 (id1 int, id2 int) partitioned by(p1 int)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' stored as textfile location '/user/hive/warehouse/foo/';
OK
Time taken: 0.161 seconds
hive> select * from foo2;
OK
Time taken: 0.398 seconds
</snip>

I had hoped that the last select query would return the same results as the first one since, I thought, both tables would be pointing at the same data.  What I'm attempting to do is create what amounts to a symlink or table alias in Hive.  In our warehouse, we have several large tables which get populated with different versions of an ETL tool.  We hoped to have, at any given time, a single table name that was just an alias for whichever of the large tables we considered 'live' at the time.  That way users of the system would only have to remember one set of table names to work with.  Is something like this possible in Hive?  We noted that the above approach worked fine with non-partitioned tables, presumably because the data files are located exactly where the location parameter has them rather than underneath a layer of partition subdirs.

Jason