You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "Daniel,Wu" <ha...@163.com> on 2011/08/12 05:03:11 UTC

how to make the data in one table available to multiple tables?

  We have a table  name as sales, which is partitioned by period (YYYYMMDD), and we also need a table ly_sales(last year sales). To speed up the query, we don't use a view to join sales with last year mapping table( e.g 20110603 mapped to 20100603) for performance viewpoint. However we used the view which join the mapping table and then materialize the data into the table ly_sales. But since the data in hive is simple under some directory, such as
   sales/
         20110603/
         20110602/
         20110601/

I am wondering whether we can create an empty table ly_sales, and the structure will be like
   ly_sales/
         20100603/
         20100602/
         20100601/    

And then use soft link to link 20100603 under ly_sales to 20110603 under sales. Do you think it is doable?

RE: how to make the data in one table available to multiple tables?

Posted by Steven Wong <sw...@netflix.com>.
One way is to create ly_sales as an external table and use ADD PARTITION ... LOCATION to point to the sales partitions. Unlike a non-external ("managed") table, an external table does not own its data, meaning when you DROP an external table or one of its partitions, the Hive metadata is deleted but the data is not deleted. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL for more info.


From: Daniel,Wu [mailto:hadoop_wu@163.com]
Sent: Thursday, August 11, 2011 8:03 PM
To: hive
Subject: how to make the data in one table available to multiple tables?

  We have a table  name as sales, which is partitioned by period (YYYYMMDD), and we also need a table ly_sales(last year sales). To speed up the query, we don't use a view to join sales with last year mapping table( e.g 20110603 mapped to 20100603) for performance viewpoint. However we used the view which join the mapping table and then materialize the data into the table ly_sales. But since the data in hive is simple under some directory, such as
   sales/
         20110603/
         20110602/
         20110601/

I am wondering whether we can create an empty table ly_sales, and the structure will be like
   ly_sales/
         20100603/
         20100602/
         20100601/

And then use soft link to link 20100603 under ly_sales to 20110603 under sales. Do you think it is doable?