Posted to user@hive.apache.org by Ian <li...@yahoo.com> on 2013/03/28 18:29:02 UTC

External table for hourly log files

Hi,
 
We use Hive "Insert Overwrite Directory" to copy the hourly logs to HDFS, so there are lots of directories like these:
    /my/logs/2013-03-08/01/000000_0
    /my/logs/2013-03-08/02/000000_0
    /my/logs/2013-03-08/03/000000_0
    ...
 
Now we want to create an external table to query the log data, so we use "Add Partition":
    CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string);
    ALTER TABLE testpart ADD PARTITION(dt='2013-03-08-01') LOCATION '/my/logs/2013-03-08/01';
 
This works fine. However, if we want, say, one week's worth of logs, we need to repeat "Add Partition" 24*7 times. Is there another way that avoids issuing so many partition statements, maybe something like a wildcard "2013-03-08/*"? If not, what is the general practice for handling these hourly logs?
 
Thanks
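
One way to avoid typing the 24*7 statements by hand is to generate them with a small script. The following is only a minimal sketch, assuming the directory layout /my/logs/YYYY-MM-DD/HH and the dt='YYYY-MM-DD-HH' partition values shown above; the function name add_partition_statements and its parameters are hypothetical and not part of Hive.

```python
from datetime import datetime, timedelta


def add_partition_statements(table, base_path, start, hours):
    """Return one ALTER TABLE ... ADD PARTITION statement per hourly directory.

    Assumes directories are laid out as <base_path>/YYYY-MM-DD/HH and that the
    partition value is 'YYYY-MM-DD-HH', as in the example in this thread.
    """
    statements = []
    for offset in range(hours):
        ts = start + timedelta(hours=offset)
        day = ts.strftime("%Y-%m-%d")
        hour = ts.strftime("%H")
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION (dt='{day}-{hour}') "
            f"LOCATION '{base_path}/{day}/{hour}';"
        )
    return statements


if __name__ == "__main__":
    # One week of hourly partitions, starting at 2013-03-08 00:00.
    week = add_partition_statements(
        "testpart", "/my/logs", datetime(2013, 3, 8), 24 * 7
    )
    for statement in week:
        print(statement)
```

The printed statements could be saved to a file and run in one pass, for example with hive -f. The script does not check that each hourly directory actually exists.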

Re: External table for hourly log files

Posted by Ian <li...@yahoo.com>.
Thanks, but it seems that relates only to inserts. How can I use dynamic partitions when querying?
 
So if I change the log path to use Hive's directory naming convention (e.g., dt=2013-03-08/hr=01), I still need to run Add Partition multiple times. How can dynamic partitions help in this case?
    CREATE EXTERNAL TABLE testpart (logline string) PARTITIONED BY(dt string, hr string);
    ALTER TABLE testpart ADD PARTITION(dt='2013-03-08', hr='01');

Thanks.
  


Re: External table for hourly log files

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Hi
You may want to look at Dynamic partitions
https://cwiki.apache.org/Hive/dynamicpartitions.html
Thanks
sanjay


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.