Posted to user@hive.apache.org by Koert Kuipers <ko...@tresata.com> on 2012/02/08 17:04:18 UTC

external partitioned table

hello all,

we have an external partitioned table in hive.

we add to this table by having map-reduce jobs (so not from hive) create
new subdirectories with the right format (partitionid=partitionvalue).

however hive doesn't pick them up automatically. we have to go into the hive
shell and run "alter table sometable add partition
(partitionid=partitionvalue)". to make matters worse, hive doesn't really lend
itself to running such an add-partition operation from java (or, for that
matter, to any easy programmatic manipulation... grrr. but i will stop now
before i go on a rant).

any suggestions how to approach this? thanks!

best, koert

Re: external partitioned table

Posted by be...@yahoo.com.
Hi Koert
        Since you are creating the dirs/sub-dirs with mapreduce jobs outside of hive, hive is unaware of them. In such cases there is no way around an add partition DDL statement to register each dir as a hive partition.
If you are using oozie or shell to trigger your jobs, you can accomplish it as follows:
- use java to build the correct add partition statement(s) and write them to a file
- execute the file using hive -f <fileName>
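The two steps above could be sketched like this (a hypothetical AddPartitionScript class; the table, column, and file names are just the placeholders from the thread, and IF NOT EXISTS makes the generated script safe to re-run):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

public class AddPartitionScript {
    // Step 1: build one ALTER TABLE ... ADD PARTITION statement per value.
    static String buildStatements(String table, String partCol, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(String.format(
                "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s');%n",
                table, partCol, v));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write the statements to a script file.
        try (PrintWriter out = new PrintWriter(new FileWriter("add_partitions.hql"))) {
            out.print(buildStatements("sometable", "partitionid",
                    Arrays.asList("2012-02-07", "2012-02-08")));
        }
        // Step 2: execute the generated file from shell or oozie:
        //   hive -f add_partitions.hql
    }
}
```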

Hope it helps!..


Regards
Bejoy K S

From handheld, Please excuse typos.



Re: external partitioned table

Posted by Roberto Congiu <ro...@openx.com>.
Hi Koert,
we have a similar situation and this is what we did.
In our case, the partitions correspond to dates. We also have multiple
external tables set up this way.
The upstream process updates a status file with the earliest and
latest date available. I scan the DFS for new partitions (the scan
program should know in advance how deep to go, to avoid excessive
namenode calls) and load any new partition that is not yet in Hive and
is less than or equal to the latest date in the status file.
This is done to avoid loading a partition that the upstream service is
not done processing (for the same reason, ALTER TABLE RECOVER
PARTITIONS won't work in our case unless we change how the upstream
process stores data).
I also drop partitions that are older than the lower bound in the
status file, because the upstream system deletes older data and you
don't want a stale partition in Hive pointing at data that is no
longer there.
For all these operations I use HiveServer, the thrift interface. We've
been using this system for two years, with the sync running hourly.
If it's of general interest, I can share the code; my company is
open-source friendly.
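The add/drop decision described above could be sketched roughly like this (a hypothetical PartitionSync class; in practice the DFS listing would come from the Hadoop FileSystem API and the known partitions from the metastore, and dates in yyyy-MM-dd form compare correctly as plain strings):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class PartitionSync {
    // Partitions to register: on the DFS, not yet in Hive, and no newer than
    // the latest date the status file says is fully processed.
    static List<String> toAdd(Set<String> onDfs, Set<String> inHive, String latest) {
        List<String> add = new ArrayList<>();
        for (String p : onDfs) {
            if (!inHive.contains(p) && p.compareTo(latest) <= 0) {
                add.add(p);
            }
        }
        return add;
    }

    // Partitions to drop: in Hive but older than the earliest date still kept
    // upstream, so their data is already gone.
    static List<String> toDrop(Set<String> inHive, String earliest) {
        List<String> drop = new ArrayList<>();
        for (String p : inHive) {
            if (p.compareTo(earliest) < 0) {
                drop.add(p);
            }
        }
        return drop;
    }
}
```

Each entry in the returned lists would then be turned into an ALTER TABLE ... ADD/DROP PARTITION statement and issued through HiveServer.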

Roberto


Re: external partitioned table

Posted by Mark Grover <mg...@oanda.com>.
Hi Koert,
That's because the Hive metastore doesn't know about the partitions you added. I was in a similar situation, but I use Amazon EMR, and in their version of Hive one can run the command "alter table src recover partitions", which walks the directory structure of the table (src, in this case) and adds the partitions to the Hive metastore. You can look into getting a patch for that, if one is available.

Instead of running the command through the shell, you could connect to Hive from within Java (I personally use the Hive JDBC driver to do so) and then issue the desired add partition command, but it might be overkill if this is the only reason you want to connect to Hive from Java.
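A minimal sketch of the JDBC route (the DDL string is built separately so it can be inspected; the driver class and URL in the comments are those of the original, pre-HiveServer2 Hive JDBC driver, and the host/port are placeholders):

```java
public class AddPartitionJdbc {
    // Build the ADD PARTITION statement; kept separate so it can be checked
    // before being sent to Hive.
    static String addPartitionSql(String table, String col, String value) {
        return String.format(
            "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (%s='%s')",
            table, col, value);
    }

    public static void main(String[] args) {
        String sql = addPartitionSql("sometable", "partitionid", "partitionvalue");
        System.out.println(sql);
        // Against a running Hive server, the statement would be issued roughly
        // like this (requires the Hive JDBC driver on the classpath):
        //   Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        //   Connection con = DriverManager.getConnection(
        //           "jdbc:hive://localhost:10000/default", "", "");
        //   con.createStatement().execute(sql);
    }
}
```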

Mark

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 

