Posted to user@hive.apache.org by Manish Malhotra <ma...@gmail.com> on 2012/12/07 08:17:10 UTC

Locking in HIVE : How to use locking/unlocking features using hive java API ?

Hi,

I'm building / designing a back-up and restore tool for hive data for
Disaster Recovery scenarios.

I'm trying to understand the locking behavior of Hive, which currently
uses ZooKeeper for locking.

My thought process is like this (early design):

1. Backing up the meta-data of Hive.
2. Backing up the data for Hive tables on S3, HDFS, or NFS.
3. Restoring table(s):
    a. Only Data
    b. Schema and data

So, to achieve the 1st task, this is the flow I'm thinking of.

a. Check whether there is any exclusive lock on the table whose meta-data
needs to be backed up.
         if YES: don't do anything; wait and retry, up to a configured
number of attempts at a configured frequency.
         if NO: get the meta-data of the table and create the DDL
statement for Hive, including table / partition etc.
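
The wait-and-retry step above could be sketched in Python roughly as follows. This is only an illustration: `is_exclusively_locked` is a hypothetical callable that the tool would have to supply (for example by inspecting the output of SHOW LOCKS), and the attempt count and delay stand in for the "configured" values.

```python
import time
from typing import Callable


def wait_until_unlocked(
    is_exclusively_locked: Callable[[], bool],
    max_attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no exclusive lock is seen or the attempts run out.

    Returns True if the table was seen unlocked, False otherwise.
    `is_exclusively_locked` is a hypothetical check supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        if not is_exclusively_locked():
            return True
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before the next check
    return False
```

Note that a check like this is not atomic with whatever happens next: another client can take a lock between the check and the metadata read.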

For the 2nd task:

a. Check whether the table has any exclusive lock.
        if NO: take a shared lock and start the copy; once done, release the
shared lock.
        if YES: wait and retry.
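
A minimal sketch of "take shared lock, copy, release" in Python, assuming the lock is taken with Hive's LOCK TABLE / UNLOCK TABLE statements. Whether these statements are available, and how long the lock lives, should be verified against the Hive version in use. `execute` is a hypothetical callable that runs one HiveQL statement through whatever client the tool uses, and the table name is illustrative.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def table_lock(
    execute: Callable[[str], None], table: str, mode: str = "SHARED"
) -> Iterator[None]:
    """Hold a table lock for the duration of a `with` block.

    `execute` is a hypothetical function that sends one statement to Hive.
    """
    if mode not in ("SHARED", "EXCLUSIVE"):
        raise ValueError(f"unsupported lock mode: {mode}")
    execute(f"LOCK TABLE {table} {mode}")
    try:
        yield
    finally:
        # Release the lock even if the copy fails part-way through.
        execute(f"UNLOCK TABLE {table}")
```

Usage would look like `with table_lock(run_hql, "sales"): copy_table_files()`, where `run_hql` and `copy_table_files` are placeholders for the tool's own functions.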

For the 3rd task (restoring):

a. Only data: Check if there is any lock on the table.
        if NO: take the exclusive lock, insert the data into the table,
release the lock.
        if YES: wait and retry.

b. Schema and data:

        Check if there is any lock on the table/partition.
              if NO: drop and create the table/partitions.
              if YES: wait and retry.
        Once the schema is created:
              take the exclusive lock, insert the data, release the lock.
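
For the restore path, one variation is to skip the separate "is it locked?" check and simply try to take the exclusive lock, retrying when the attempt fails. That closes the gap between checking and acquiring. The sketch below assumes a hypothetical `execute` callable that raises an exception when LOCK TABLE cannot be granted; `work` stands for the actual data restore. Whether statements issued while an explicit lock is held are themselves blocked by that lock is something to test on the target Hive version.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_exclusive_lock(
    execute: Callable[[str], None],
    table: str,
    work: Callable[[], T],
    max_attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Acquire an exclusive lock with retries, run `work`, then unlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            execute(f"LOCK TABLE {table} EXCLUSIVE")
            break
        except Exception:
            # Assumption: a failed lock attempt surfaces as an exception.
            if attempt == max_attempts:
                raise
            sleep(delay_seconds)
    try:
        return work()
    finally:
        execute(f"UNLOCK TABLE {table}")
```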


Now I'm going to run this kind of job from my scheduler / WF engine.
I need input on the following questions:

a. Does this overall approach look good?
b. How can I take and release different locks explicitly using the Hive API?
ref: https://cwiki.apache.org/confluence/display/Hive/Locking

If I understood correctly, as per this page Hive still doesn't support
explicit locking at the API level.
Is there any plan or patch to get this done?

I saw some classes like ZooKeeperHiveLock etc., but I need to dig further to
see if I can use these classes for locking features.

Thanks for your time and effort.

Regards,
Manish

Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

Posted by Manish Malhotra <ma...@gmail.com>.
Thanks Ruslan,

Please see my inline comments,

> Why do you need metadata backup? Can't you just store all the table create
> statements in an init file?

MM: Because I don't want to depend on an init script that has entries for
all the tables.
This backup tool should be independent of any application, and should not
require a process such as maintaining all the tables in an init file.
Secondly, I want to combine the metadata and data backups, so that to restore
data a user can say: give me the User data for these dates.

> If you care about Partitions that have been created dynamically then you
> can restore them from data by RECOVER PARTITIONS (if using Amazon EMR) or
> an analog check command for a regular distro of Hadoop (I don't remember
> what the name is).

MM: I don't want to go the EMR route; I will check the Hadoop/Hive-based way
of doing it.
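
For reference, the stock Hive counterpart of EMR's RECOVER PARTITIONS is most likely MSCK REPAIR TABLE, which adds partitions that exist on the file system but are missing from the metastore. A small sketch of choosing between the two statements; the function name and the `on_emr` flag are made up for illustration.

```python
def recover_partitions_statement(table: str, on_emr: bool = False) -> str:
    """Return the statement that re-registers partitions found on disk.

    `on_emr` is a hypothetical flag chosen by the caller.
    """
    if on_emr:
        # Amazon EMR's Hive extension.
        return f"ALTER TABLE {table} RECOVER PARTITIONS"
    # Apache Hive's metastore check-and-repair command.
    return f"MSCK REPAIR TABLE {table}"
```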

Cheers,
Manish

Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

Posted by Ruslan Al-Fakikh <me...@gmail.com>.
Hi Manish!

Why do you need metadata backup? Can't you just store all the table create
statements in an init file? If you care about Partitions that have been
created dynamically then you can restore them from data by RECOVER
PARTITIONS (if using Amazon EMR) or an analog check command for a regular
distro of Hadoop (I don't remember what the name is).

Ruslan


On Mon, Dec 10, 2012 at 12:48 PM, Manish Malhotra <
manish.hadoop.work@gmail.com> wrote:

> Sending again, as I got no response.
>
> Can somebody from the Hive dev group please review my approach and reply?
>
> Cheers,
> Manish

Re: Locking in HIVE : How to use locking/unlocking features using hive java API ?

Posted by Manish Malhotra <ma...@gmail.com>.
Sending again, as I got no response.

Can somebody from the Hive dev group please review my approach and reply?

Cheers,
Manish

