Posted to user@hive.apache.org by Elliot West <te...@gmail.com> on 2015/04/17 17:05:54 UTC

Transactional table read lifecycle

Hi, I'm working on a Cascading Tap that reads the data that backs a
transactional Hive table. I've successfully utilised the in-built
OrcInputFormat functionality to read and merge the deltas with the base and
optionally pull in the RecordIdentifiers. However, I'm now considering what
other steps I may need to take to collaborate with an active Hive instance
that could be writing to or compacting the table as I'm trying to read it.

I recently became aware of the need to obtain a list of valid transaction
IDs, but now I wonder whether I must also acquire a read lock on the table.
I'm thinking that the set of interactions for reading this data may look
something like:


   1. Obtain ValidTxnList from the meta store:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.getValidTxns()

   2. Set the ValidTxnList in the Configuration:
   conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.toString());

   3. Acquire a read lock:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)

   4. Use OrcInputFormat to read the data

   5. Finally, release the lock:
   org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)
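
Concretely, I imagine the happy path looking something like the sketch
below. To keep it self-contained and runnable without a live metastore, a
tiny stand-in interface replaces IMetaStoreClient (the real calls are noted
in comments; the interface itself is illustrative, not Hive API):

```java
import java.util.ArrayList;
import java.util.List;

public class ReadLockLifecycle {

    // Stand-in for the IMetaStoreClient calls used here (illustrative, not Hive API).
    interface MetastoreCalls {
        String getValidTxns();    // step 1: really returns a ValidTxnList
        long lock(String table);  // step 3: really lock(LockRequest) -> LockResponse
        void unlock(long lockId); // step 5: really unlock(long)
    }

    static final List<String> CALLS = new ArrayList<>();

    // Fake client that just records the order of calls.
    static final MetastoreCalls CLIENT = new MetastoreCalls() {
        public String getValidTxns() { CALLS.add("getValidTxns"); return "txns"; }
        public long lock(String table) { CALLS.add("lock"); return 42L; }
        public void unlock(long lockId) { CALLS.add("unlock"); }
    };

    static void readTable(String table) {
        // 1. Obtain the valid-transaction list from the metastore.
        String validTxns = CLIENT.getValidTxns();
        // 2. Would go into the job Configuration here:
        //    conf.set(ValidTxnList.VALID_TXNS_KEY, validTxns);
        // 3. Acquire a shared read lock before touching any files.
        long lockId = CLIENT.lock(table);
        try {
            // 4. Use OrcInputFormat to read the base + deltas here.
            CALLS.add("read");
        } finally {
            // 5. Always release the lock, even if the read throws.
            CLIENT.unlock(lockId);
        }
    }

    public static void main(String[] args) {
        readTable("db.tbl");
        System.out.println(CALLS);
    }
}
```

With the real client, I assume step 3 would build a LockRequest asking for
a shared lock on the table, and step 5 would pass the lock id from the
LockResponse back to unlock(long).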


Can you advise on whether the lock is needed, whether this is the correct
way of managing the lock, and whether there are any other steps I need to take
to appropriately interact with the data underpinning a 'live' transactional
table?

Thanks - Elliot.

Re: Transactional table read lifecycle

Posted by Alan Gates <al...@gmail.com>.
Whether you obtain a read lock depends on the guarantees you want to 
make to your readers.  Obtaining the lock will do a couple of things 
your users might want:
1) It will prevent DDL statements such as DROP TABLE from removing the 
data while they are reading it.
2) It will prevent the compactor from removing the versions of the delta 
files they are reading.

The other step you'll want is to heartbeat the lock.  To avoid dead 
clients holding locks forever, the DbLockManager times them out after 300 
seconds (the default; configurable via hive.txn.timeout).  To avoid this 
you'll need to call IMetaStoreClient.heartbeat on a regular basis.
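
A sketch of that heartbeating with a plain ScheduledExecutorService
(a counter stands in for the real client.heartbeat(txnId, lockId) call so
the example runs anywhere; beating at half the timeout leaves headroom for
a slow or missed beat):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LockHeartbeater {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    final AtomicInteger beats = new AtomicInteger(); // stands in for the real call

    /** Schedule a heartbeat at half the lock timeout. */
    void start(long timeoutMillis) {
        scheduler.scheduleAtFixedRate(
                // In real use: client.heartbeat(txnId, lockId) — and handle
                // NoSuchLockException, which means the lock already timed out.
                beats::incrementAndGet,
                0, timeoutMillis / 2, TimeUnit.MILLISECONDS);
    }

    /** Stop heartbeating once the read is done and the lock is released. */
    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        LockHeartbeater hb = new LockHeartbeater();
        hb.start(200);      // pretend the timeout is 200 ms
        Thread.sleep(350);  // the "read" runs for 350 ms
        hb.stop();
        System.out.println("beats sent: " + hb.beats.get());
    }
}
```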

Alan.
