Posted to issues@hive.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2019/04/04 17:25:00 UTC

[jira] [Commented] (HIVE-21506) Memory based TxnHandler implementation

    [ https://issues.apache.org/jira/browse/HIVE-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810130#comment-16810130 ] 

Todd Lipcon commented on HIVE-21506:
------------------------------------

Does this imply that you'd move the transaction manager out of the HMS into a standalone daemon? Then we need to worry about HA for that daemon as well as what happens to locks if the daemon crashes, right? It's probably possible but would be quite a bit of work.

Another potential design for scaling lock management is to use revocable "sticky" locks. I think there's a decent amount of literature from the shared-nothing DBMS world on this technique. With some quick googling I found that the Frangipani DFS paper has some discussion of the technique, but I think we can probably find a more detailed description of it elsewhere.

At a very high level, the idea would look something like this for shared locks:
- when a user wants to acquire a shared lock, the HMS checks whether any other transaction already holds the same shared lock. If not, we acquire the lock in the database and associate it not with any particular transaction, but with the HMS's lock manager itself. The HMS then becomes responsible for heartbeating the lock while it's held. In essence, the HMS has taken out a "lease" on this lock.
- if the HMS already holds this shared lock on behalf of another transaction, increment an in-memory reference count.
- when a lock is released, if the refcount > 1, simply decrement the in-memory refcount.
- if the lock is released and the refcount goes to 0, the HMS can be lazy about releasing the lock in the DB (either forever or for some amount of time). In essence the lock is "sticky".

Given that most locks are shared locks, this should mean that the majority of locking operations do not require any trip to the RDBMS and can be processed in memory, but are backed persistently by a lock in the DB.
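To make that concrete, here's a rough sketch of what the HMS-side refcounting could look like. This is purely illustrative: LockKey, DbLockBackend, and the lease/heartbeat wiring are made-up names, not anything that exists in the codebase today.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: LockKey and DbLockBackend are invented names.
public class StickySharedLockManager {
  private static class Entry {
    int refCount;               // active local holders
    boolean heldInDb;           // true once this HMS owns the DB lease
    boolean revocationPending;  // set when an exclusive waiter asks us to
                                // drain (handled in the revocation sketch below)
  }

  private final Map<LockKey, Entry> locks = new ConcurrentHashMap<>();
  private final DbLockBackend db;

  public StickySharedLockManager(DbLockBackend db) {
    this.db = db;
  }

  public void acquireShared(LockKey key) {
    locks.compute(key, (k, e) -> {
      if (e == null) {
        e = new Entry();
      }
      if (!e.heldInDb) {
        // First holder: take the lock in the RDBMS, associated with this
        // HMS's lock manager (a heartbeated "lease"), not the transaction.
        db.acquireSharedLease(k);
        e.heldInDb = true;
      }
      e.refCount++;  // subsequent holders are memory-only
      return e;
    });
  }

  public void releaseShared(LockKey key) {
    locks.computeIfPresent(key, (k, e) -> {
      e.refCount--;
      // Even at refCount == 0 we keep the DB lock ("sticky"); it's only
      // dropped lazily, or when another HMS revokes it.
      return e;
    });
  }
}
{code}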

If a caller wants to acquire an exclusive lock which conflicts with an existing shared lock in the DB, we need to implement revocation:
- add a record indicating that there's a waiter on the lock, blocked on the existing shared lock(s)
- send a revocation request to any HMS(s) holding the shared locks. If they're just being held in "sticky" mode, they can be revoked immediately. If there is actually an active refcount, this will just enforce that new shared lockers need to wait instead of incrementing the refcount.
- in the case that an HMS holding a sticky lock has crashed or been partitioned away, we need to wait for its "lease" to expire before we can revoke the lock.
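Sketching the holder side of that revocation, continuing the hypothetical class above (revocationPending is the extra flag on the entry):

{code:java}
// Hypothetical continuation of the sketch above: what an HMS holding
// the shared lock might do when it receives a revocation request.
public void onRevokeRequest(LockKey key) {
  locks.compute(key, (k, e) -> {
    if (e == null || e.refCount == 0) {
      // Held only in "sticky" mode: drop the DB lease immediately so
      // the exclusive waiter can proceed.
      db.releaseLease(k);
      return null;  // removing the entry releases it locally too
    }
    // Active holders remain: mark the lock so acquireShared makes new
    // shared lockers wait instead of bumping the refcount; the exclusive
    // waiter is unblocked once the refcount drains to 0.
    e.revocationPending = true;
    return e;
  });
}
{code}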

There's some trickiness to think through around client->HMS "stickiness" in HA scenarios as well. Right now a transaction's lock requests may be sent to a different HMS than its 'commit/abort' request, but supporting that could be difficult with "sticky locks".


All of the above is a bit complicated, so maybe a first step is to put together some kind of stress test/benchmark and understand whether we can change the way we manage the RDBMS tables to be more efficient. Perhaps if we specialized the implementation for a specific RDBMS (eg postgres) we could get some benefits here (eg stored procedures to avoid round trips, if those turn out to be the bottleneck).

> Memory based TxnHandler implementation
> --------------------------------------
>
>                 Key: HIVE-21506
>                 URL: https://issues.apache.org/jira/browse/HIVE-21506
>             Project: Hive
>          Issue Type: New Feature
>          Components: Transactions
>            Reporter: Peter Vary
>            Priority: Major
>
> The current TxnHandler implementations use the backend RDBMS to store all Hive lock and transaction data, so multiple TxnHandler instances can run simultaneously and serve requests. The continuous communication/locking done on the RDBMS side puts serious load on the backend databases and also restricts the possible throughput.
> If it is possible to have only a single active TxnHandler instance (with the current design, an HMS), then we can provide much better performance (using only Java-based locking). We would still have to store the committed write transactions in the RDBMS (or later some other persistent storage), but other lock and transaction operations could remain memory-only.
> The most important drawbacks of this solution are that we definitely lose scalability once a single TxnHandler instance is no longer able to serve the requests (see NameNode), and fault tolerance in the sense that ongoing transactions have to be terminated when the TxnHandler fails. If these drawbacks are acceptable in certain situations, then we can provide better throughput for the users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)