Posted to issues@hive.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2018/03/13 18:55:00 UTC

[jira] [Comment Edited] (HIVE-18940) Hive notifications serialize all write DDL operations

    [ https://issues.apache.org/jira/browse/HIVE-18940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16397463#comment-16397463 ] 

Thejas M Nair edited comment on HIVE-18940 at 3/13/18 6:54 PM:
---------------------------------------------------------------

For replication purposes, and perhaps for capturing Sentry delta updates as well, EVENT_IDs have to become visible in commit order.
For example, once the replication program has consumed the row with EVENT_ID 5, it only looks for rows where EVENT_ID > 5. So if two concurrent transactions are writing new rows and the one with EVENT_ID 5 commits before the one with EVENT_ID 4, then EVENT_ID 4 is missed.
Holes would be OK; what is not OK is for another application to see the row with EVENT_ID 5 become visible before the one with EVENT_ID 4.
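
To make the race concrete, here is a minimal simulation of the scenario described above. It is only a sketch: a Python set stands in for the committed rows of the notification table, the commit order is scripted by hand, and the names (poll, committed, last_seen) are made up for illustration and are not Hive code.

```python
# Simulation only: a set stands in for the rows that are committed (visible
# to other sessions). Nothing here is Hive or metastore code.

def poll(visible_ids, last_seen_id):
    """Return visible event IDs greater than last_seen_id, in ascending order."""
    return sorted(eid for eid in visible_ids if eid > last_seen_id)

committed = set()   # event IDs whose transactions have committed
last_seen = 3       # the consumer has already processed events up to 3
processed = []

# Transaction B was assigned EVENT_ID 5 and commits first.
committed.add(5)
batch = poll(committed, last_seen)
processed.extend(batch)
if batch:
    last_seen = batch[-1]   # the consumer's high-water mark is now 5

# Transaction A was assigned EVENT_ID 4 and commits afterwards.
committed.add(4)
batch = poll(committed, last_seen)   # asks for EVENT_ID > 5, so 4 is invisible to it
processed.extend(batch)

missed = sorted(committed - set(processed))
print("processed:", processed)   # processed: [5]
print("missed:", missed)         # missed: [4]
```

The consumer never goes back below its high-water mark, so the late-committing row is lost even though it is present in the table.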

A DB-generated timestamp has the same issue, unless it can represent the commit sequence.

I believe the use of a database auto-increment field was considered in HIVE-16886, and it did not meet this criterion.

cc [~anishek]


> Hive notifications serialize all write DDL operations
> -----------------------------------------------------
>
>                 Key: HIVE-18940
>                 URL: https://issues.apache.org/jira/browse/HIVE-18940
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.0.0
>            Reporter: Alexander Kolbasov
>            Priority: Major
>
> The implementation of DbNotificationListener uses a single row to store the current notification ID and uses {{SELECT FOR UPDATE}} to lock the row. This serializes all write DDL operations, so they cannot run concurrently.
> We should consider using database auto-increment for the notification ID instead. On MySQL/InnoDB in particular, it is supported natively with relatively lightweight locking.
> This creates a potential issue for consumers, though, because such IDs may have holes. There are two types of holes: transient holes for transactions that have not committed yet but will shortly, and permanent holes for transactions that fail. Consumers need to deal with both. It may be useful to add a DB-generated timestamp as well to assist in recovery from holes.
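
One way a consumer might deal with the two kinds of holes from the description is sketched below. This is an assumption for illustration, not something proposed in the issue or implemented in Hive: the class GapAwareConsumer, its grace_seconds parameter, and the explicit now argument are all hypothetical. The consumer delivers events strictly in ID order, holds back anything behind a hole, and only skips the hole once it has stayed open longer than the grace period.

```python
class GapAwareConsumer:
    """Hypothetical consumer that delivers events in ID order and waits out holes."""

    def __init__(self, last_delivered_id, grace_seconds):
        self.next_id = last_delivered_id + 1   # smallest ID not yet delivered
        self.grace_seconds = grace_seconds     # how long a hole may stay open
        self.gap_first_seen = None             # time the current hole was first observed

    def poll(self, visible_ids, now):
        """Return the IDs that are safe to deliver, given the committed IDs visible at `now`."""
        delivered = []
        for event_id in sorted(i for i in visible_ids if i >= self.next_id):
            if event_id > self.next_id:
                # There is a hole in front of event_id.
                if self.gap_first_seen is None:
                    self.gap_first_seen = now
                if now - self.gap_first_seen < self.grace_seconds:
                    break   # possibly transient: the writer may still commit
                # Grace period expired: assume the hole is permanent and skip it.
            delivered.append(event_id)
            self.next_id = event_id + 1
            self.gap_first_seen = None
        return delivered


consumer = GapAwareConsumer(last_delivered_id=3, grace_seconds=10)
first = consumer.poll({5}, now=0)           # [] because 4 is missing and may still commit
second = consumer.poll({4, 5}, now=2)       # [4, 5] once 4 becomes visible
third = consumer.poll({4, 5, 8}, now=3)     # [] because 6 and 7 are missing
fourth = consumer.poll({4, 5, 8}, now=20)   # [8] after the hole outlived the grace period
```

The grace period is a heuristic: a transaction that commits after the consumer has given up on its ID is still missed, so this narrows the window rather than guaranteeing commit-order visibility.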



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)