Posted to issues@hbase.apache.org by "Jeffrey Zhong (JIRA)" <ji...@apache.org> on 2013/12/03 01:02:36 UTC

[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

    [ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837109#comment-13837109 ] 

Jeffrey Zhong commented on HBASE-8763:
--------------------------------------

Today I had some discussion with [~enis] and [~tedyu@apache.org] on this topic and found it might be possible to handle the JIRA issue in a simpler way. Below are the steps:

1) Insert into the memstore using Long.MAX_VALUE as the initial write number
2) Append to the WAL without syncing
3) Sync
4) Update the WriteEntry's write number to the sequence number returned from Step 2
5) CompleteMemstoreInsert. In this step, make the current read point >= the sequence number from Step 2. The reasoning is that once we have synced up to that sequence number, all changes with smaller sequence numbers are already synced into the WAL. Therefore, we should be able to bump the read point up to the last sequence number synced.
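
To make the five steps concrete, here is a rough sketch of the flow. It is not a patch and not HBase code: SeqIdMvccSketch, FakeWal, WriteEntry and all method names are made up for illustration, and the WAL is reduced to a counter that assumes a sync up to some sequence number also covers every smaller one.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SeqIdMvccSketch {
    /** Hypothetical write entry; it starts out invisible to every reader. */
    static final class WriteEntry {
        volatile long writeNumber = Long.MAX_VALUE; // Step 1: initial write number
    }

    /** Hypothetical WAL stand-in that only hands out sequence numbers. */
    static final class FakeWal {
        private final AtomicLong lastAppended = new AtomicLong(0);
        private final AtomicLong lastSynced = new AtomicLong(0);

        long appendNoSync() {
            return lastAppended.incrementAndGet();
        }

        // Assumption: syncing up to seqId also covers every smaller sequence number.
        void sync(long seqId) {
            lastSynced.accumulateAndGet(seqId, Math::max);
        }
    }

    private final FakeWal wal = new FakeWal();
    private final AtomicLong readPoint = new AtomicLong(0);

    WriteEntry insertIntoMemstore() {
        return new WriteEntry();                        // Step 1
    }

    long append() {
        return wal.appendNoSync();                      // Step 2
    }

    void syncAndComplete(WriteEntry entry, long seqId) {
        wal.sync(seqId);                                // Step 3
        entry.writeNumber = seqId;                      // Step 4
        readPoint.accumulateAndGet(seqId, Math::max);   // Step 5: read point >= seqId
    }

    long readPoint() {
        return readPoint.get();
    }

    public static void main(String[] args) {
        SeqIdMvccSketch region = new SeqIdMvccSketch();
        WriteEntry a = region.insertIntoMemstore();
        long seqA = region.append();                    // sequence number 1
        WriteEntry b = region.insertIntoMemstore();
        long seqB = region.append();                    // sequence number 2

        region.syncAndComplete(b, seqB);                // b finishes first
        System.out.println(region.readPoint());         // prints 2
        region.syncAndComplete(a, seqA);                // a finishes later
        System.out.println(region.readPoint());         // still prints 2
    }
}
```

Note that no queue is consulted in Step 5: the read point only ever moves forward to the largest synced sequence number seen so far. Whether that is semantically safe for the entry that finishes late is exactly what a real patch and the unit tests would have to show.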

Currently, we maintain an internal queue which might defer the read point bump-up if transactions complete in a different order than they were added to the MVCC internal write queue.
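
For comparison, the deferral described here can be modeled roughly as follows. This is a simplified model and not the actual MVCC class: QueueMvccSketch and its method names are invented for illustration, and a single monitor stands in for the real locking.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class QueueMvccSketch {
    /** Hypothetical write entry carrying its own MVCC write number. */
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;

        WriteEntry(long writeNumber) {
            this.writeNumber = writeNumber;
        }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long nextWriteNumber = 0;
    private long readPoint = 0;

    synchronized WriteEntry begin() {
        WriteEntry entry = new WriteEntry(++nextWriteNumber);
        writeQueue.addLast(entry);
        return entry;
    }

    synchronized void complete(WriteEntry entry) {
        entry.completed = true;
        // The read point only advances past entries at the head of the queue,
        // so an unfinished older write holds back every newer one.
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
    }

    synchronized long readPoint() {
        return readPoint;
    }

    public static void main(String[] args) {
        QueueMvccSketch mvcc = new QueueMvccSketch();
        WriteEntry first = mvcc.begin();
        WriteEntry second = mvcc.begin();

        mvcc.complete(second);                  // completes out of order
        System.out.println(mvcc.readPoint());   // prints 0: bump is deferred
        mvcc.complete(first);
        System.out.println(mvcc.readPoint());   // prints 2: both entries drained
    }
}
```

The lock taken in begin(), the lock taken in complete(), and the loop over the queue are the pieces of bookkeeping this model needs on every write.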

By doing the above, it's possible to remove the logic that maintains the writeQueue, which means we can remove two lock acquisitions and one queue loop from the write code path. Sounds too good to be true :-). Let me try to write a quick patch and run it against the unit tests to see if the idea could fly.


> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
>                 Key: HBASE-8763
>                 URL: https://issues.apache.org/jira/browse/HBASE-8763
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Enis Soztutar
>         Attachments: hbase-8763_wip1.patch
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having both mvcc and the seqId complicates the comparator semantics a lot with regard to flush + WAL replay + compactions + delete markers and out-of-order puts. 
> Thinking more about it, I don't think we need an MVCC write number that is different from the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine the mvcc write number and the seqId. This will allow cleaner semantics + implementation + smaller data files. 
> We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct; by my current understanding, it should be.



--
This message was sent by Atlassian JIRA
(v6.1#6144)