Posted to issues@hbase.apache.org by "Enis Soztutar (JIRA)" <ji...@apache.org> on 2013/06/19 01:03:21 UTC

[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId

    [ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687345#comment-13687345 ] 

Enis Soztutar commented on HBASE-8763:
--------------------------------------

I also realized that, if I am not wrong, the current multi-row atomic mutations are broken when scanners meet RS crashes. Since mvcc is not persisted, if a multi put changing r1 and r100 happens with mvcc = 100, a scanner with mvcc = 90 will not see the r1 change. Just after passing r1, the scanner might fail, and the new scanner on the new region server will get another mvcc; but since the changes from the previous multi put have been persisted (through log recovery), the new scanner will happily see the r100 mutation.
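
To make the sequence concrete, here is a toy Python model of the scenario. It is only a sketch and not HBase code: Region, Cell, recover_after_crash and demo_torn_read are made-up names, and it assumes that edits replayed by log recovery are visible to every scanner opened afterwards, because their mvcc numbers were never persisted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    mvcc: int  # write number that lives only in region server memory


class Region:
    """Hypothetical, heavily simplified stand-in for a region."""

    def __init__(self, cells, next_mvcc):
        self.cells = list(cells)  # kept in write order
        self.next_mvcc = next_mvcc

    def multi_put(self, rows, value):
        # One atomic multi-row put: every cell shares one write number.
        write_number = self.next_mvcc
        self.next_mvcc += 1
        self.cells.extend(Cell(r, value, write_number) for r in rows)
        return write_number

    def open_scanner(self):
        # A new scanner's read point covers everything written so far.
        return self.next_mvcc - 1

    def read(self, row, read_point):
        # Newest cell for the row that the read point is allowed to see.
        visible = [c for c in self.cells
                   if c.row == row and c.mvcc <= read_point]
        return visible[-1].value if visible else None

    def recover_after_crash(self):
        # Assumption: replay keeps the data but drops the mvcc numbers,
        # so every recovered cell is visible to all later scanners.
        recovered = [Cell(c.row, c.value, 0) for c in self.cells]
        return Region(recovered, next_mvcc=1)


def demo_torn_read():
    region = Region([Cell("r1", "old", 1), Cell("r100", "old", 1)],
                    next_mvcc=91)
    read_point = region.open_scanner()       # 90
    region.next_mvcc = 100                   # pretend other writes happened
    region.multi_put(["r1", "r100"], "new")  # mvcc = 100
    seen_r1 = region.read("r1", read_point)  # scanner passes r1

    recovered = region.recover_after_crash()  # RS dies, region reopens
    new_read_point = recovered.open_scanner()
    seen_r100 = recovered.read("r100", new_read_point)
    return seen_r1, seen_r100


if __name__ == "__main__":
    print(demo_torn_read())
```

In this model the scan returns the old value for r1 and the new value for r100, that is, half of the multi put.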

The underlying reason is that, with multi puts, a scanner for a region has to see a snapshot of the region, but mvcc is ephemeral. This can also be fixed by saving the seqIds in hfiles: when a region scanner is opened, the client obtains the scanner seqId (mvcc read point) and uses this number in case of failover.
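
A rough Python sketch of that idea follows, under the same caveats as any toy model: Region, Cell, recover_after_crash and demo_consistent_read are hypothetical names, not HBase classes. It assumes the seqId is stored with every cell and survives recovery, and that the client hands its original read point to the scanner it reopens. Flushes and compactions are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    seq_id: int  # assumed to be persisted with the cell (WAL and hfile)


class Region:
    """Hypothetical, heavily simplified stand-in for a region."""

    def __init__(self, cells, next_seq_id):
        self.cells = list(cells)  # kept in write order
        self.next_seq_id = next_seq_id

    def multi_put(self, rows, value):
        # One atomic multi-row put: every cell shares one seqId.
        seq_id = self.next_seq_id
        self.next_seq_id += 1
        self.cells.extend(Cell(r, value, seq_id) for r in rows)
        return seq_id

    def open_scanner(self, read_point=None):
        # A fresh scanner reads everything written so far; a client that
        # resumes after failover passes the read point it got earlier.
        if read_point is None:
            return self.next_seq_id - 1
        return read_point

    def read(self, row, read_point):
        # Newest cell for the row that the read point is allowed to see.
        visible = [c for c in self.cells
                   if c.row == row and c.seq_id <= read_point]
        return visible[-1].value if visible else None

    def recover_after_crash(self):
        # seqIds were persisted, so recovery restores them unchanged.
        return Region(self.cells, self.next_seq_id)


def demo_consistent_read():
    region = Region([Cell("r1", "old", 1), Cell("r100", "old", 1)],
                    next_seq_id=91)
    read_point = region.open_scanner()       # 90, remembered by the client
    region.next_seq_id = 100                 # pretend other writes happened
    region.multi_put(["r1", "r100"], "new")  # seqId = 100
    seen_r1 = region.read("r1", read_point)  # scanner passes r1

    recovered = region.recover_after_crash()  # RS dies, region reopens
    resumed = recovered.open_scanner(read_point=read_point)
    seen_r100 = recovered.read("r100", resumed)
    return seen_r1, seen_r100


if __name__ == "__main__":
    print(demo_consistent_read())
```

Here the resumed scanner keeps read point 90 and sees the old value for both r1 and r100, while a scanner opened fresh after the failover sees the whole multi put.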
                
> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
>                 Key: HBASE-8763
>                 URL: https://issues.apache.org/jira/browse/HBASE-8763
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Enis Soztutar
>             Fix For: 0.98.0
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having both mvcc and the seqId complicates the comparator semantics a lot with regard to flush + WAL replay + compactions + delete markers and out-of-order puts.
> Thinking more about it, I don't think we need an MVCC write number that is different from the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine the mvcc write number and the seqId. This will allow cleaner semantics + implementation + smaller data files.
> We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, although by my current understanding it should be.
