Posted to issues@hbase.apache.org by "Prakash Khemani (JIRA)" <ji...@apache.org> on 2010/09/03 06:57:33 UTC

[jira] Created: (HBASE-2957) Release row lock when waiting for wal-sync

Release row lock when waiting for wal-sync
------------------------------------------

                 Key: HBASE-2957
                 URL: https://issues.apache.org/jira/browse/HBASE-2957
             Project: HBase
          Issue Type: Improvement
          Components: regionserver, wal
    Affects Versions: 0.20.0
            Reporter: Prakash Khemani


Is there a reason to hold on to the row-lock while waiting for the WAL-sync to be completed by the logSyncer thread?

I think data consistency will be guaranteed even if the following happens: (a) the row lock is held while the row is updated in memory; (b) the row lock is released after queuing the KV record for WAL-syncing; (c) the log-sync system guarantees that the log records for any given row are synced in order; (d) the HBase client only receives a success notification after the sync completes (no change from the current state).

I think this should be a huge win. For my use case, and I am sure for others, the handler thread spends the bulk of its row-lock critical-section time waiting for the sync to complete.

Even if the log-sync system cannot guarantee the orderly completion of sync records, the "Don't hold row lock while waiting for sync" option should be available to HBase clients on a per request basis.
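
A minimal Java sketch of steps (a) through (d) above, under stated assumptions. `EarlyUnlockRegion`, `Pending`, `syncOnce` and `durableLog` are hypothetical names made up for illustration and are not HBase APIs; the "sync" is just an append to an in-memory list. The sketch only shows the lock being dropped before the sync and the acknowledgement being deferred until after it. It does not deal with whether readers can see the unsynced value in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a region; none of these names are HBase APIs.
final class EarlyUnlockRegion {
    private static final class Pending {
        final String record;
        final CompletableFuture<Void> synced = new CompletableFuture<>();
        Pending(String record) { this.record = record; }
    }

    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, Long> memstore = new ConcurrentHashMap<>();
    private final LinkedBlockingQueue<Pending> syncQueue = new LinkedBlockingQueue<>();
    // Stands in for the synced part of the WAL; written only by syncOnce().
    final List<String> durableLog = new ArrayList<>();

    /** Steps (a) and (b): update memory and queue the record under the row lock, then unlock. */
    CompletableFuture<Void> put(String row, long value) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        Pending p;
        lock.lock();
        try {
            memstore.put(row, value);
            p = new Pending(row + "=" + value);
            syncQueue.add(p); // queued while locked, so per-row order is preserved
        } finally {
            lock.unlock();    // released before any sync happens
        }
        return p.synced;      // step (d): ack the client only once this completes
    }

    /** Step (c): a single syncer drains the FIFO queue, so records are synced in queue order. */
    synchronized int syncOnce() {
        List<Pending> batch = new ArrayList<>();
        syncQueue.drainTo(batch);
        for (Pending p : batch) {
            durableLog.add(p.record); // stands in for the real log sync
        }
        for (Pending p : batch) {
            p.synced.complete(null);
        }
        return batch.size();
    }
}
```

A handler thread would call `put(...)`, return to the pool or block on the returned future, and a syncer thread would call `syncOnce()` in a loop. Several puts to the same hot row can then share one sync.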

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907314#action_12907314 ] 

Prakash Khemani commented on HBASE-2957:
----------------------------------------

Yes, delayed syncs should almost always work. The problem is that they don't guarantee consistency. What I am arguing is that we can get consistency and delayed-sync-like performance at the same time.

I also discussed this offline with Jonathan a little bit. Let me try one more time.

Today, by default, HBase operates in "writers wait for sync" mode. This is good because it guarantees both durability and consistency. It is bad because it can be slow.

Deferred syncs guarantee neither durability nor consistency.

By consistency I mean the following - if A is the cause of B, and if B is present in the logs then A must also be present in the logs.

If we can have HBase operate in "readers wait for sync" mode then we don't guarantee durability but we still guarantee consistency. And the performance should be similar to that of deferred syncs.
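
One way to picture a "readers wait for sync" mode, as a hedged Java sketch: each cell remembers the number of the sync that will make it durable, writers return immediately, and a reader blocks until that sync has completed. `ReadersWaitStore` and all of its members are hypothetical names, not HBase classes, and the sketch assumes that the next sync to complete always covers every cell written before it.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of "readers wait for sync"; not an HBase API.
final class ReadersWaitStore {
    private static final class Cell {
        final long value;
        final long syncNumber; // the sync that makes this cell durable
        Cell(long value, long syncNumber) {
            this.value = value;
            this.syncNumber = syncNumber;
        }
    }

    private final Map<String, Cell> cells = new ConcurrentHashMap<>();
    private long completedSyncs = 0; // guarded by "this"

    /** Writer: tag the cell with the next sync number and return without waiting. */
    synchronized void put(String row, long value) {
        cells.put(row, new Cell(value, completedSyncs + 1));
    }

    /** Syncer: call after a log sync finishes; wakes any blocked readers. */
    synchronized void syncCompleted() {
        completedSyncs++;
        notifyAll();
    }

    /** Reader: wait up to timeoutMillis for the cell's sync; empty if absent or still unsynced. */
    synchronized OptionalLong get(String row, long timeoutMillis) {
        Cell c = cells.get(row);
        if (c == null) {
            return OptionalLong.empty();
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (completedSyncs < c.syncNumber) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return OptionalLong.empty();
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return OptionalLong.empty();
            }
        }
        return OptionalLong.of(c.value);
    }
}
```

If readers usually arrive after the sync has finished, the loop body never runs and nobody waits. The cost of the sync is only paid by a reader that arrives early.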


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906549#action_12906549 ] 

stack commented on HBASE-2957:
------------------------------

First, are you fellas running with hdfs-895?  (See HBASE-2467)  We have it deployed over here.   It makes a big difference.

@Prakash So you are suggesting a patch that optionally allows returning to the client even though the edit has not yet been appended to the WAL.  A flag in the Put would be checked somewhere around here -- http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#1534 -- and if set, we'd skip the inline WAL append and in its place we'd just add the edit to a queue for adding to the WAL out of band with the current update?  I'd guess you'd want the flag on ICV too, somewhere around here -- http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/HRegion.html#3046?  We'd change our current 'writeToWAL' flag so that, rather than being a binary, it would allow a new queue-the-edit-for-WAL-addition option?


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906570#action_12906570 ] 

Todd Lipcon commented on HBASE-2957:
------------------------------------

Hm, I might have missed something in the discussion, but it seems Prakash's original suggestion should work. Right now we do for any edit:

1. lock rows
2. write to hlog
3. sync hlog
4. beginMemstoreInsert
5. edit memstore
6. completeMemstoreInsert
7. release locks

Instead, I think another valid order would be:

1. lock rows
2. write to hlog
3. begin memstore insert
4. edit memstore
5. unlock rows
6. sync hlog
7. complete memstore insert

The guarantee we have to provide is that we don't complete the memstore insert before sync, but assuming we hold that the same as today, it should be invisible to users but provide a speedup for all modifications except CAS. We'll have to be careful how this interacts with CAS, though.
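
The second ordering can be sketched in Java roughly as follows. This is a toy single-row model, not the HRegion code: `ReorderedWritePath`, `beginInsert`, `completeInsert` and `readPoint` are hypothetical names, the hlog append is reduced to a comment, and the sync is whatever `Runnable` the caller passes in. The point it illustrates is that the edit sits in memory, invisible to readers, while the sync runs without the row lock.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-row model of the reordered write path; not the HRegion code.
final class ReorderedWritePath {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final TreeMap<Long, Long> memstore = new TreeMap<>(); // write number -> value
    private final Set<Long> completed = new HashSet<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0; // readers see only write numbers <= readPoint

    void put(long value, Runnable syncHlog) {
        long w;
        rowLock.lock();                 // 1. lock rows
        try {
            // 2. write to hlog: the append to the unsynced log buffer would go here
            w = beginInsert(value);     // 3 and 4. begin memstore insert, edit memstore
        } finally {
            rowLock.unlock();           // 5. unlock rows
        }
        syncHlog.run();                 // 6. sync hlog (slow, row lock not held)
        completeInsert(w);              // 7. complete memstore insert
    }

    private synchronized long beginInsert(long value) {
        long w = nextWriteNumber++;
        memstore.put(w, value);         // in memory, but not yet visible to readers
        return w;
    }

    private synchronized void completeInsert(long w) {
        completed.add(w);
        // Advance only across a gap-free prefix so no unsynced edit becomes visible.
        while (completed.remove(readPoint + 1)) {
            readPoint++;
        }
    }

    /** Returns the newest visible value, or null if nothing is visible yet. */
    synchronized Long read() {
        Map.Entry<Long, Long> e = memstore.floorEntry(readPoint);
        return e == null ? null : e.getValue();
    }

    boolean isRowLocked() {
        return rowLock.isLocked();
    }
}
```

A CAS-style operation would need to read the latest unsynced value under the row lock rather than the value at `readPoint`, which is one reason that case needs separate care.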


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906645#action_12906645 ] 

ryan rawson commented on HBASE-2957:
------------------------------------

Why don't you shard your counters?  With the perf optimizations we did
last Friday, you should easily be able to support 100m counters/day
per row; just shard row-wise and you are set for scaling.

-ryan
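
For readers unfamiliar with the idea, sharding a counter row-wise might look like the Java sketch below. It is an illustration only: `ShardedCounter` is a hypothetical class, the `"#"` row-key suffix is an assumed format, and an in-memory `AtomicLongArray` stands in for one HBase row per shard.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sharded counter; each array slot stands in for one HBase row.
final class ShardedCounter {
    private final String baseRow;
    private final AtomicLongArray shards;

    ShardedCounter(String baseRow, int numShards) {
        this.baseRow = baseRow;
        this.shards = new AtomicLongArray(numShards);
    }

    /** Row key a shard would live under (assumed format, for illustration only). */
    String shardRowKey(int shard) {
        return baseRow + "#" + shard;
    }

    /** Writers pick a random shard, so contention on any one row lock drops. */
    void increment(long delta) {
        int shard = ThreadLocalRandom.current().nextInt(shards.length());
        shards.addAndGet(shard, delta);
    }

    /** Readers pay for the sharding: they must read and sum every shard. */
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is that reads become a multi-row operation, and the total is no longer a single atomically updated value.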


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905798#action_12905798 ] 

Prakash Khemani commented on HBASE-2957:
----------------------------------------

Actually, data consistency is not guaranteed if we return to the HBase client any value which has not yet been sync'd to WAL. But for my use case, and I think for many others, it is OK.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906629#action_12906629 ] 

Todd Lipcon commented on HBASE-2957:
------------------------------------

bq. other readers would have already seen the new value because u released the rowlock

Actually edits aren't visible until the "complete memstore insert" step - we do a sort of MVCC-ish here to prevent visibility until the sync is complete, much like what you were proposing above.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907380#action_12907380 ] 

Prakash Khemani commented on HBASE-2957:
----------------------------------------


I agree that the ordering on a given region server will be the same with or without delayed sync. But I am pretty sure that globally there will be inconsistencies.

Say a value is updated on RS A. This value is not synced yet.

The abovementioned unsynced value on RS A is read by someone and based on that value an update is made on another RS B. Say the update on RS B is synced.

Now we have a window where B depends on A, B is in the logs but A isn't. If RS A dies and comes back up in this window, then we will have a situation where the update on RS B is present but the update on RS A isn't.
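
The scenario above can be replayed with a toy model in Java. `ToyRegionServer` is a hypothetical class, not anything in HBase: each server keeps an in-memory map plus a map of what has been synced, and a crash simply discards whatever was never synced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a region server with an unsynced window; not HBase code.
final class ToyRegionServer {
    private final Map<String, Long> memory = new HashMap<>();
    private final Map<String, Long> syncedLog = new HashMap<>();

    /** With waitForSync false, the value is readable before it is in the log. */
    void write(String key, long value, boolean waitForSync) {
        memory.put(key, value);
        if (waitForSync) {
            syncedLog.put(key, value);
        }
    }

    Long read(String key) {
        return memory.get(key);
    }

    /** Recovery rebuilds memory from the synced log only. */
    void crashAndRecover() {
        memory.clear();
        memory.putAll(syncedLog);
    }

    /** Replays the scenario: B's synced update survives, the update on A it was derived from does not. */
    static boolean causalityViolated() {
        ToyRegionServer a = new ToyRegionServer();
        ToyRegionServer b = new ToyRegionServer();
        a.write("x", 1L, false);          // updated on RS A, not yet synced
        long seen = a.read("x");          // someone reads the unsynced value
        b.write("y", seen + 1, true);     // dependent update on RS B, synced
        a.crashAndRecover();              // RS A dies inside the window
        return b.read("y") != null && a.read("x") == null;
    }
}
```

In this model `causalityViolated()` returns true, which is the inconsistency being described: the effect is in the logs and the cause is not.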


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906634#action_12906634 ] 

Jonathan Gray commented on HBASE-2957:
--------------------------------------

bq. but do you want the solution generalized so that it works for a workload that does lots of "puts" into the same record? 
Absolutely.  I think the solution you are proposing (which to me looks to be the same as what Todd suggests) makes sense and we should do it.

However, ICV has a completely different codepath and I think Prakash was specifically thinking about this use case.

We should tackle both, maybe open a separate JIRA for ICV and make this one about a normal Put?


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906984#action_12906984 ] 

stack commented on HBASE-2957:
------------------------------

@Ryan "...and It Just Did Not Work."

Can you rehearse the problems seen, for the sake of the new fellas (Prakash, etc.)?



[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906625#action_12906625 ] 

Jonathan Gray commented on HBASE-2957:
--------------------------------------

I think Prakash and Dhruba are specifically thinking about ICVs rather than Puts.  The begin insert / complete insert stuff doesn't actually get used at all for ICVs; in this case memstoreTS=0, so there is no "complete memstore insert".

Maybe the special ICV group commit makes sense.  Otherwise we'll have to relax some constraints or change how ICV works.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907322#action_12907322 ] 

Jonathan Gray commented on HBASE-2957:
--------------------------------------

bq. If we can have HBase operate in "readers wait for sync" mode then we don't guarantee durability but we still guarantee consistency. And the performance should be similar to that of deferred syncs.

And given that most workloads are write dominated, you would expect significantly more total throughput.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906923#action_12906923 ] 

ryan rawson commented on HBASE-2957:
------------------------------------

Would you use the delayed sync option?  It has similar semantics: we don't wait for the HDFS acks, but continue with the memstore applications and then return to the client.  Data is only persisted when EITHER a different put without this option comes in, OR the time-based flush kicks in.  Our tests on a prior implementation of this indicated amazing speedups.

Also, Todd's idea yesterday won't work - you can't put a long-pole event (hlog sync) inside a memstore transaction sequence.  We already had something like that and It Just Did Not Work.  

I think another option would be:

Get latest Value for ICV
create next value
Write HLog & sync
Obtain row lock
begin memstore insert
insert memstore values
commit memstore 
release row lock
return to client

But this wouldn't work for ICVs either!  Multiple clients would create multiple next values, and that would be bad.

If we had this potential implementation:

Write HLog diff of ICV (ie: +1 to so and so)
Obtain row lock
get existing value
create new value
begin memstore insert
insert memstore values
commit memstore
release row lock
return to client
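
A rough Java sketch of the delta-logging sequence just listed, with assumptions called out. `DeltaLoggedCounters`, `Delta` and `replay` are hypothetical names, the hlog is an in-memory list, and a single lock stands in for per-row locks. The sketch relies on one property that the list above does not state explicitly: because deltas are added rather than assigned, the order in which concurrent clients log them does not change the replayed total.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of logging an increment as a delta; not an HBase API.
final class DeltaLoggedCounters {
    static final class Delta {
        final String row;
        final long amount;
        Delta(String row, long amount) {
            this.row = row;
            this.amount = amount;
        }
    }

    final List<Delta> hlog = Collections.synchronizedList(new ArrayList<>());
    private final Map<String, Long> memstore = new HashMap<>();
    private final ReentrantLock rowLock = new ReentrantLock(); // one lock for all rows, for brevity

    long increment(String row, long amount) {
        hlog.add(new Delta(row, amount));       // write HLog diff (sync would happen here, lock not held)
        rowLock.lock();                         // obtain row lock
        try {
            long next = memstore.getOrDefault(row, 0L) + amount; // get existing value, create new value
            memstore.put(row, next);            // insert and commit in memstore
            return next;                        // handed back to the client after the unlock
        } finally {
            rowLock.unlock();                   // release row lock
        }
    }

    /** Recovery: sum the deltas per row. Call only when no writers are active. */
    static Map<String, Long> replay(List<Delta> log) {
        Map<String, Long> rebuilt = new HashMap<>();
        for (Delta d : log) {
            rebuilt.merge(d.row, d.amount, Long::sum);
        }
        return rebuilt;
    }
}
```

The row lock is held only for the in-memory read-modify-write, so the slow log write no longer serializes increments on a hot row.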

I think you should seriously look into delayed sync.  You can run your sync thread as often as every 100ms and it still improves performance. 100ms of lost data in the worst case isn't so bad... right?


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906889#action_12906889 ] 

Todd Lipcon commented on HBASE-2957:
------------------------------------

{quote}
A simple way to do that will be to attach a log-sync-number with every cell. When a cell is updated it will keep the next log-sync-number within itself. A get will not return until the current log-sync-number is at least as big as log-sync-number stored in the cell.

An update can return immediately after queuing the sync. The "wait-for-sync" is transferred from the writer to the reader. If the reader comes in sufficiently late (which is likely) then there will be no wait-for-syncs in the system.
{quote}

We actually already do this! Rather than using the log-sync number, the memstore has an internal timestamp for readability. When we scan a row, we record this number atomically and only return cell versions older than this timestamp. See the ReadWriteConsistencyControl class.

The only change we have to make is that we unlock the row before we call sync and update the memstore TS, rather than after.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906312#action_12906312 ] 

dhruba borthakur commented on HBASE-2957:
-----------------------------------------

I am seeing a typical use case of hbase where all the rows of
a table are not equally hot. A few rows are orders of magnitude
hotter than most other rows.

Each get/put operation in hbase involves the following:
{code}
 put operation                           get operation
 --------------------------------------------------------------
1. acquire the rowlock
2. append to hlog
3. update memstore                 read from memstore
4. release rowlock
{code}

For example, if the application workload consists of only increment operations on *one* record, then
the entire workload is serialized and the throughput depends purely on the
speed of the append-hlog operation. The number of hlog.append calls is
precisely the same as the number of put calls. This can be slow, especially
because the append operation requires writing to three datanodes in hdfs.

We can make the workload super fast while keeping the same data consistency
guarantees if we can achieve some batching. For
each record, let's say that the memstore contains a version of the record that has been committed to
hlog and another version of the same record that is being updated in memory
but has not yet been committed to hlog. Let's refer to these two versions
of the record as the "memstore.committed" and "memstore.inflight" versions, respectively.

{code}
 put operation                                        get operation
 ----------------------------------------------------------------------------------
1. acquire the rowlock
2. update memstore.inflight                   read memstore.committed
3. release rowlock
4. append to hlog
5. memstore.committed = memstore.inflight

{code}

The key to the above protocol is that the rowlock is released as soon
as the memstore is updated. This means that multiple calls to put() for
the same record can proceed in parallel and would result in fewer calls
to hlog.append.
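A rough sketch of this protocol in Java follows. All names are hypothetical, an in-memory list stands in for the hlog, and the sketch does not handle a failed append.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the inflight/committed idea; not HBase code.
class InflightCommittedRow {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> pendingEdits = new ArrayList<>(); // guarded by rowLock
    private final List<List<Long>> hlog = new ArrayList<>();   // guarded by "this"
    private long inflight;                                      // guarded by rowLock
    private volatile long committed;

    /** Steps 1-3: take the row lock, update the in-flight version, release the lock. */
    void put(long value) {
        rowLock.lock();
        try {
            inflight = value;
            pendingEdits.add(value);
        } finally {
            rowLock.unlock();
        }
    }

    /** Steps 4-5: one append covers every edit queued so far, then the commit flips. */
    synchronized void appendAndCommit() {
        List<Long> batch;
        long toCommit;
        rowLock.lock();
        try {
            if (pendingEdits.isEmpty()) {
                return;
            }
            batch = new ArrayList<>(pendingEdits);
            pendingEdits.clear();
            toCommit = inflight;
        } finally {
            rowLock.unlock();
        }
        hlog.add(batch);      // stands in for a single hlog.append of the whole batch
        committed = toCommit; // readers now see the newly committed version
    }

    /** Readers only ever see the committed version. */
    long get() {
        return committed;
    }

    synchronized int appendCalls() {
        return hlog.size();
    }
}
```

Three put() calls followed by one appendAndCommit() produce a single append, which is the batching effect described above.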

Do people think that this is feasible and beneficial? If so, I can delve deeper into the design and implementation of this performance improvement.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906648#action_12906648 ] 

Jonathan Gray commented on HBASE-2957:
--------------------------------------

We are already sharding counters.  I don't think that precludes these optimizations.


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906558#action_12906558 ] 

stack commented on HBASE-2957:
------------------------------

@Dhruba How does memstore.inflight get flipped to be memstore.committed?  Safely?  There are some existing mechanisms that might be of help.  There is a read/write consistency class in hbase http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.html#1 that is used around memstore updates to help keep the reads and scans 'clean'; i.e. prevents reading of partial updates on a row (There are some known caveats).  And then in KeyValue, there is a 'special' extra timestamp used ensuring memstore consistency: http://hbase.apache.org/docs/r0.89.20100726/xref/org/apache/hadoop/hbase/KeyValue.html#216

Would it be simpler to add an ICV batcher/catcher/reservoir/overspill that sat in front of the actual ICV call?  It would accumulate ICVs while the WAL was busy syncing.  The ICV reservoir would be an optional facility; you'd ask for it by setting a flag on the ICV call.  On return, the actual ICV would check the reservoir to see if any ICVs had been accumulating while it was off syncing.  If any, it'd suck them all up and apply the ICVs in a batch.  We'd need to add a new batch ICV call, one that added a block of ICVs to the WAL and that then did the memstore updates, playing the content of the batch one at a time.  We'd return to the client as soon as we'd added an ICV to the reservoir, not waiting on the WAL.
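A toy version of such a reservoir could look like the following. The names are hypothetical, a counter stands in for the WAL, and there is no real syncing here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an ICV reservoir; not an existing HBase facility.
class IcvReservoirSketch {
    private final List<Long> reservoir = new ArrayList<>();
    private long counter;
    private int walAppends;

    /** Client-facing call: park the increment and return without waiting on the WAL. */
    synchronized void increment(long amount) {
        reservoir.add(amount);
    }

    /** Run by the ICV that has just returned from syncing: drain and apply as one batch. */
    int drainAndApply() {
        List<Long> batch;
        synchronized (this) {
            if (reservoir.isEmpty()) {
                return 0;
            }
            batch = new ArrayList<>(reservoir);
            reservoir.clear();
        }
        // A real implementation would append the whole batch to the WAL and sync here,
        // outside the monitor, so that new increments can keep accumulating.
        synchronized (this) {
            walAppends++;
            for (long delta : batch) {
                counter += delta; // play the batch into the memstore one edit at a time
            }
        }
        return batch.size();
    }

    synchronized long value() {
        return counter;
    }

    synchronized int walAppends() {
        return walAppends;
    }
}
```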


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906884#action_12906884 ] 

Prakash Khemani commented on HBASE-2957:
----------------------------------------

Sorry, I was out and couldn't reply to this thread.

I think a general solution that guarantees consistency for PUTs and ICVs and at the same time doesn't hold the row lock while updating hlog is possible.

===

Thinking aloud. First, why do we want to hold the row lock around the log sync? Because we want the log syncs to happen in causal order. Here is a scenario of what can go wrong if we release the row lock before the sync completes.
	1. client-1 does a put/icv on regionserver-1. releases the row lock before the sync.
	2. client-2 comes in and reads the new value. Based on this just read value, client-2 then does a put in regionserver-2.
	3. client-2 is able to do its sync on rs-2 before client-1's sync on rs-1 completes.
	4. rs-1 is brought down ungracefully. During recovery we will have client-2's update but not client-1's. And that violates the causal ordering of events.

===
So we don't want anyone to read a value which has not already been synced. I think we can transfer the wait-for-sync to the reader instead of asking all writers to wait.

A simple way to do that will be to attach a log-sync-number to every cell. When a cell is updated it will keep the next log-sync-number within itself. A get will not return until the current log-sync-number is at least as big as the log-sync-number stored in the cell.

An update can return immediately after queuing the sync. The "wait-for-sync" is transferred from the writer to the reader. If the reader comes in sufficiently late (which is likely) then there will be no wait-for-syncs in the system.
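Roughly, in Java (hypothetical names; for simplicity every put in this sketch takes its own sync number, whereas a real log syncer would cover many edits with one sync):

```java
// Hypothetical sketch of moving the wait-for-sync from the writer to the reader.
class SyncNumberCell {
    private long value;
    private long requiredSync;   // log-sync-number stored in the cell
    private long nextSync = 1;   // number that the next queued sync will carry
    private long completedSync;  // current (latest completed) log-sync-number

    /** Writer: update in memory, queue the sync, and return without waiting. */
    synchronized long put(long newValue) {
        value = newValue;
        requiredSync = nextSync++;
        return requiredSync;
    }

    /** Log syncer: announce that every edit up to syncNumber is now durable. */
    synchronized void syncCompleted(long syncNumber) {
        completedSync = Math.max(completedSync, syncNumber);
        notifyAll();
    }

    /** True once a reader would no longer have to wait. */
    synchronized boolean isSynced() {
        return completedSync >= requiredSync;
    }

    /** Reader: block until the sync number stored in the cell has been reached. */
    synchronized long get() {
        while (completedSync < requiredSync) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for sync", e);
            }
        }
        return value;
    }
}
```

If the syncer has already caught up by the time the reader arrives, get() returns without waiting at all.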

===
Even in this scheme we will have to treat ICVs specially. Logically an ICV consists of (a) a GET of the old value, (b) a PUT of the new value, and (c) a GET that returns the new value.

There are 2 cases:
(1) The ICV caller doesn't use the return value of the ICV. In this case the ICV need not wait for the earlier sync to complete. (In my use case this is what happens predominantly.)

(2) The ICV caller uses the return value of the ICV call to make further updates. In this case the ICV has to wait for its sync to complete before it returns. While the ICV is waiting for the sync to complete it need not hold the row lock. (At least in my use case this is very rare.)

===
I think that it is true in general that while a GET is forced to wait for a sync to complete, there is no need to hold the row lock.

===


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907357#action_12907357 ] 

ryan rawson commented on HBASE-2957:
------------------------------------

Prakash, I think we must be talking about different things, because delayed syncs _do_ guarantee consistency. 

Some background on the code... HLog is a sequence of entries... while concurrent writes have indeterminate ordering, writes under the same row lock are effectively serialized, and thus there is a strong causal ordering between HLog entries and the sequence of operations for writes under the same row lock.  If you look at the implementation of HLog, there is a synchronized block in the middle of append which causes all entries to have a serial sequence id and also protects the underlying file, which is not multithread capable.

The way that delayed syncs work is like so for ICVs:

Begin Row Lock
Do Get
Make new Value from Old Value and increment Amount
Append to HLog
if (!delayed sync) Do HLog sync
Update Memstore
Release Row Lock

The longest pole item in this sequence tends to be the hlog sync.  So make it optional.  The entries are now buffered in memory.  The next sync by either a client on a different table, or via the background thread timer will flush the entries to HDFS thus persisting* them.

The ordering in the hlog will be the same with or without delayed sync, and performance is vastly improved at a small hit in durability.  For piecemeal ICVs, losing 100ms now and again should not be statistically significant in high volume cases.  The setting is on a per-table basis, thus you can choose your tradeoff at that level.
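The ICV sequence above can be sketched in Java like so. The names are hypothetical and two in-memory lists stand in for the buffered and persisted parts of the HLog.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of an ICV with optional delayed sync; not the HRegion code.
class DelayedSyncCounter {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<Long> unsynced = new ArrayList<>(); // appended, not yet synced
    private final List<Long> durable = new ArrayList<>();  // synced entries, in log order
    private final boolean delayedSync;
    private long memstoreValue;

    DelayedSyncCounter(boolean delayedSync) {
        this.delayedSync = delayedSync;
    }

    long increment(long amount) {
        rowLock.lock();                             // Begin Row Lock
        try {
            long newValue = memstoreValue + amount; // Do Get, make new value
            append(newValue);                       // Append to HLog
            if (!delayedSync) {
                sync();                             // Do HLog sync
            }
            memstoreValue = newValue;               // Update Memstore
            return newValue;
        } finally {
            rowLock.unlock();                       // Release Row Lock
        }
    }

    private synchronized void append(long entry) {
        unsynced.add(entry);
    }

    /** Also what a background timer or another client's sync would end up calling. */
    synchronized void sync() {
        durable.addAll(unsynced);
        unsynced.clear();
    }

    synchronized int unsyncedCount() {
        return unsynced.size();
    }

    synchronized List<Long> durableEntries() {
        return new ArrayList<>(durable);
    }
}
```

With delayedSync set, the increments return while their entries are still only buffered; a later sync() persists them in the same order in which they were appended.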


[jira] Commented: (HBASE-2957) Release row lock when waiting for wal-sync

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906627#action_12906627 ] 

dhruba borthakur commented on HBASE-2957:
-----------------------------------------

@Todd: in your proposal you unlock the row before the HLog is synced. In the rare case when the HLog.sync fails, that transaction will be lost, but other readers would have already seen the new value because you released the rowlock.

@Stack, @Jonathan: "simpler adding an ICV batcher/catcher/reservoir/overspill that sat in front of an actual ICV call?"
I agree, and I think that proposal works when the use case is only ICV. But do you want the solution generalized so that it works for a workload that does lots of "puts" into the same record? With the existing code, all these put calls get serialized via the rowlock and the syncs to HLog do not get any batching.

