Posted to issues@hbase.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2011/03/02 08:47:38 UTC

[jira] Created: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Proposal to optimize ReadWriteConsistencyControl
------------------------------------------------

                 Key: HBASE-3588
                 URL: https://issues.apache.org/jira/browse/HBASE-3588
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The ReadWriteConsistencyControl (RWCC) mechanism makes a set of memstore updates atomically visible to readers. In addition, rwcc.completeMemstoreInsert() blocks until the memstore read point advances to the current writeNumber. This ensures that if an application does a put and then immediately issues a get for the same key, the get sees the values inserted by that put. The current implementation assumes this worst case and penalizes the put RPC: it does not return to the client until the read point advances to this transaction's write number.

In many use cases, the application never actually issues a get for the most recent put that it inserted. In such cases, it would be nice if we could transfer the penalty (of blocking) to the get call that follows the initial put.
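
To make the current behaviour concrete, here is a minimal sketch of a write path that blocks until the read point catches up. It is a simplified stand-in, not the HBase class: the class name BlockingRwccSketch, the use of a plain long as the write handle, and the single monitor lock are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified stand-in for the behaviour described above; NOT the real HBase class.
class BlockingRwccSketch {
    private static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    // Called before the memstore is touched: hands out the next writeNumber.
    synchronized long beginMemstoreInsert() {
        WriteEntry e = new WriteEntry(nextWriteNumber++);
        writeQueue.addLast(e);
        return e.writeNumber;
    }

    // Called after the memstore update: blocks until readPoint >= writeNumber.
    synchronized void completeMemstoreInsert(long writeNumber) {
        for (WriteEntry e : writeQueue) {
            if (e.writeNumber == writeNumber) { e.completed = true; break; }
        }
        // The read point only moves past a contiguous prefix of completed writes.
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        notifyAll();
        // This wait is the penalty paid by the put RPC.
        while (readPoint < writeNumber) {
            try {
                wait();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for read point", ie);
            }
        }
    }

    synchronized long memstoreReadPoint() { return readPoint; }
}
```

In this sketch, a put that finishes its memstore update while an earlier write is still in flight stays parked in completeMemstoreInsert() until that earlier write completes.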

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HBASE-3588:
------------------------------------

    Attachment: rwcc.trunk.1


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002455#comment-13002455 ] 

dhruba borthakur commented on HBASE-3588:
-----------------------------------------

Thanks, Ted, for fixing the unit test.

Ryan: I have not yet deployed this fix in our production setup, for exactly the reason you bring up: a new read request gets delayed at startup (until the readPoint advances to the most recently committed transaction). In short, this patch makes puts faster but makes gets somewhat slower. I am not sure that this is a good thing.

I would have liked the put to somehow not wait for the readPoint to advance to the current transaction's writeNumber, while at the same time ensuring that a client will always see a previously committed transaction.
 


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002456#comment-13002456 ] 

Ted Yu commented on HBASE-3588:
-------------------------------

If we could provide two policies that the user can choose between for a particular workload, that would be nice.


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001329#comment-13001329 ] 

ryan rawson commented on HBASE-3588:
------------------------------------

Interesting idea. This needs to fit in the context of the ACID work Stack is doing. In the discussion of that issue, it turned out that we were required to always read from the most recent read point between each row.

How much time does this save? A memstore insertion was expected to be REALLY fast. Now with the MemstoreLAB stuff, that might not be so true anymore...


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001606#comment-13001606 ] 

ryan rawson commented on HBASE-3588:
------------------------------------

I had a few questions... This patch doesn't seem to remove the wait for the memstore read point to catch up to the just-completed write... Do you see that as well?

Also it seems like every reader is going to have to do this sequence:
- get the most recent written value
- wait until it becomes readable

Does that seem right?  Every reader will have a little delay (on a busy region of course) between when it starts and when it can start pulling data down.

One use case to worry about is a Row put followed by a CheckAndPut to the same row. I think this patch looks like it respects that use case, can you verify?


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002344#comment-13002344 ] 

Ted Yu commented on HBASE-3588:
-------------------------------

The following change in MemStore would make TestHeapSize pass:
{code}
  public final static long FIXED_OVERHEAD = ClassSize.align(
      ClassSize.OBJECT + (12 * ClassSize.REFERENCE));
{code}
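
For anyone checking the arithmetic, here is a small self-contained sketch of what that expression evaluates to. The sizes are assumptions for a 64-bit JVM without compressed references (OBJECT = 16, REFERENCE = 8), and align() is assumed to round up to a multiple of 8 bytes; HeapOverheadSketch is a hypothetical name and the real ClassSize values depend on the JVM in use.

```java
// Hypothetical stand-in for the ClassSize arithmetic; the sizes below are assumptions.
class HeapOverheadSketch {
    static final int OBJECT = 16;    // assumed object header size in bytes
    static final int REFERENCE = 8;  // assumed reference size in bytes

    // Round up to the next multiple of 8 bytes.
    static long align(long num) {
        return ((num + 7) >> 3) << 3;
    }

    // Overhead of an object that holds only `referenceCount` reference fields.
    static long fixedOverhead(int referenceCount) {
        return align(OBJECT + (long) referenceCount * REFERENCE);
    }
}
```

Under these assumptions fixedOverhead(12) is 112 bytes, so each extra reference field adds 8 bytes to the expected size. TestHeapSize compares the declared constant against the actual fields of the class, which is why the count has to follow any field the patch adds.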



[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001575#comment-13001575 ] 

dhruba borthakur commented on HBASE-3588:
-----------------------------------------

I have yet to measure the performance improvement at scale. But does the code look good and correct?

The difference is not just one memstore update. Especially with lots of regionserver threads, many threads are inserting into the memstore in parallel, so a put can end up waiting for several earlier inserts to complete before the read point reaches its writeNumber.


[jira] Commented: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001654#comment-13001654 ] 

Ted Yu commented on HBASE-3588:
-------------------------------

The new method:
{code}
  public MemStore(final Configuration conf,
                  final KeyValue.KVComparator c,
                  final ReadWriteConsistencyControl rwcc) {
{code}
would produce this test failure:
{code}
testMultipleVersionsSimple(org.apache.hadoop.hbase.regionserver.TestMemStore)  Time elapsed: 0.007 sec  <<< ERROR!
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.regionserver.MemStore.<init>(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hbase/KeyValue$KVComparator;)V
        at org.apache.hadoop.hbase.regionserver.TestMemStore.testMultipleVersionsSimple(TestMemStore.java:477)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
{code}
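
The trace shows TestMemStore still invoking the two-argument (Configuration, KVComparator) constructor. One possible way to keep such callers working, besides updating the test, is to retain the old signature as an overload that delegates to the new one. The sketch below uses hypothetical stand-in types (MemStoreSketch, RwccStub, a Map for the configuration) rather than the HBase classes, and giving the memstore a private RWCC instance in the old constructor is an assumption that may only be appropriate for tests.

```java
import java.util.Comparator;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the HBase classes.
class RwccStub { }

class MemStoreSketch {
    final Map<String, String> conf;
    final Comparator<String> comparator;
    final RwccStub rwcc;

    // Old signature kept for existing callers; delegates with its own RWCC (assumption).
    MemStoreSketch(Map<String, String> conf, Comparator<String> c) {
        this(conf, c, new RwccStub());
    }

    // New signature introduced by the patch: the RWCC is passed in by the caller.
    MemStoreSketch(Map<String, String> conf, Comparator<String> c, RwccStub rwcc) {
        this.conf = conf;
        this.comparator = c;
        this.rwcc = rwcc;
    }
}
```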


[jira] Updated: (HBASE-3588) Proposal to optimize ReadWriteConsistencyControl

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HBASE-3588:
------------------------------------

    Attachment: rwcc.trunk.1

A quick hack that explains my proposal.

The idea is that we remember the most recently committed writeNumber. When a new RPC starts, it uses that as its read point.
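
As a rough illustration of the idea (not the attached patch), the sketch below moves the wait from the write path to the start of a read. All names here (DeferredWaitRwccSketch, startRead, lastCommittedWriteNumber) are hypothetical, and it assumes a reader must still wait for the global read point to reach its chosen write number so that it never sees a partially applied earlier write.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposal; NOT the attached patch or the HBase class.
class DeferredWaitRwccSketch {
    private static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;
    private long lastCommittedWriteNumber = 0; // highest writeNumber completed so far

    synchronized long beginMemstoreInsert() {
        WriteEntry e = new WriteEntry(nextWriteNumber++);
        writeQueue.addLast(e);
        return e.writeNumber;
    }

    // Marks the write complete and returns immediately: the put no longer waits.
    synchronized void completeMemstoreInsert(long writeNumber) {
        for (WriteEntry e : writeQueue) {
            if (e.writeNumber == writeNumber) { e.completed = true; break; }
        }
        lastCommittedWriteNumber = Math.max(lastCommittedWriteNumber, writeNumber);
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        notifyAll();
    }

    // A new read picks the most recently committed writeNumber and waits until
    // every write up to that number is visible; the blocking now happens here.
    synchronized long startRead() {
        long target = lastCommittedWriteNumber;
        while (readPoint < target) {
            try {
                wait();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for read point", ie);
            }
        }
        return target;
    }

    synchronized long memstoreReadPoint() { return readPoint; }
}
```

The trade-off is visible in the sketch: completeMemstoreInsert() never blocks, but a read that starts while an earlier write is still in flight waits in startRead() until that write completes.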
