Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/08/04 02:01:15 UTC

[jira] Created: (HBASE-1740) ICV has a subtle race condition only visible under high load

ICV has a subtle race condition only visible under high load
------------------------------------------------------------

                 Key: HBASE-1740
                 URL: https://issues.apache.org/jira/browse/HBASE-1740
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: ryan rawson
             Fix For: 0.20.0, 0.20.1


ICV (incrementColumnValue) exhibits a race condition under high load. The result is a duplicate KeyValue with the same timestamp: at first one copy is in the memstore and one in an hfile; then, once the memstore is flushed, both are in hfiles. The get/scan code doesn't know which one to read, and picks one arbitrarily. One of the KeyValues is correct, the other is incorrect.

What happens at a deeper level:
- we start an ICV
- a snapshot happens and moves the memstore contents to the snapshot
- the ICV code puts a KeyValue into the memstore that has the same timestamp as a KeyValue in the snapshot.
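
The three steps above can be replayed deterministically in a toy model. This is only a sketch: IcvRaceSketch, Cell, Store and takeSnapshot are hypothetical names, not HBase classes, and the report does not say how the timestamps come to collide, so the sketch simply assumes the ICV reuses the timestamp of the cell it read.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the interleaving; none of these names are HBase classes. */
final class IcvRaceSketch {
    /** One version of a single counter cell (row/family/qualifier omitted). */
    static final class Cell {
        final long timestamp;
        final long value;
        Cell(long timestamp, long value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** An active memstore plus a snapshot that is waiting to be flushed. */
    static final class Store {
        List<Cell> memstore = new ArrayList<>();
        List<Cell> snapshot = new ArrayList<>();

        /** Moves the current memstore contents aside and starts a fresh memstore. */
        void takeSnapshot() {
            snapshot = memstore;
            memstore = new ArrayList<>();
        }

        /** Counts cells in memstore and snapshot that carry the given timestamp. */
        int countWithTimestamp(long ts) {
            int n = 0;
            for (Cell c : memstore) if (c.timestamp == ts) n++;
            for (Cell c : snapshot) if (c.timestamp == ts) n++;
            return n;
        }
    }

    /** Replays the three steps in order, single-threaded, so the outcome is fixed. */
    static Store replayRace() {
        Store store = new Store();
        store.memstore.add(new Cell(100L, 5L)); // existing counter: value 5 at ts=100

        // Step 1: the ICV starts and reads the current cell.
        Cell current = store.memstore.get(0);

        // Step 2: a snapshot happens before the ICV writes back.
        store.takeSnapshot();

        // Step 3: the ICV writes the new value with the timestamp it read
        // (an assumption of this sketch), so two cells now share ts=100.
        store.memstore.add(new Cell(current.timestamp, current.value + 1));
        return store;
    }

    public static void main(String[] args) {
        Store s = replayRace();
        // prints "cells at ts=100: 2"
        System.out.println("cells at ts=100: " + s.countWithTimestamp(100L));
    }
}
```

In this model the snapshot holds (ts=100, value=5) and the memstore holds (ts=100, value=6). A reader that compares only timestamps has no way to prefer the newer value.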

This is a deep race condition and several attempts to fix it failed in production here at SU.  This issue is about a more permanent fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1740:
-------------------------

    Fix Version/s:     (was: 0.20.0)

Move to 0.20.1



[jira] Assigned: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson reassigned HBASE-1740:
----------------------------------

    Assignee: ryan rawson



[jira] Updated: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1740:
-------------------------------

    Attachment: HBASE-icv.patch

Here is a patch that works for us. Highly recommended, but also very intrusive. It does ICV "the right way":
- log to the HLog
- then make the in-RAM changes
- don't end up with duplicate timestamps in the memstore and hfiles
- don't create too many versions

enjoy
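
HBASE-icv.patch itself is not reproduced in this thread, so the following is only a toy sketch of the four properties listed above, not the patch's actual code. IcvOrderingSketch and all of its members are hypothetical names. The in-memory list standing in for the HLog, the single-cell memstore, and the use of synchronized to keep the snapshot from interleaving with an increment are all assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one counter cell; none of these names are HBase classes. */
final class IcvOrderingSketch {
    static final class Cell {
        final long timestamp;
        final long value;
        Cell(long timestamp, long value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    final List<String> log = new ArrayList<>(); // stand-in for the write-ahead log
    List<Cell> memstore = new ArrayList<>();
    List<Cell> snapshot = new ArrayList<>();

    /** Moves memstore contents aside; cannot interleave with increment(). */
    synchronized void takeSnapshot() {
        snapshot = memstore;
        memstore = new ArrayList<>();
    }

    /** Increments the counter; 'now' is the caller-supplied clock in ms. */
    synchronized long increment(long amount, long now) {
        Cell newest = newestCell();
        long base = (newest == null) ? 0L : newest.value;
        // Never reuse a timestamp already present in memstore or snapshot.
        long ts = (newest == null) ? now : Math.max(now, newest.timestamp + 1);
        long result = base + amount;

        log.add("ts=" + ts + " value=" + result); // log first
        memstore.clear();                         // keep a single in-memory version
        memstore.add(new Cell(ts, result));       // then make the in-RAM change
        return result;
    }

    /** Returns the cell with the highest timestamp across memstore and snapshot. */
    private Cell newestCell() {
        Cell best = null;
        for (Cell c : memstore) if (best == null || c.timestamp > best.timestamp) best = c;
        for (Cell c : snapshot) if (best == null || c.timestamp > best.timestamp) best = c;
        return best;
    }
}
```

With this ordering, an increment that arrives in the same millisecond as the snapshotted cell is written at the next timestamp instead of colliding with it, and the memstore never accumulates more than one version of the counter.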



[jira] Commented: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754357#action_12754357 ] 

stack commented on HBASE-1740:
------------------------------

Reviewed and ran unit tests. All pass except the broken ITHBase test. Committed to branch and trunk.



[jira] Updated: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1740:
-------------------------------

    Attachment: HBASE-1740.patch

Here is a preliminary potential fix. It has no test updates and probably doesn't compile against the wider codebase (the snippet itself has no errors).



[jira] Resolved: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-1740.
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed to branch and trunk. Ran tests. Tests passed except for the ones in contrib that are currently failing on Hudson.



[jira] Updated: (HBASE-1740) ICV has a subtle race condition only visible under high load

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-1740:
-------------------------------

    Attachment: HBASE-1740-test.patch

I forgot the tests! Oops!
