Posted to dev@hbase.apache.org by "ryan rawson (JIRA)" <ji...@apache.org> on 2009/08/04 02:01:15 UTC
[jira] Created: (HBASE-1740) ICV has a subtle race condition only
visible under high load
ICV has a subtle race condition only visible under high load
------------------------------------------------------------
Key: HBASE-1740
URL: https://issues.apache.org/jira/browse/HBASE-1740
Project: Hadoop HBase
Issue Type: Bug
Affects Versions: 0.20.0
Reporter: ryan rawson
Fix For: 0.20.0, 0.20.1
ICV (incrementColumnValue) exhibits a race condition under high load. The result is a duplicate KeyValue with the same timestamp: at first one copy is in the memstore and the other in an HFile, and after a later flush both are in HFiles. The get/scan code doesn't know which one to read and picks one arbitrarily. One of the KeyValues is correct; the other is incorrect.
What happens at a deeper level:
- we start an ICV
- a snapshot happens and moves the memstore to the snapshot
- the ICV code puts a KeyValue into the memstore that has the same timestamp as a KeyValue in the snapshot.
This is a deep race condition and several attempts to fix it failed in production here at SU. This issue is about a more permanent fix.
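The interleaving above can be replayed deterministically in a toy model. The sketch below is not HBase code: the class IcvRaceSketch, its two maps and the timestamp 100 are made up for illustration, and it assumes the ICV path reuses the timestamp of the value it read. It only shows how the three steps leave two different values under one timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of a single counter cell; hypothetical names, not HBase classes. */
class IcvRaceSketch {
    // timestamp -> value for one row/column
    TreeMap<Long, Long> memstore = new TreeMap<>();
    TreeMap<Long, Long> snapshot = new TreeMap<>();

    /** Flush preparation: the current memstore becomes the snapshot. */
    void snapshot() {
        snapshot = memstore;
        memstore = new TreeMap<>();
    }

    /** Every value stored under the given timestamp, memstore first. */
    List<Long> valuesAt(long ts) {
        List<Long> out = new ArrayList<>();
        if (memstore.containsKey(ts)) out.add(memstore.get(ts));
        if (snapshot.containsKey(ts)) out.add(snapshot.get(ts));
        return out;
    }

    /** Replays the three steps from the issue description in a fixed order. */
    static List<Long> replayRace() {
        IcvRaceSketch store = new IcvRaceSketch();
        long ts = 100L;                       // arbitrary timestamp
        store.memstore.put(ts, 5L);           // counter currently holds 5

        long current = store.memstore.get(ts); // 1. ICV starts and reads 5
        store.snapshot();                       // 2. snapshot moves the memstore aside
        store.memstore.put(ts, current + 1);    // 3. ICV writes 6 under the old timestamp

        return store.valuesAt(ts);            // two values, same timestamp
    }

    public static void main(String[] args) {
        System.out.println(replayRace());     // prints [6, 5]
    }
}
```

A reader that merges the memstore and the snapshot (and later the HFiles) sees both 6 and 5 at timestamp 100 and has no way to order them.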
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-1740:
-------------------------
Fix Version/s: (was: 0.20.0)
Move to 0.20.1
[jira] Assigned: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ryan rawson reassigned HBASE-1740:
----------------------------------
Assignee: ryan rawson
[jira] Updated: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ryan rawson updated HBASE-1740:
-------------------------------
Attachment: HBASE-icv.patch
Here is a patch that works for us. Highly recommended, but also very intrusive. It does ICV "the right way":
- log to the HLog
- then make the in-RAM changes
- don't end up with duplicate timestamps in the memstore and HFiles
- don't create too many versions
Enjoy.
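As a rough illustration of the four points above, here is a toy sketch. It is not taken from HBASE-icv.patch: the class IcvOrderingSketch, the list standing in for the HLog, and the rule for choosing a timestamp are all assumptions. It logs before touching memory, serializes increments against snapshots, never reuses a timestamp that already sits in the snapshot, and keeps a single version in the memstore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one counter cell; hypothetical names, not the actual patch. */
class IcvOrderingSketch {
    final List<String> log = new ArrayList<>();      // stand-in for the HLog
    TreeMap<Long, Long> memstore = new TreeMap<>();  // timestamp -> value
    TreeMap<Long, Long> snapshot = new TreeMap<>();

    /** Snapshots and increments share one lock, so they cannot interleave. */
    synchronized void snapshot() {
        snapshot = memstore;
        memstore = new TreeMap<>();
    }

    synchronized long increment(long amount, long now) {
        long current = 0L;
        long ts;
        if (!memstore.isEmpty()) {
            // Value lives in the memstore: safe to overwrite it in place.
            ts = memstore.lastKey();
            current = memstore.get(ts);
        } else if (!snapshot.isEmpty()) {
            // Value lives in the snapshot: pick a strictly newer timestamp.
            long snapTs = snapshot.lastKey();
            current = snapshot.get(snapTs);
            ts = Math.max(now, snapTs + 1);
        } else {
            ts = now;
        }
        long updated = current + amount;

        log.add(ts + "=" + updated);  // 1. log first
        memstore.clear();             // keep one version in RAM
        memstore.put(ts, updated);    // 2. then make the in-RAM change
        return updated;
    }
}
```

With this ordering, an increment that arrives right after a snapshot writes under a new timestamp, so the memstore and the snapshot never hold the same timestamp for the cell.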
[jira] Commented: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754357#action_12754357 ]
stack commented on HBASE-1740:
------------------------------
Reviewed and ran unit tests. All pass except the broken ITHBase test. Committed to branch and trunk.
[jira] Updated: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ryan rawson updated HBASE-1740:
-------------------------------
Attachment: HBASE-1740.patch
Here is a preliminary potential fix, but it has no test updates and probably doesn't compile against the wider codebase (the snippet itself has no errors).
[jira] Resolved: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack resolved HBASE-1740.
--------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Committed to branch and trunk. Ran tests. All passed except the ones in contrib that are currently failing on Hudson.
[jira] Updated: (HBASE-1740) ICV has a subtle race condition only
visible under high load
Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ryan rawson updated HBASE-1740:
-------------------------------
Attachment: HBASE-1740-test.patch
I forgot the tests! Oops!