Posted to dev@bookkeeper.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/12/06 00:35:12 UTC

[jira] [Commented] (BOOKKEEPER-795) Race condition causes writes to hang if ledger is fenced

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236306#comment-14236306 ] 

Hadoop QA commented on BOOKKEEPER-795:
--------------------------------------

Testing JIRA BOOKKEEPER-795


Patch [0001-Made-ledger-metadata-immutable.patch|https://issues.apache.org/jira/secure/attachment/12681335/0001-Made-ledger-metadata-immutable.patch] downloaded at Fri Dec  5 23:27:34 UTC 2014

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:green}+1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:green}+1{color} the patch does not introduce any line longer than 120
.    {color:green}+1{color} the patch adds/modifies 20 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
.    {color:red}WARNING: the current HEAD has 1 RAT warning(s), they should be addressed ASAP{color}
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:red}-1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:red}-1{color} patch does not compile
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 FINDBUGS{color}
.    {color:green}+1{color} the patch does not seem to introduce new Findbugs warnings
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:red}-1 DISTRO{color}
.    {color:red}-1{color} distro tarball fails with the patch

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}

{color:red}.   There is at least one warning, please check{color}

The full output of the test-patch run is available at

.   https://builds.apache.org/job/bookkeeper-trunk-precommit-build/875/

> Race condition causes writes to hang if ledger is fenced
> --------------------------------------------------------
>
>                 Key: BOOKKEEPER-795
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-795
>             Project: Bookkeeper
>          Issue Type: Bug
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>            Priority: Blocker
>             Fix For: 4.4.0
>
>         Attachments: 0001-Demonstrate-race-condition.patch, 0001-Made-ledger-metadata-immutable.patch, 0001-Made-ledger-metadata-immutable.patch, TEST-org.apache.bookkeeper.client.LedgerCloseTest.xml
>
>
> If a ledger is fenced while the writer is still writing to it, some of the writes will never complete.
> I've attached the log of this happening along with a test case that will trigger the behaviour.
> What appears to be happening is this. When the fence occurs, the first write after the fence gets an unrecoverable error, so it tries to close the ledger. Closing the ledger sets the closed flag on the ledger metadata and tries to write it. That write fails because the metadata in ZooKeeper was modified by the fencing operation, so the close op fails and resets the closed status for a moment. In that window a write operation gets through, which then fails with a fencing error, so we try to close the ledger. But the other close operation has since closed the ledger in our metadata, so nothing happens, and the write hangs forever.
> There are a number of issues here, but foremost, the ledger metadata that the handle is using should only ever represent what is actually in ZooKeeper. Having various parts of the code flipping bits just explodes the state space. The LedgerMetadata object itself should be immutable, and should only be modified, as a local variable, using a builder, before writing to ZooKeeper. Only when the ZooKeeper operation succeeds should we update the reference which LedgerHandle has access to.
> There's also a problem in how we handle pending add ops when we close. Really it shouldn't be possible for a write op to get through after a closure, but we should be defensive here and error out anything that has gotten through, adding a big old log message to alert us that this case, which shouldn't happen, is happening.
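
For illustration only, the immutable-metadata idea in the description could look roughly like the sketch below. All class and method names here (MetadataSketch, StoreSketch, HandleSketch) are hypothetical and are not taken from the attached patch or from the BookKeeper code base; the in-memory store with a version check is an assumed stand-in for the versioned write to ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for LedgerMetadata: all fields final, no setters.
final class MetadataSketch {
    enum State { OPEN, CLOSED }

    final State state;
    final long lastEntryId;
    final int version; // version of the copy held by the metadata store

    private MetadataSketch(State state, long lastEntryId, int version) {
        this.state = state;
        this.lastEntryId = lastEntryId;
        this.version = version;
    }

    static MetadataSketch initial() {
        return new MetadataSketch(State.OPEN, -1L, 0);
    }

    Builder toBuilder() {
        return new Builder(this);
    }

    // Changes are staged on a builder held in a local variable.
    static final class Builder {
        private State state;
        private long lastEntryId;

        private Builder(MetadataSketch base) {
            this.state = base.state;
            this.lastEntryId = base.lastEntryId;
        }

        Builder closeAt(long lastEntryId) {
            this.state = State.CLOSED;
            this.lastEntryId = lastEntryId;
            return this;
        }

        MetadataSketch build(int version) {
            return new MetadataSketch(state, lastEntryId, version);
        }
    }
}

// Assumed stand-in for the ZooKeeper-backed store: a versioned compare-and-set.
final class StoreSketch {
    private MetadataSketch stored = MetadataSketch.initial();

    synchronized MetadataSketch read() {
        return stored;
    }

    // Returns the newly stored copy, or null if expectedVersion is stale.
    synchronized MetadataSketch write(MetadataSketch.Builder staged, int expectedVersion) {
        if (stored.version != expectedVersion) {
            return null;
        }
        stored = staged.build(expectedVersion + 1);
        return stored;
    }
}

// Hypothetical stand-in for LedgerHandle.
final class HandleSketch {
    private final StoreSketch store;
    private final AtomicReference<MetadataSketch> metadata;

    HandleSketch(StoreSketch store) {
        this.store = store;
        this.metadata = new AtomicReference<>(store.read());
    }

    MetadataSketch metadata() {
        return metadata.get();
    }

    // The visible reference is swapped only after the store accepts the write.
    boolean close(long lastEntryId) {
        MetadataSketch current = metadata.get();
        MetadataSketch.Builder staged = current.toBuilder().closeAt(lastEntryId);
        MetadataSketch written = store.write(staged, current.version);
        if (written == null) {
            return false; // stale version: the handle's view is left untouched
        }
        metadata.set(written);
        return true;
    }
}
```

In this sketch a failed close never has to "reset" anything, because the closed flag only ever existed on the local builder. The handle's view stays at whatever was last known to be in the store.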

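The defensive handling of pending add ops on close, as suggested in the description, might be sketched as follows. This is a minimal illustration under assumed names: PendingAddsSketch, the LEDGER_CLOSED code, and the callback shape are all hypothetical and do not reflect the actual LedgerHandle implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.IntConsumer;

// Hypothetical sketch: track outstanding adds and fail them all on close.
final class PendingAddsSketch {
    static final int LEDGER_CLOSED = -1; // assumed error code for this sketch

    private final Queue<IntConsumer> pending = new ArrayDeque<>();
    private boolean closed = false;

    // Registers an add; if the ledger is already closed, fail it immediately.
    void add(IntConsumer callback) {
        synchronized (this) {
            if (!closed) {
                pending.add(callback);
                return;
            }
        }
        callback.accept(LEDGER_CLOSED);
    }

    // Marks the ledger closed and errors out every add still outstanding.
    // Returns the number of adds that had to be failed.
    int close() {
        List<IntConsumer> drained;
        synchronized (this) {
            closed = true;
            drained = new ArrayList<>(pending);
            pending.clear();
        }
        if (!drained.isEmpty()) {
            // The "big old log message": this situation is not expected.
            System.err.println("UNEXPECTED: " + drained.size()
                    + " add(s) still pending at close; failing them");
        }
        // Callbacks run outside the lock so they cannot deadlock on this object.
        for (IntConsumer callback : drained) {
            callback.accept(LEDGER_CLOSED);
        }
        return drained.size();
    }
}
```

The point of the sketch is that every registered callback is completed exactly once, either by the normal path (not shown) or with an error at close, so no write can be left hanging.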


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)