Posted to issues@subversion.apache.org by "Julian Foad (Jira)" <ji...@apache.org> on 2021/05/21 13:58:00 UTC

[jira] [Commented] (SVN-4877) FSFS commit failure should release txn proto-rev lock

    [ https://issues.apache.org/jira/browse/SVN-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17349279#comment-17349279 ] 

Julian Foad commented on SVN-4877:
----------------------------------

Cross-reference: the original reporter's case reference number is NV-7983.

> FSFS commit failure should release txn proto-rev lock
> -----------------------------------------------------
>
>                 Key: SVN-4877
>                 URL: https://issues.apache.org/jira/browse/SVN-4877
>             Project: Subversion
>          Issue Type: Bug
>          Components: libsvn_fs_fs
>    Affects Versions: 1.10.x, 1.14.1
>            Reporter: Julian Foad
>            Priority: Major
>         Attachments: svn-release-proto-rev-lock-7.patch, test-release-proto-rev-lock-7.tgz
>
>
> Email thread on dev@, 2020-06-10, "FSFS commit failure should release txn proto-rev lock", [https://lists.apache.org/thread.html/r649aba731b6b01e90eebf26a5e2ba4ce8e806b4135aa75a0a082f990%40%3Cdev.subversion.apache.org%3E]
> Quoting from that thread:
> TL;DR: I propose a change to the FSFS commit-transaction function, to 
> release the proto-rev write lock if an error occurs while it has this lock.
> The practical applications of this change are rather obscure, which 
> perhaps explains why it has not been needed before. In particular, it 
> apparently is not needed for the way the rest of standard Subversion 
> drives FSFS, but may be needed for other users of FSFS. I have come 
> across this case in WANdisco's replicator, but as there are other 
> peculiarities in how that drives FSFS, let us not confuse the issue by 
> looking too closely at it. It appears the issue would apply to other 
> users of FSFS too.
> In the FSFS commit-transaction code path (in svn_fs_fs__commit) there is 
> a region where it acquires an exclusive write lock on the prototype 
> revision (proto-rev). There are cases where code in this region can 
> fail, and there is no release of the lock in the error return path. 
> That means if the calling process re-tries, the "writing" flag is still 
> set in the transaction object in memory, and this causes an "already 
> locked" error.
> In regular Subversion we abandon a transaction if it fails at this 
> stage, and so never hit the problem. There are failure modes where a 
> re-try could not succeed, notably after we move the proto-rev file into 
> its final location, breaking the transaction; this case is called out in 
> comments in the function and will remain after this change. Abandoning 
> the transaction is a safe and effective way to use FSFS. However, other 
> users of FSFS may prefer to re-try in certain other cases.
> The case WANdisco encountered is where some old repository corruption 
> (SVN-4858) was detected and reported at some point in this code region. 
> Although the commit could not succeed, it was important to them that 
> the same error be reported again on a re-try. What was observed 
> instead was that the re-try threw the "already locked" error.
> I suppose the disk being temporarily inaccessible is one class of 
> error where a re-try might succeed.
> The attached test and patch demonstrate and fix the problem.
> This patch does not attempt to make it possible to re-try a failed 
> commit in all cases. Some remaining cases are noted in the patch log 
> message.
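
The failure mode described in the quoted text can be modelled with a small sketch. This is not the FSFS code and not the attached patch: it is a minimal Python model, and every name in it (Txn, being_written, lock_proto_rev, commit_leaky, commit_fixed and so on) is hypothetical. It assumes only what the description states: an in-memory "writing" flag on the transaction, a step inside the locked region that can fail, and a caller that re-tries the commit on the same transaction object.

```python
class AlreadyLockedError(Exception):
    """Stands in for the "already locked" error described above."""


class CorruptionError(Exception):
    """Stands in for the underlying failure (e.g. detected corruption)."""


class Txn:
    """Hypothetical in-memory transaction object with a 'writing' flag."""

    def __init__(self):
        self.being_written = False


def lock_proto_rev(txn):
    # Take the exclusive write lock; refuse if the flag is still set.
    if txn.being_written:
        raise AlreadyLockedError("proto-rev is already locked")
    txn.being_written = True


def unlock_proto_rev(txn):
    txn.being_written = False


def write_final_rev(txn):
    # Model a step inside the locked region that always fails.
    raise CorruptionError("repository corruption detected")


def commit_leaky(txn):
    # Behaviour described in the report: no unlock on the error path.
    lock_proto_rev(txn)
    write_final_rev(txn)      # raises, so the next line is never reached
    unlock_proto_rev(txn)


def commit_fixed(txn):
    # Proposed behaviour: release the lock if an error occurs while held.
    lock_proto_rev(txn)
    try:
        write_final_rev(txn)
    finally:
        unlock_proto_rev(txn)


def error_names(commit, attempts=2):
    """Run `commit` repeatedly on one transaction; collect error type names."""
    txn = Txn()
    names = []
    for _ in range(attempts):
        try:
            commit(txn)
            names.append("ok")
        except Exception as err:
            names.append(type(err).__name__)
    return names
```

In this model, re-trying commit_leaky on the same transaction reports the real error once and then "already locked", whereas commit_fixed reports the same underlying error on every attempt, which is the behaviour the reporter wanted. The model deliberately ignores the cases where a re-try cannot succeed at all, such as after the proto-rev file has been moved into its final location.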



--
This message was sent by Atlassian Jira
(v8.3.4#803005)