You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2011/03/30 07:56:35 UTC

the retry loop in svn_io_file_lock2()

Why does svn_io_file_lock2() have a retry loop?

The reason given in the code is that some consumer of libsvn_fs wants to run
a critical section while holding a lock in each of two FSFS filesystems.  In
that case, why isn't the fix just to tell the consumer to, when wanting to
lock N filesystems, to always lock them in a well-defined order?

Concretely: sort the filesystems by their UUID lexicographically.

Re: the retry loop in svn_io_file_lock2()

Posted by Blair Zajac <bl...@orcaware.com>.

On 03/30/2011 03:22 PM, Daniel Shahaf wrote:
> Blair Zajac wrote on Wed, Mar 30, 2011 at 15:13:07 -0700:
>> On 03/30/2011 03:06 PM, Daniel Shahaf wrote:
>>> I notice that FILE_LOCK_RETRY_LOOP() checks for EINTR when EDEADLK isn't
>>> defined (or when APR isn't threaded).  I assume that check is there for
>>> the same reason --- i.e., that EINTR is another symptom of the deadlock
>>> situation that FILE_LOCK_RETRY_LOOP() was added due to?
>>
>> No, that's unrelated.  While I was working on the function, I
>> decided to handle EINTR in the same way that many of our other io
>> functions do, e.g. svn_io_create_unique_link(), svn_io_read_link(),
>> do_io_file_wrapper_cleanup().  This one didn't seem to hurt and
>> makes the function more robust.  I didn't want spurious EINTR errors
>> when trying to get a lock.
>>
>
> Okay, thanks.  I stand by my +1 then.
>
> (APR already handles EINTR in apr_file_lock(), at least in some
> environments; but I don't see the harm in leaving our check in too.)

Good catch, I didn't check apr_file_lock()'s implementation.  We could 
remove that if all environments check for it, but it's definitely safer 
having svn do it in case there was a time when APR didn't, but APR 0.9.x 
does retry also.

Blair

Re: the retry loop in svn_io_file_lock2()

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Blair Zajac wrote on Wed, Mar 30, 2011 at 15:13:07 -0700:
> On 03/30/2011 03:06 PM, Daniel Shahaf wrote:
> >I notice that FILE_LOCK_RETRY_LOOP() checks for EINTR when EDEADLK isn't
> >defined (or when APR isn't threaded).  I assume that check is there for
> >the same reason --- i.e., that EINTR is another symptom of the deadlock
> >situation that FILE_LOCK_RETRY_LOOP() was added due to?
> 
> No, that's unrelated.  While I was working on the function, I
> decided to handle EINTR in the same way that many of our other io
> functions do, e.g. svn_io_create_unique_link(), svn_io_read_link(),
> do_io_file_wrapper_cleanup().  This one didn't seem to hurt and
> makes the function more robust.  I didn't want spurious EINTR errors
> when trying to get a lock.
> 

Okay, thanks.  I stand by my +1 then.

(APR already handles EINTR in apr_file_lock(), at least in some
environments; but I don't see the harm in leaving our check in too.)

> Blair

Re: the retry loop in svn_io_file_lock2()

Posted by Blair Zajac <bl...@orcaware.com>.

On 03/30/2011 03:06 PM, Daniel Shahaf wrote:
> Blair Zajac wrote on Wed, Mar 30, 2011 at 13:59:30 -0700:
>> On 03/30/2011 04:00 AM, Daniel Shahaf wrote:
>>> Philip Martin wrote on Wed, Mar 30, 2011 at 10:33:42 +0100:
>>>> First, the knowledge about the locks is entirely within the thread at
>>>> present, each thread has no knowledge of what other threads are doing.
>>>
>>> But if the threads are independent, why are you describing the situation
>>> as deadlock?  With independent threads, p1t1 and p2t1 should finish
>>> their business and unlock both A and B, independently of anyone else who
>>> may have tried to lock either A or B...
>>
>> I agree with your assessment, which was mine also, but it appears
>> that the Linux kernel does not know that each thread will make
>> progress on its own, and I'm guessing that it cannot assume that the
>> thread will make progress.  It seems that the kernel treats the
>> process as a single threaded process, which then means a deadlock
>> has occurred for the scenario outlined in the code.
>>
>
> In other words, the difference between theory and practice is that the
> Linux kernel doesn't behave as theory predicts :-)
>
> Fair enough, thanks for the explanations.  I'll go and add my +1 to the
> STATUS entry.
>
> I notice that FILE_LOCK_RETRY_LOOP() checks for EINTR when EDEADLK isn't
> defined (or when APR isn't threaded).  I assume that check is there for
> the same reason --- i.e., that EINTR is another symptom of the deadlock
> situation that FILE_LOCK_RETRY_LOOP() was added due to?

No, that's unrelated.  While I was working on the function, I decided to 
handle EINTR in the same way that many of our other io functions do, 
e.g. svn_io_create_unique_link(), svn_io_read_link(), 
do_io_file_wrapper_cleanup().  This one didn't seem to hurt and makes 
the function more robust.  I didn't want spurious EINTR errors when 
trying to get a lock.

Blair

Re: the retry loop in svn_io_file_lock2()

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Philip Martin wrote on Wed, Mar 30, 2011 at 10:33:42 +0100:
> Daniel Shahaf <d....@daniel.shahaf.name> writes:
> 
> > Why does svn_io_file_lock2() have a retry loop?
> >
> > The reason given in the code is that some consumer of libsvn_fs wants to run
> > a critical section while holding a lock in each of two FSFS filesystems.  In
> > that case, why isn't the fix just to tell the consumer to, when wanting to
> > lock N filesystems, to always lock them in a well-defined order?
> 
> First, the knowledge about the locks is entirely within the thread at
> present, each thread has no knowledge of what other threads are doing.

But if the threads are independent, why are you describing the situation
as deadlock?  With independent threads, p1t1 and p2t1 should finish
their business and unlock both A and B, independently of anyone else who
may have tried to lock either A or B...

> You would need to introduce some sort of per-process lock handler.
> 
> Secondly, from the comment in the code:
> 
>      Process 1                         Process 2
>      ---------                         ---------
>      thread 1: get lock in repos A
>                                        thread 1: get lock in repos B
>                                        thread 2: block getting lock in repos A
>      thread 2: try to get lock in B *** deadlock ***
> 
> when each process takes the first lock it may not know that it will want
> the second lock.
> 

Okay, processes that don't know in advance what locking order they
need does sound better.

> -- 
> Philip

Thanks,

Daniel

Re: the retry loop in svn_io_file_lock2()

Posted by Philip Martin <ph...@wandisco.com>.

Daniel Shahaf <d....@daniel.shahaf.name> writes:

> Why does svn_io_file_lock2() have a retry loop?
>
> The reason given in the code is that some consumer of libsvn_fs wants to run
> a critical section while holding a lock in each of two FSFS filesystems.  In
> that case, why isn't the fix just to tell the consumer to, when wanting to
> lock N filesystems, to always lock them in a well-defined order?

First, the knowledge about the locks is entirely within the thread at
present, each thread has no knowledge of what other threads are doing.
You would need to introduce some sort of per-process lock handler.

Secondly, from the comment in the code:

     Process 1                         Process 2
     ---------                         ---------
     thread 1: get lock in repos A
                                       thread 1: get lock in repos B
                                       thread 2: block getting lock in repos A
     thread 2: try to get lock in B *** deadlock ***

when each process takes the first lock it may not know that it will want
the second lock.

-- 
Philip