You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Bert Huijben <be...@qqmail.nl> on 2014/01/16 15:19:45 UTC

[Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

	Hi,

While debugging a Subversion buildbot error I found a race condition in
apr_dir_make_recursive().

If two new threads perform something like
$ mkdir -p some/deep/sub/dir
$ mkdir -p some/deep/sub/other
At the same time

And 'some' and/or 'deep' don't exist both threads will fail to create their
final directory part and then try to create the parent directories. If their
first attempt fails they try to create the parent directory, and then itself
again. But if the other thread was first, we get an EEXISTS error. This
error is ignored by the inner implementation, but the outer function
accidentally converts this to APR_SUCCESS even though the requested
directory isn't created.

[[
* file_io/win32/dir.c
  (dir_make_parent): When parent just got created, continue creating
children.
  (apr_dir_make_recursive): Only handle EEXIST of the requested directory as
success,
    not any ancestor.

Patch by: rhuijben
]]

At first glance it appears the same problem exists on other platforms (most
importantly the standard unix code) too.

Branko might already be looking at this patch as we discussed it on
#svn-dev.

It would be nice to have the patch (or something similar) applied in the
next apr 1.5.x release (and perhaps also on some older branches), as this
problem causes some very hard to diagnose errors in the Subversion test
suite.
We assumed that the function would succeed or return an error; not fail
without an error. So we were verifying and tweaking the code after calling
this function again and again.

	Bert

Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Branko Čibej <br...@apache.org>.
On 21.01.2014 10:18, Branko Čibej wrote:
> On 21.01.2014 03:01, Jeff Trawick wrote:
>> On Mon, Jan 20, 2014 at 8:23 PM, Branko Čibej <brane@apache.org
>> <ma...@apache.org>> wrote:
>>
>>     On 21.01.2014 02:18, Branko Čibej wrote:
>>     > On 21.01.2014 01:55, Branko Čibej wrote:
>>     >> I've added a test case to testdir.c to exercise this code in
>>     parallel.
>>     > r1559873, original patch (with slightly tweaked comments) and
>>     test case.
>>     > Bert, can you please double-check that this works on Windows?
>>
>>     Also r1559878 (1.5.x) and r1559879 (1.4.x).
>>
>>     -- Brane
>>
>>
>> I see intermittent failures testing the 1.5.x branch on Linux (4
>> cores, 8 threads) after picking up your new testcase.  The two
>> variations seen the most are
>>
>> testdir             : |Line 84: expected <0>, but saw <2>
>> testdir             : \Line 87: expected <0>, but saw <2>
>>
>> I've also seen a single failure at line 85 as well as this one that
>> needs to be recovered from manually:
>> testdir             : -Line 199: expected <0>, but saw <39> (39 is
>> "Directory not empty")
>>
>> Current state as of the unrecoverable failure:
>> $ find data/prll/
>> data/prll/
>> data/prll/11
>> data/prll/11/12
>> data/prll/11/12/13
>> data/prll/11/12/13/14
>> data/prll/11/12/13/14/15data
>> data/prll/11/12/13/14/15data/prll
>> data/prll/11/12/13/14/15data/prll/11
>> data/prll/11/12/13/14/15data/prll/11/12
>
> "Interesting". I'll take a look.

Heh, it was a bug in the test case; I had it using the same pool from 4
threads, and sometimes apr_pstrdup ended up copying half-baked strings.
I have no idea why I didn't see the bug on the Mac; probably different
timing and scheduling.

>> Also, can you add a CHANGES entry in the 1.4.x and 1.5.x branches?
>
> Will do, thanks for the reminder.

Done.

-- Brane

Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Branko Čibej <br...@apache.org>.
On 21.01.2014 03:01, Jeff Trawick wrote:
> On Mon, Jan 20, 2014 at 8:23 PM, Branko Čibej <brane@apache.org
> <ma...@apache.org>> wrote:
>
>     On 21.01.2014 02:18, Branko Čibej wrote:
>     > On 21.01.2014 01:55, Branko Čibej wrote:
>     >> I've added a test case to testdir.c to exercise this code in
>     parallel.
>     > r1559873, original patch (with slightly tweaked comments) and
>     test case.
>     > Bert, can you please double-check that this works on Windows?
>
>     Also r1559878 (1.5.x) and r1559879 (1.4.x).
>
>     -- Brane
>
>
> I see intermittent failures testing the 1.5.x branch on Linux (4
> cores, 8 threads) after picking up your new testcase.  The two
> variations seen the most are
>
> testdir             : |Line 84: expected <0>, but saw <2>
> testdir             : \Line 87: expected <0>, but saw <2>
>
> I've also seen a single failure at line 85 as well as this one that
> needs to be recovered from manually:
> testdir             : -Line 199: expected <0>, but saw <39> (39 is
> "Directory not empty")
>
> Current state as of the unrecoverable failure:
> $ find data/prll/
> data/prll/
> data/prll/11
> data/prll/11/12
> data/prll/11/12/13
> data/prll/11/12/13/14
> data/prll/11/12/13/14/15data
> data/prll/11/12/13/14/15data/prll
> data/prll/11/12/13/14/15data/prll/11
> data/prll/11/12/13/14/15data/prll/11/12

"Interesting". I'll take a look.

> I didn't see a failure on Windows (2 virtualized processors).

Thanks.

> Also, can you add a CHANGES entry in the 1.4.x and 1.5.x branches?

Will do, thanks for the reminder.

-- Brane

Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Jeff Trawick <tr...@gmail.com>.
On Mon, Jan 20, 2014 at 8:23 PM, Branko Čibej <br...@apache.org> wrote:

> On 21.01.2014 02:18, Branko Čibej wrote:
> > On 21.01.2014 01:55, Branko Čibej wrote:
> >> I've added a test case to testdir.c to exercise this code in parallel.
> > r1559873, original patch (with slightly tweaked comments) and test case.
> > Bert, can you please double-check that this works on Windows?
>
> Also r1559878 (1.5.x) and r1559879 (1.4.x).
>
> -- Brane
>
>
I see intermittent failures testing the 1.5.x branch on Linux (4 cores, 8
threads) after picking up your new testcase.  The two variations seen the
most are

testdir             : |Line 84: expected <0>, but saw <2>
testdir             : \Line 87: expected <0>, but saw <2>

I've also seen a single failure at line 85 as well as this one that needs
to be recovered from manually:
testdir             : -Line 199: expected <0>, but saw <39> (39 is
"Directory not empty")

Current state as of the unrecoverable failure:
$ find data/prll/
data/prll/
data/prll/11
data/prll/11/12
data/prll/11/12/13
data/prll/11/12/13/14
data/prll/11/12/13/14/15data
data/prll/11/12/13/14/15data/prll
data/prll/11/12/13/14/15data/prll/11
data/prll/11/12/13/14/15data/prll/11/12

I didn't see a failure on Windows (2 virtualized processors).

---

Also, can you add a CHANGES entry in the 1.4.x and 1.5.x branches?


-- 
Born in Roswell... married an alien...
http://emptyhammock.com/

Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Branko Čibej <br...@apache.org>.
On 21.01.2014 02:18, Branko Čibej wrote:
> On 21.01.2014 01:55, Branko Čibej wrote:
>> I've added a test case to testdir.c to exercise this code in parallel.
> r1559873, original patch (with slightly tweaked comments) and test case.
> Bert, can you please double-check that this works on Windows?

Also r1559878 (1.5.x) and r1559879 (1.4.x).

-- Brane


Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Branko Čibej <br...@apache.org>.
On 21.01.2014 01:55, Branko Čibej wrote:
> I've added a test case to testdir.c to exercise this code in parallel.
> -- Brane 

r1559873, original patch (with slightly tweaked comments) and test case.
Bert, can you please double-check that this works on Windows?

-- Brane

Re: [Patch] Fix for concurrent/race condition in apr_dir_make_recursive()

Posted by Branko Čibej <br...@apache.org>.
On 16.01.2014 15:19, Bert Huijben wrote:
> 	Hi,
>
> While debugging a Subversion buildbot error I found a race condition in
> apr_dir_make_recursive().
>
> If two new threads perform something like
> $ mkdir -p some/deep/sub/dir
> $ mkdir -p some/deep/sub/other
> At the same time
>
> And 'some' and/or 'deep' don't exist both threads will fail to create their
> final directory part and then try to create the parent directories. If their
> first attempt fails they try to create the parent directory, and then itself
> again. But if the other thread was first, we get an EEXISTS error. This
> error is ignored by the inner implementation, but the outer function
> accidentally converts this to APR_SUCCESS even though the requested
> directory isn't created.
>
> [[
> * file_io/win32/dir.c
>   (dir_make_parent): When parent just got created, continue creating
> children.
>   (apr_dir_make_recursive): Only handle EEXIST of the requested directory as
> success,
>     not any ancestor.
>
> Patch by: rhuijben
> ]]
>
> At first glance it appears the same problem exists on other platforms (most
> importantly the standard unix code) too.

As a matter of fact, Unix does not have this problem because it doesn't
recurse through a dir_make_parent function, and the existing check
already covers the EEXIST. And OS/2 doesn't have an
apr_dir_make_recursive implementation at all. So the problem existed
only on Windows.

I've added a test case to testdir.c to exercise this code in parallel.

-- Brane