You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2014/05/06 01:24:19 UTC

upgrade_tests.py 29 spurious failure while testing 1.7.17

Sigh. Once more I had a non-reproducible test failure while testing
1.7.17. It happened while running the testsuite over ra_svn + FSFS
(over ra_local there was no problem).

[[[
CMD: svnadmin.exe create svn-test-work\repositories\upgrade_tests-29
--bdb-txn-nosync --fs-type=fsfs
<TIME = 0.110000>
CMD: svnadmin.exe dump svn-test-work\local_tmp\repos | svnadmin.exe
load svn-test-work\repositories\upgrade_tests-29 --ignore-uuid
<TIME = 0.016000>
CMD: svn.exe upgrade svn-test-work\working_copies\upgrade_tests-29
--config-dir R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
<TIME = 1.937000>
Upgraded 'svn-test-work\working_copies\upgrade_tests-29'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\B'
Skipped 'svn-test-work\working_copies\upgrade_tests-29\A\B\E'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\B\F'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\C'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D\G'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D\H'
CMD: svnadmin.exe setuuid svn-test-work\repositories\upgrade_tests-29
d7130b12-92f6-45c9-9217-b9f0472c3fab
<TIME = 0.078000>
CMD: svn.exe relocate file:///tmp/repo
svn://localhost/svn-test-work/repositories/upgrade_tests-29
svn-test-work\working_copies\upgrade_tests-29 --config-dir
R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
<TIME = 0.110000>
CMD: svn.exe up svn-test-work\working_copies\upgrade_tests-29
--config-dir R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
CMD: R:\test\subversion\svn\svn.exe up
svn-test-work\working_copies\upgrade_tests-29 --config-dir
R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom exited with 1
<TIME = 0.125000>
Updating 'svn-test-work\working_copies\upgrade_tests-29':
Restored 'svn-test-work\working_copies\upgrade_tests-29\A\B\E'
svn: E160004: Corrupt node-revision '0.0.r1/4198'
svn: E160004: Missing id field in node-rev
Traceback (most recent call last):
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 1319, in run
    rc = self.pred.run(sandbox)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\testcase.py",
line 176, in run
    return self.func(sandbox)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\upgrade_tests.py",
line 1228, in upgrade_missing_replaced
    None, expected_status)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\actions.py",
line 844, in run_and_verify_update
    *args)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 601, in run_svn
    *(_with_auth(_with_config_dir(varargs))))
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 350, in run_command
    None, *varargs)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 526, in run_command_stdin
    raise Failure
Failure
FAIL:  upgrade_tests.py 29: upgrade with missing replaced dir
]]]

As always, I tested with Windows XP (it's end of life, I know ...
whatever) on a ramdisk, non-parellel.

This time I took a copy of repository and working copy before
rerunning the test :-). See attachment. Can anyone shed some light on
this?

I experimented a bit further with a copy of the repository and working
copy of this failed test:
- svnadmin verify says everything is ok.
- a new svn checkout over svn:// works fine.
- executing the failing "svn up" command (the last command of the
failure output) on that particular working copy, talking with that
particular repository over svn:// ... no problem.

So I'm at a loss here. I don't see any corruption, yet the test reported it.

Perhaps some kind of cache corruption is a possibility? A theory would
be nice ... anything really.

-- 
Johan

Re: upgrade_tests.py 29 spurious failure while testing 1.7.17

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, May 13, 2014 at 5:58 PM, Ben Reser <be...@reser.org> wrote:
> On 5/13/14, 12:43 AM, Johan Corveleyn wrote:
>> First, thanks a lot for taking a look and giving a plausible
>> explanation. It's a possibility, but I'm not fully convinced yet :-).
>>
>> Pro:
>> - It fits theoretically (the one bit off etc).
>> - It's the only explanation so far. And IIUC cache corruption is ruled out.
>> - The machine is getting old (almost 8 years now -- I think the memory
>> is 5 or 6 years old). Its operating system (WinXP) is EOL.
>>
>> Con:
>> - I've had zero stability issues with my machine so far. No crashes,
>> no bluescreens. Not one for as far as I can remember.
>> - I've been testing / signing svn releases for a couple of years. No
>> problem, until the last two release cycles or so.
>> - Ran memtest86 (version 4.0.0 that I still had on some boot CD) last
>> night. It ran for 8 hours. No errors.
>>
>> So either my machine really has a memory problem, or it's a unique
>> machine that can (rarely) reproduce a bug in Subversion. I'm still not
>> sure. If it's the latter it would be a waste to throw it in the trash
>> :-). OTOH, if it's such a rare issue that nobody else is seeing this,
>> maybe it's not worth further precious time (of me and you and others)
>> ...
>>
>> I'll continue pounding it a bit more, but I'll probably give up at
>> some point (not determined yet).
>
> There are two possibilities that while also still memory corruption don't
> necessarily mean there's anything wrong with your memory or Subversion.  These
> are kinda out there but plausible.
>
> 1) An alpha particle hit your machine and flipped the bit.  If you aren't using
> ECC memory this is more likely than if you aren't.  See this question on stack
> overflow which has a bunch of useful links:
> http://stackoverflow.com/questions/4109218/do-gamma-rays-from-the-sun-really-flip-bits-every-once-in-a-while
>
> 2) The kernel on XP had a bug and flipped the bit in your memory.  This one
> seems less plausible.
>
> But back to the original theory.  Running memtest86 for 8 hours isn't
> necessarily indicative of no hardware problems.  The problem with memory issues
> is that they often cause problems in a bit when another nearby piece of memory
> is manipulated in a particular way.  Memtest86 is aware of these things and
> tries to run the patterns that cause the issues, but it takes a long time to
> try them all.  If this is the case you'd actually be able to reproduce the
> problem but only if the memory positions lined up just right again.
>
> At this point I'd stop bothering to try and find the cause.  If you get further
> test failures or strange behaviors then I'd look into it further.
>

OK, thanks for further theorizing :-).

I ran memtest86 some more (memtest86+ 4.20 this time), for 44 hours
straight (31 passes). No problem.

Now, I don't really consider the alpha particle theory as very
plausible. What especially bothers me is that until now I have only
ever had problems while running SVN tests. Not with other applications
nor random crashes or something like that.

I've gone back and listed my "non-reproducible fail history":

* 2011/09/13 1.7.0-rc3 OK
* ... (various 1.6 and 1.7 releases)
* 2013/07/18 1.7.11 OK
* 2013/07/19 1.8.1 OK
* 2013/08/27 1.7.13 OK
* 2013/11/19 1.8.5 OK
* 2013/11/20 1.7.14 OK
* 2014/02/06 1.8.6 FAIL [1]
* 2014/02/14 1.8.8 FAIL [2]
* 2014/02/21 1.7.16 OK
* 2014/02/22 1.9.0-alpha1 OK
* 2014/03/19 1.9.0-alpha2 OK
* 2014/04/30 1.8.9 OK
* 2014/05/06 1.7.17 FAIL [3]

But I think I've overlooked an essential component: I'm running these
tests on a ramdisk. That's something distinct for my svn testsuite
runs, I don't use the ramdisk for anything else. Maybe it's not the
RAM itself that's failing, but rather the Ramdisk implementation on
top of it.

I'm using Dataram RAMDisk 3.5.130 (with a 350 MB NTFS partition),
quite an old version of that software. It's possible that I might have
changed something to the Ramdisk configuration around end 2013,
beginning 2014. I can't quite remember it, but it sounds like a good
candidate for causing this problem ...

I'm going to leave it at that for now. I'll probably get back to svn
testing when I buy myself a new laptop (whenever that is).

-- 
Johan

[1] http://svn.haxx.se/dev/archive-2014-02/0084.shtml
This happened with ra_local (with fsfs).

[[[
W: ERROR:  dump failed; did not see revision 5
W: CWD: R:\test\subversion\tests\cmdline
W: EXCEPTION: SVNRepositoryCopyFailure
Traceback (most recent call last):
  File "C:\research\svn\client_build\subversion-1.8.6\subversion\tests\cmdline\svntest\main.py",
line 1550, in run
    rc = self.pred.run(sandbox)
  File "C:\research\svn\client_build\subversion-1.8.6\subversion\tests\cmdline\svntest\testcase.py",
line 176, in run
    return self.func(sandbox)
  File "C:\research\svn\client_build\subversion-1.8.6\subversion\tests\cmdline\externals_tests.py",
line 666, in disallow_dot_or_dotdot_directory_reference
    external_url_for = externals_test_setup(sbox)
  File "C:\research\svn\client_build\subversion-1.8.6\subversion\tests\cmdline\externals_tests.py",
line 155, in externals_test_setup
    svntest.main.copy_repos(repo_dir, other_repo_dir, 5)
  File "C:\research\svn\client_build\subversion-1.8.6\subversion\tests\cmdline\svntest\main.py",
line 1030, in copy_repos
    raise SVNRepositoryCopyFailure
SVNRepositoryCopyFailure
FAIL:  externals_tests.py 8: error if external target dir involves '.' or '..'
]]]


[2] http://svn.haxx.se/dev/archive-2014-02/0177.shtml
While testing ra_local x fsfs:

[[[
W: CWD: R:\test\subversion\tests\cmdline\svn-test-work\working_copies\tree_conflict_tests-10
W: EXCEPTION: SVNExpectedStdout
Traceback (most recent call last):
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\main.py",
line 1550, in run
    rc = self.pred.run(sandbox)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\testcase.py",
line 176, in run
    return self.func(sandbox)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\tree_conflict_tests.py",
line 685, in merge_file_del_onto_not_same
    commit_local_mods=True)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\tree_conflict_tests.py",
line 440, in ensure_tree_conflict
    '-m', 'Mods in target branch.')
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\actions.py",
line 282, in run_and_verify_svn
    expected_exit, *varargs)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\actions.py",
line 321, in run_and_verify_svn2
    verify.verify_outputs(message, out, err, expected_stdout, expected_stderr)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\verify.py",
line 445, in verify_outputs
    compare_and_display_lines(message, label, expected, actual, raisable)
  File "C:\research\svn\client_build\subversion-1.8.8\subversion\tests\cmdline\svntest\verify.py",
line 418, in compare_and_display_lines
    raise raisable
SVNExpectedStdout
FAIL:  tree_conflict_tests.py 10: merge file: del/rpl/mv onto not-same
]]]


[3] http://svn.haxx.se/dev/archive-2014-05/0037.shtml
While running the testsuite over ra_svn + FSFS (over ra_local there
was no problem).

[[[
CMD: svnadmin.exe create svn-test-work\repositories\upgrade_tests-29
--bdb-txn-nosync --fs-type=fsfs
<TIME = 0.110000>
CMD: svnadmin.exe dump svn-test-work\local_tmp\repos | svnadmin.exe
load svn-test-work\repositories\upgrade_tests-29 --ignore-uuid
<TIME = 0.016000>
CMD: svn.exe upgrade svn-test-work\working_copies\upgrade_tests-29
--config-dir R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
<TIME = 1.937000>
Upgraded 'svn-test-work\working_copies\upgrade_tests-29'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\B'
Skipped 'svn-test-work\working_copies\upgrade_tests-29\A\B\E'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\B\F'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\C'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D\G'
Upgraded 'svn-test-work\working_copies\upgrade_tests-29\A\D\H'
CMD: svnadmin.exe setuuid svn-test-work\repositories\upgrade_tests-29
d7130b12-92f6-45c9-9217-b9f0472c3fab
<TIME = 0.078000>
CMD: svn.exe relocate file:///tmp/repo
svn://localhost/svn-test-work/repositories/upgrade_tests-29
svn-test-work\working_copies\upgrade_tests-29 --config-dir
R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
<TIME = 0.110000>
CMD: svn.exe up svn-test-work\working_copies\upgrade_tests-29
--config-dir R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom
CMD: R:\test\subversion\svn\svn.exe up
svn-test-work\working_copies\upgrade_tests-29 --config-dir
R:\test\subversion\tests\cmdline\svn-test-work\local_tmp\config
--password rayjandom --no-auth-cache --username jrandom exited with 1
<TIME = 0.125000>
Updating 'svn-test-work\working_copies\upgrade_tests-29':
Restored 'svn-test-work\working_copies\upgrade_tests-29\A\B\E'
svn: E160004: Corrupt node-revision '0.0.r1/4198'
svn: E160004: Missing id field in node-rev
Traceback (most recent call last):
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 1319, in run
    rc = self.pred.run(sandbox)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\testcase.py",
line 176, in run
    return self.func(sandbox)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\upgrade_tests.py",
line 1228, in upgrade_missing_replaced
    None, expected_status)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\actions.py",
line 844, in run_and_verify_update
    *args)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 601, in run_svn
    *(_with_auth(_with_config_dir(varargs))))
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 350, in run_command
    None, *varargs)
  File "C:\research\svn\client_build\subversion-1.7.17\subversion\tests\cmdline\svntest\main.py",
line 526, in run_command_stdin
    raise Failure
Failure
FAIL:  upgrade_tests.py 29: upgrade with missing replaced dir
]]]

Re: upgrade_tests.py 29 spurious failure while testing 1.7.17

Posted by Ben Reser <be...@reser.org>.
On 5/13/14, 12:43 AM, Johan Corveleyn wrote:
> First, thanks a lot for taking a look and giving a plausible
> explanation. It's a possibility, but I'm not fully convinced yet :-).
> 
> Pro:
> - It fits theoretically (the one bit off etc).
> - It's the only explanation so far. And IIUC cache corruption is ruled out.
> - The machine is getting old (almost 8 years now -- I think the memory
> is 5 or 6 years old). Its operating system (WinXP) is EOL.
> 
> Con:
> - I've had zero stability issues with my machine so far. No crashes,
> no bluescreens. Not one for as far as I can remember.
> - I've been testing / signing svn releases for a couple of years. No
> problem, until the last two release cycles or so.
> - Ran memtest86 (version 4.0.0 that I still had on some boot CD) last
> night. It ran for 8 hours. No errors.
> 
> So either my machine really has a memory problem, or it's a unique
> machine that can (rarely) reproduce a bug in Subversion. I'm still not
> sure. If it's the latter it would be a waste to throw it in the trash
> :-). OTOH, if it's such a rare issue that nobody else is seeing this,
> maybe it's not worth further precious time (of me and you and others)
> ...
> 
> I'll continue pounding it a bit more, but I'll probably give up at
> some point (not determined yet).

There are two possibilities that while also still memory corruption don't
necessarily mean there's anything wrong with your memory or Subversion.  These
are kinda out there but plausible.

1) An alpha particle hit your machine and flipped the bit.  If you aren't using
ECC memory this is more likely than if you aren't.  See this question on stack
overflow which has a bunch of useful links:
http://stackoverflow.com/questions/4109218/do-gamma-rays-from-the-sun-really-flip-bits-every-once-in-a-while

2) The kernel on XP had a bug and flipped the bit in your memory.  This one
seems less plausible.

But back to the original theory.  Running memtest86 for 8 hours isn't
necessarily indicative of no hardware problems.  The problem with memory issues
is that they often cause problems in a bit when another nearby piece of memory
is manipulated in a particular way.  Memtest86 is aware of these things and
tries to run the patterns that cause the issues, but it takes a long time to
try them all.  If this is the case you'd actually be able to reproduce the
problem but only if the memory positions lined up just right again.

At this point I'd stop bothering to try and find the cause.  If you get further
test failures or strange behaviors then I'd look into it further.


Re: upgrade_tests.py 29 spurious failure while testing 1.7.17

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, May 12, 2014 at 9:51 PM, Ben Reser <be...@reser.org> wrote:
> On 5/5/14, 4:24 PM, Johan Corveleyn wrote:
>> As always, I tested with Windows XP (it's end of life, I know ...
>> whatever) on a ramdisk, non-parellel.
>>
>> This time I took a copy of repository and working copy before
>> rerunning the test :-). See attachment. Can anyone shed some light on
>> this?
>>
>> I experimented a bit further with a copy of the repository and working
>> copy of this failed test:
>> - svnadmin verify says everything is ok.
>> - a new svn checkout over svn:// works fine.
>> - executing the failing "svn up" command (the last command of the
>> failure output) on that particular working copy, talking with that
>> particular repository over svn:// ... no problem.
>>
>> So I'm at a loss here. I don't see any corruption, yet the test reported it.
>>
>> Perhaps some kind of cache corruption is a possibility? A theory would
>> be nice ... anything really.
>
> I strongly suspect there is something wrong with your machine (memory going
> bad?).  The repository is nothing more than a dump/load from the greek tree.
> After a dump/load the repository has the UUID set.  No other modifications
> happen to the repo and the only access to the repository via the server is the
> update command that's failing.  That rules out a problem with caching because
> the cache should be entirely cold for this repository when the update command runs.
>
> The error you're getting is:
> svn: E160004: Corrupt node-revision '0.0.r1/4198'
> svn: E160004: Missing id field in node-rev
>
> The closest id in the repository is this: 0.0.r1/4206
>
> The number after the slash is the offset which is stored in the private portion
> of the svn_fs_id_t.  The offset is stored as a apr_off_t (i.e. not a string but
> a integer).
>
> Looking at the offsets in binary yields (leading zeros ommitted):
>
> 4206 = 1 0000 0110 1110
> 4198 = 1 0000 0110 0110
>
> Note that they are off by exactly bit.
>
> A memory issue would probably be very hard to reproduce.  So this seems to fit
> with the issues you've been having.  Combine that with the fact that you've
> been having unreproducible test failures in other places with this setup.  I
> have to conclude you have issues with your memory.  I'd suggest running
> memtest86 on the machine.

First, thanks a lot for taking a look and giving a plausible
explanation. It's a possibility, but I'm not fully convinced yet :-).

Pro:
- It fits theoretically (the one bit off etc).
- It's the only explanation so far. And IIUC cache corruption is ruled out.
- The machine is getting old (almost 8 years now -- I think the memory
is 5 or 6 years old). Its operating system (WinXP) is EOL.

Con:
- I've had zero stability issues with my machine so far. No crashes,
no bluescreens. Not one for as far as I can remember.
- I've been testing / signing svn releases for a couple of years. No
problem, until the last two release cycles or so.
- Ran memtest86 (version 4.0.0 that I still had on some boot CD) last
night. It ran for 8 hours. No errors.

So either my machine really has a memory problem, or it's a unique
machine that can (rarely) reproduce a bug in Subversion. I'm still not
sure. If it's the latter it would be a waste to throw it in the trash
:-). OTOH, if it's such a rare issue that nobody else is seeing this,
maybe it's not worth further precious time (of me and you and others)
...

I'll continue pounding it a bit more, but I'll probably give up at
some point (not determined yet).

-- 
Johan

Re: upgrade_tests.py 29 spurious failure while testing 1.7.17

Posted by Ben Reser <be...@reser.org>.
On 5/5/14, 4:24 PM, Johan Corveleyn wrote:
> As always, I tested with Windows XP (it's end of life, I know ...
> whatever) on a ramdisk, non-parellel.
> 
> This time I took a copy of repository and working copy before
> rerunning the test :-). See attachment. Can anyone shed some light on
> this?
>
> I experimented a bit further with a copy of the repository and working
> copy of this failed test:
> - svnadmin verify says everything is ok.
> - a new svn checkout over svn:// works fine.
> - executing the failing "svn up" command (the last command of the
> failure output) on that particular working copy, talking with that
> particular repository over svn:// ... no problem.
> 
> So I'm at a loss here. I don't see any corruption, yet the test reported it.
> 
> Perhaps some kind of cache corruption is a possibility? A theory would
> be nice ... anything really.

I strongly suspect there is something wrong with your machine (memory going
bad?).  The repository is nothing more than a dump/load from the greek tree.
After a dump/load the repository has the UUID set.  No other modifications
happen to the repo and the only access to the repository via the server is the
update command that's failing.  That rules out a problem with caching because
the cache should be entirely cold for this repository when the update command runs.

The error you're getting is:
svn: E160004: Corrupt node-revision '0.0.r1/4198'
svn: E160004: Missing id field in node-rev

The closest id in the repository is this: 0.0.r1/4206

The number after the slash is the offset which is stored in the private portion
of the svn_fs_id_t.  The offset is stored as a apr_off_t (i.e. not a string but
a integer).

Looking at the offsets in binary yields (leading zeros ommitted):

4206 = 1 0000 0110 1110
4198 = 1 0000 0110 0110

Note that they are off by exactly bit.

A memory issue would probably be very hard to reproduce.  So this seems to fit
with the issues you've been having.  Combine that with the fact that you've
been having unreproducible test failures in other places with this setup.  I
have to conclude you have issues with your memory.  I'd suggest running
memtest86 on the machine.