You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2015/08/23 21:38:03 UTC

Issue #4588, part 1: FSFS access error

Garret Wilson had a terrible experience upgrading to SVN 1.9
(http://subversion.tigris.org/issues/show_bug.cgi?id=4588),
ranging from build problems over svnadmin load inconsistency
to a low-level FSFS format 7 read error.

This post focuses on the FSFS issue.  The good news is that
confidence is high that the repo itself is correct and there
has been no apparent data loss.  The error message indicates
that something like Offset=L2P[L2P[Item]] or Offset=L2P[Offset]
instead of Offset=L2P[item] happened.

Trying to reproduce the issue, I discovered bugs in our debug
logging code (only affects FSFS devs) and in svnfsfs load-index
(fix yet to be committed, work-around is possible).  None of
these explain the behavior Garret has been seeing.  I've also
been looking for places where svn_fs_fs__item_offset() gets
called but those invocations seem to never possibly do a
double lookup.

My current hypothesis is that the server did not get restarted
after replacing the repository.  Because we decided not to make
the instance ID part of the cache key, we could easily have
picked up cached format 6 data for the format 7 repository.

The lookup would start with a hit at the L2 DAG cache, mapping
path+rev onto a noderev.  From there, everything is accessed
using physical offsets until some item is not cached.  At that
point, we would be trying to use an Offset instead of the Item
index to address the f7 data.  Error.

-- Stefan^2.

Re: Issue #4588, part 1: FSFS access error

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Stefan Fuhrmann wrote on Sun, Aug 23, 2015 at 20:38:03 +0100:
> This post focuses on the FSFS issue.  The good news is that
> confidence is high that the repo itself is correct and there
> has been no apparent data loss.  The error message indicates
> that something like Offset=L2P[L2P[Item]] or Offset=L2P[Offset]
> instead of Offset=L2P[item] happened.
...
> The lookup would start with a hit at the L2 DAG cache, mapping
> path+rev onto a noderev.  From there, everything is accessed
> using physical offsets until some item is not cached.  At that
> point, we would be trying to use an Offset instead of the Item
> index to address the f7 data.  Error.

So somehow we managed to cast an offset into an item.  Can we make the
code more strongly-typed?

Where is the "L2 DAG cache" in the code?  I assume it's
ffd->rev_node_cache, not fs_fs_dag_cache_t, correct?

Re: Issue #4588, part 1: FSFS access error

Posted by Stefan Fuhrmann <st...@wandisco.com>.

On Tue, Aug 25, 2015 at 12:47 AM, Evgeny Kotkov <evgeny.kotkov@visualsvn.com
> wrote:

> Stefan Fuhrmann <st...@wandisco.com> writes:
>
> > My current hypothesis is that the server did not get restarted after
> > replacing the repository.  Because we decided not to make the instance ID
> > part of the cache key, we could easily have picked up cached format 6
> data
> > for the format 7 repository.
>
> [...]
>
> > That said, are we still happy with the decision to not make the instance
> > ID part of the cache key? The rationale has basically been "fail early"
> > because failure to restart or reconfigure the server after the repo files
> > got modified might lead to any kind of unknown problems (much) further
> down
> > the road.
>
> As I see it, there are two separate problems:
>
[...]

> 2) The second part of the problem, to my mind, is the offset and item-based
>    addressing.  Irrespectively of whether we use instance IDs in the cache
>    keys, or not, I find it rather questionable that the same entry in the
>    cache can mean two different things, depending on how you look at it.
>

3 different things, in fact. Two in format7, two in older formats.
The 3 different addressing modes are:

1. Absolute position ("phys") in a rev / pack file.
   This is where we need to navigate to when reading data.
2. Offset within a revision. phys = manifest[ref] + offset
3. Item number within a revision: phys = L2P[rev, item]

So, we always need to translate pairs of (rev, item-index)
to phys. SVN 1.9 uses "item-index" as the umbrella term
for case 2. and 3. Only a single function, svn_fs_fs__item_offset,
deals with the differences between the two - modulo the
format specific ways to find the rev root node and changed
paths list.

In that sense, the cache content not only *means* the
same thing ("this is item xyz") but even uses the same
generalized data type. The problem here is simply that
the cache contents becomes invalid as soon as history
is being rewritten.

>    What happens if we're unlucky enough, and the offset in the revision
> file
>    also is a valid index in the l2p lookup table?  Is there something we
> can
>    do about it — say, associate the addressing type with the corresponding
>    cache entry?
>

Since they are homogenous throughout a repository,  we
could simply add the repo format, addressing mode and
sharding size (manifest cache vs. resharding) to the cache
key prefix.

The practical implications would be similar to adding the
instance ID, except working for pre-f7 repos and being
less specific. So, if we decide to add the instance ID to
the cache key, we should add the other parts as well.

-- Stefan^2.

Re: Issue #4588, part 1: FSFS access error

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

(abbreviating "instance ID" to "IID")

Stefan Fuhrmann wrote on Tue, Aug 25, 2015 at 15:13:04 +0100:
>  On Tue, Aug 25, 2015 at 7:15 AM, Daniel Shahaf <d....@daniel.shahaf.name>
> wrote:
> 
> > Evgeny Kotkov wrote on Tue, Aug 25, 2015 at 02:47:16 +0300:
> > > Stefan Fuhrmann <st...@wandisco.com> writes:
> > >
> > > > My current hypothesis is that the server did not get restarted after
> > > > replacing the repository.  Because we decided not to make the instance
> > ID
> > > > part of the cache key, we could easily have picked up cached format 6
> > data
> > > > for the format 7 repository.
> > >
> > > [...]
> > >
> > > > That said, are we still happy with the decision to not make the
> > instance
> > > > ID part of the cache key? The rationale has basically been "fail early"
> > > > because failure to restart or reconfigure the server after the repo
> > files
> > > > got modified might lead to any kind of unknown problems (much) further
> > down
> > > > the road.
> > >
> > > As I see it, there are two separate problems:
> > >
> >
> > I see it the same way.
> >
> > > 1) First of all, I am thinking that we should indeed revisit the
> > decision of
> > >    not making instance IDs a part of the cache keys.  As far as I
> > recall, this
> > >    part of the change was reverted after we agreed that failing fast is
> > better
> > >    than narrowing the window when the cache issues exist [1] (there are
> > still
> > >    scenarios like backing up and restoring the repository with cp -a).
> > >
> > >    Now I am almost sure that we should redo this change.  The experience
> > of
> > >    receiving errors related to stale cache entries isn't exactly
> > educating,
> > >    but rather than that, it's frustrating,
> 
> 
> I think this is the core issue. If you do it wrong, you are
> likely to experience frustration at some later point, this
> is the very definition of doing it "wrong". So, if we want
> to spare the user this frustration without them changing
> their behaviour, we must be sure to *always* make it
> "right".

Well, yes, but another reasonable option would be to throw a clear and
unambiguous error as soon as the user has did something he shouldn't
have.  I imagine an error such as:

svn: E123456: Repository replaced without server restart; restart the server

... although I realize it might not be easy to detect that situation.
Could we re-read the IID under the commit write lock and compare it to
the cached value?  I.e., check that either the IID was unset at
svn_fs_open2() time and is still unset at the start of commit_body(), or
that it was set at svn_fs_open2() time and is still set to the same
value on disk?  That wouldn't be bulletproof,¹ but it should narrow the
window for the race.

¹ For example, someone could replace the repository while commit_body()
is running after it checked the IID.

> > >    and the procedures describing dump,
> > >    load and hotcopy rarely say something about a server restart.
> 
> 
> Yes, the SVN book is not very explicit about that and
> the release notes have been amended only yesterday.
> 
> In the past, our thinking has been along the lines of "It's a
> DB server and who in their right mind would fiddle with
> the disk data and expect everything to work just fine?"
> To an occasional user, it may be much less clear that e.g.
> a tool called "svnadmin" does NOT interact with running
> servers whenever necessary.
> 

(See below.)

> > What are the downsides to having instance IDs as part of the cache key?
> > I haven't been able to understand that from your post or from Ben's post
> > you link to (53F4C92B.2010206@reser.org), which only mentions a "failure
> > to clear the cache race that was discussed offlist".
> >
> > Is this the problem scenario? ---
> >
> > 1. Open an svn_fs_t handle
> >
> > 2. Replace the repository with another repository having the same UUID
> >    but different contents
> >
> > 3. Make a commit from the svn_fs_t opened in (1)
> >
> > 4. The commit creates a corrupted revision due to stale (false positive)
> >    cache hits
> >
> 
> That is the worst-case scenario. By adding the instance ID
> to the cache key, the only race I can think of is:
> 
> * open svn_fs_t, read repo format (sharding!) and config
> * replace repo on disk
> * begin txn
> * ...
> * commit
> 
> SVN 1.9 rereads the format upon commit, *probably* making
> the commit fail if things are inconsistent now (txn and rev paths).
> If the repo gets replaced earlier that it's obviously no problem.
> If it was replaced later, the TXN directory will very likely be
> missing.
> 
> However, there is more state than caches. So, here are two
> counter-arguments to relying on the instance ID:
> 
> * Client connections still become invalid, i.e. reports in flight
>   are likely to error out, future requests may or may not result
>   in an error.

I don't see a way we can make it work in the general case.  For example,
even without IIDs, we would still error out if the repository is
replaced with an 'svndumpfilter exclude'ed copy of itself, and somebody
is checking out the excluded path.

I don't think there's an easy solution here.  The FSFS code is designed
around the assumption that history is immutable, and replacing the
repository under a server's feet breaks that assumption.  We can make
the cache staleness issues harder to trigger, but we cannot prevent them
entirely.

Perhaps we can do something similar to the 'revprop generation' trick?
I.e., have all the revision data caches keyed by a 'revision generation'
number as well, which 'svnadmin create' and 'svnadmin load' increment?
(I'm not entirely sure that makes sense; I'm just saying, the problem of
replacing revision data is comparable to the problem of detecting
staleness of the in-memory revprop cache.)

>

Another thought: opening revision files using openat(2) instead of
open()-by-name might handle a few more repository-replacement scenarios.
(Those in which the original REPOS_DIR is moved or deleted, rather than
overwritten in-place.)  It's another way to make the "revision data
caches are stale" issue harder to hit.

> * The instance ID is only available in format 7 repositories.
>   The problem exists in any dump/load scenario - even if
>   the format number does not change.
> 
> The first one is an inconvenience, the second one is somewhat
> stronger.
> 

Cheers,

Daniel

> -- Stefan^2.

Re: Issue #4588, part 1: FSFS access error

Posted by Stefan Fuhrmann <st...@wandisco.com>.

 On Tue, Aug 25, 2015 at 7:15 AM, Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Evgeny Kotkov wrote on Tue, Aug 25, 2015 at 02:47:16 +0300:
> > Stefan Fuhrmann <st...@wandisco.com> writes:
> >
> > > My current hypothesis is that the server did not get restarted after
> > > replacing the repository.  Because we decided not to make the instance
> ID
> > > part of the cache key, we could easily have picked up cached format 6
> data
> > > for the format 7 repository.
> >
> > [...]
> >
> > > That said, are we still happy with the decision to not make the
> instance
> > > ID part of the cache key? The rationale has basically been "fail early"
> > > because failure to restart or reconfigure the server after the repo
> files
> > > got modified might lead to any kind of unknown problems (much) further
> down
> > > the road.
> >
> > As I see it, there are two separate problems:
> >
>
> I see it the same way.
>
> > 1) First of all, I am thinking that we should indeed revisit the
> decision of
> >    not making instance IDs a part of the cache keys.  As far as I
> recall, this
> >    part of the change was reverted after we agreed that failing fast is
> better
> >    than narrowing the window when the cache issues exist [1] (there are
> still
> >    scenarios like backing up and restoring the repository with cp -a).
> >
> >    Now I am almost sure that we should redo this change.  The experience
> of
> >    receiving errors related to stale cache entries isn't exactly
> educating,
> >    but rather than that, it's frustrating,

I think this is the core issue. If you do it wrong, you are
likely to experience frustration at some later point, this
is the very definition of doing it "wrong". So, if we want
to spare the user this frustration without them changing
their behaviour, we must be sure to *always* make it
"right".

> >    and the procedures describing dump,
> >    load and hotcopy rarely say something about a server restart.

Yes, the SVN book is not very explicit about that and
the release notes have been amended only yesterday.

In the past, our thinking has been along the lines of "It's a
DB server and who in their right mind would fiddle with
the disk data and expect everything to work just fine?"
To an occasional user, it may be much less clear that e.g.
a tool called "svnadmin" does NOT interact with running
servers whenever necessary.

> >    As we have
> >    the mechanism to make a huge set of cases work without the necessity
> of
> >    performing additional actions, I think that we should do it, leaving
> the
> >    edge cases as they are.  In other words, people who are used to
> svnadmin
> >    dump/load/hotcopy shouldn't struggle, because we think that they also
> could
> >    be doing something like the aforementioned cp -a.
>

Well, getting all cases that involve svnadmin right would
be enough in my book. People actually fiddling with
repository details on disk need to know what they are
doing. Those are the same kind of people that need to
know about the rep-cache.db.

We would also need an explicit way to bump the
instance ID to support "restore" scenarios to cover the
full range of admin activity. Maybe some cache reset
UI on the server would be useful as well.

> What are the downsides to having instance IDs as part of the cache key?
> I haven't been able to understand that from your post or from Ben's post
> you link to (53F4C92B.2010206@reser.org), which only mentions a "failure
> to clear the cache race that was discussed offlist".
>
> Is this the problem scenario? ---
>
> 1. Open an svn_fs_t handle
>
> 2. Replace the repository with another repository having the same UUID
>    but different contents
>
> 3. Make a commit from the svn_fs_t opened in (1)
>
> 4. The commit creates a corrupted revision due to stale (false positive)
>    cache hits
>

That is the worst-case scenario. By adding the instance ID
to the cache key, the only race I can think of is:

* open svn_fs_t, read repo format (sharding!) and config
* replace repo on disk
* begin txn
* ...
* commit

SVN 1.9 rereads the format upon commit, *probably* making
the commit fail if things are inconsistent now (txn and rev paths).
If the repo gets replaced earlier that it's obviously no problem.
If it was replaced later, the TXN directory will very likely be
missing.

However, there is more state than caches. So, here are two
counter-arguments to relying on the instance ID:

* Client connections still become invalid, i.e. reports in flight
  are likely to error out, future requests may or may not result
  in an error.
* The instance ID is only available in format 7 repositories.
  The problem exists in any dump/load scenario - even if
  the format number does not change.

The first one is an inconvenience, the second one is somewhat
stronger.

-- Stefan^2.

Re: Issue #4588, part 1: FSFS access error

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Evgeny Kotkov wrote on Tue, Aug 25, 2015 at 02:47:16 +0300:
> Stefan Fuhrmann <st...@wandisco.com> writes:
> 
> > My current hypothesis is that the server did not get restarted after
> > replacing the repository.  Because we decided not to make the instance ID
> > part of the cache key, we could easily have picked up cached format 6 data
> > for the format 7 repository.
> 
> [...]
> 
> > That said, are we still happy with the decision to not make the instance
> > ID part of the cache key? The rationale has basically been "fail early"
> > because failure to restart or reconfigure the server after the repo files
> > got modified might lead to any kind of unknown problems (much) further down
> > the road.
> 
> As I see it, there are two separate problems:
> 

I see it the same way.

> 1) First of all, I am thinking that we should indeed revisit the decision of
>    not making instance IDs a part of the cache keys.  As far as I recall, this
>    part of the change was reverted after we agreed that failing fast is better
>    than narrowing the window when the cache issues exist [1] (there are still
>    scenarios like backing up and restoring the repository with cp -a).
> 
>    Now I am almost sure that we should redo this change.  The experience of
>    receiving errors related to stale cache entries isn't exactly educating,
>    but rather than that, it's frustrating, and the procedures describing dump,
>    load and hotcopy rarely say something about a server restart.  As we have
>    the mechanism to make a huge set of cases work without the necessity of
>    performing additional actions, I think that we should do it, leaving the
>    edge cases as they are.  In other words, people who are used to svnadmin
>    dump/load/hotcopy shouldn't struggle, because we think that they also could
>    be doing something like the aforementioned cp -a.

What are the downsides to having instance IDs as part of the cache key?
I haven't been able to understand that from your post or from Ben's post
you link to (53F4C92B.2010206@reser.org), which only mentions a "failure
to clear the cache race that was discussed offlist".

Is this the problem scenario? ---

1. Open an svn_fs_t handle

2. Replace the repository with another repository having the same UUID
   but different contents

3. Make a commit from the svn_fs_t opened in (1)

4. The commit creates a corrupted revision due to stale (false positive)
   cache hits

?

Thanks,

Daniel

> [1] http://svn.haxx.se/dev/archive-2014-08/0239.shtml
> 
> 
> Regards,
> Evgeny Kotkov
>

Re: Issue #4588, part 1: FSFS access error

Posted by Evgeny Kotkov <ev...@visualsvn.com>.

Stefan Fuhrmann <st...@wandisco.com> writes:

> My current hypothesis is that the server did not get restarted after
> replacing the repository.  Because we decided not to make the instance ID
> part of the cache key, we could easily have picked up cached format 6 data
> for the format 7 repository.

[...]

> That said, are we still happy with the decision to not make the instance
> ID part of the cache key? The rationale has basically been "fail early"
> because failure to restart or reconfigure the server after the repo files
> got modified might lead to any kind of unknown problems (much) further down
> the road.

As I see it, there are two separate problems:

1) First of all, I am thinking that we should indeed revisit the decision of
   not making instance IDs a part of the cache keys.  As far as I recall, this
   part of the change was reverted after we agreed that failing fast is better
   than narrowing the window when the cache issues exist [1] (there are still
   scenarios like backing up and restoring the repository with cp -a).

   Now I am almost sure that we should redo this change.  The experience of
   receiving errors related to stale cache entries isn't exactly educating,
   but rather than that, it's frustrating, and the procedures describing dump,
   load and hotcopy rarely say something about a server restart.  As we have
   the mechanism to make a huge set of cases work without the necessity of
   performing additional actions, I think that we should do it, leaving the
   edge cases as they are.  In other words, people who are used to svnadmin
   dump/load/hotcopy shouldn't struggle, because we think that they also could
   be doing something like the aforementioned cp -a.

2) The second part of the problem, to my mind, is the offset and item-based
   addressing.  Irrespectively of whether we use instance IDs in the cache
   keys, or not, I find it rather questionable that the same entry in the
   cache can mean two different things, depending on how you look at it.

   What happens if we're unlucky enough, and the offset in the revision file
   also is a valid index in the l2p lookup table?  Is there something we can
   do about it — say, associate the addressing type with the corresponding
   cache entry?

[1] http://svn.haxx.se/dev/archive-2014-08/0239.shtml

Regards,
Evgeny Kotkov

Re: Issue #4588, part 1: FSFS access error

Posted by Stefan Fuhrmann <st...@wandisco.com>.

On Sun, Aug 23, 2015 at 8:38 PM, Stefan Fuhrmann <
stefan.fuhrmann@wandisco.com> wrote:

> My current hypothesis is that the server did not get restarted
> after replacing the repository.  Because we decided not to make
> the instance ID part of the cache key, we could easily have
> picked up cached format 6 data for the format 7 repository.
>
> The lookup would start with a hit at the L2 DAG cache, mapping
> path+rev onto a noderev.  From there, everything is accessed
> using physical offsets until some item is not cached.  At that
> point, we would be trying to use an Offset instead of the Item
> index to address the f7 data.  Error.
>

Garret just confirmed that he indeed (very likely) did not restart
Apache. So, this is technically known / expected behaviour.

That said, are we still happy with the decision to not make the
instance ID part of the cache key? The rationale has basically
been "fail early" because failure to restart or reconfigure the
server after the repo files got modified might lead to any kind
of unknown problems (much) further down the road.

-- Stefan^2.