You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@subversion.apache.org by st...@apache.org on 2010/12/01 01:15:11 UTC

svn commit: r1040832 - /subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Author: stefan2
Date: Wed Dec  1 00:15:11 2010
New Revision: 1040832

URL: http://svn.apache.org/viewvc?rev=1040832&view=rev
Log:
Port (not merge) a fix for a FSFS packing race condition from the
performance branch to /trunk: There is a slight time window
between finding the name of a rev file and actually open it. If
the revision in question gets packed just within this window,
we will find the new (and final) file name with just one retry.

* subversion/libsvn_fs_fs/fs_fs.c
  (open_pack_or_rev_file): retry once upon "missing file" error

Modified:
    subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c

Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
+++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 2010
@@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
 {
   svn_error_t *err;
   const char *path;
+  svn_boolean_t retry = FALSE;
 
-  err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
+  do
+    {
+      err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
 
-  if (! err)
-    err = svn_io_file_open(file, path,
-                           APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
+      /* open the revision file in buffered r/o mode */
+      if (! err)
+        err = svn_io_file_open(file, path,
+                              APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
 
-  if (err && APR_STATUS_IS_ENOENT(err->apr_err))
-    {
-      svn_error_clear(err);
-      return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
-                               _("No such revision %ld"), rev);
+      if (err && APR_STATUS_IS_ENOENT(err->apr_err))
+        {
+          /* Could not open the file. This may happen if the
+           * file once existed but got packed later. */
+          svn_error_clear(err);
+
+          /* if that was our 2nd attempt, leave it at that. */
+          if (retry)
+            return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
+                                    _("No such revision %ld"), rev);
+
+          /* we failed for the first time. Refresh cache & retry. */
+          SVN_ERR(update_min_unpacked_rev(fs, pool));
+
+          retry = TRUE;
+        }
+      else
+        {
+          /* the file exists but something prevented us from opnening it */
+          return svn_error_return(err);
+        }
     }
+  while (err);
 
-  return svn_error_return(err);
+  return SVN_NO_ERROR;
 }
 
 /* Given REV in FS, set *REV_OFFSET to REV's offset in the packed file.

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Tue, Dec 07, 2010 at 09:40:07 +0000:
> On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > Julian Foad <ju...@wandisco.com> writes:
> > 
> > > I'm going to drop this "Remove the re-try logic from
> > > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > into checking or messing with the txn-correctness of FSFS now.  If
> > > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > 
> > I thought the patch was an improvement.  The existing retry in
> > svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > window smaller.  As the patch makes the window larger we are more likely
> > to see and fix any places where the caller doesn't handle the race.
> 
> I *think* the problem is that a caller may use this function within a
> transaction and depend on this internal re-try logic simply to update a
> stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> transaction has changed the real value and hasn't updated the cached
> value.
> 

Yes, it's conceivable that a caller might do:

  for (int i = 0; i < 2; i++)
    {
      ...
      SVN_ERR(svn_fs_fs__path_rev_absolute());
      ...
      if (i == 0)
        /* other process runs svn_fs_pack(); */;
      ...
      err = foo();
      if (err)
        {
          svn_error_clear(err);
          continue;
        }
      ...
    }

and rely on the call to svn_fs_fs__path_rev_absolute() in the second
iteration to update ffd->min_unpacked_rev.

I do not know that any of the current callers behave this way ---
I think that other than open_pack_or_rev_file(), all callers always have
the write lock --- but it's conceivable.  And if we ever add such
a caller, we will have to make them call update_min_unpacked_rev() by
themselves.

. o O [It should be fun to build with printf("-DSVN_FS_FS_DEFAULT_MAX_FILES_PER_DIR=%d -DPACK_AFTER_EVERY_COMMIT", 1).]

> Daniel, you wrote:
> > But is it strictly *necessary* to do so?  I think not: by not calling
> > update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> > stale --- a situation which the code has to account for anyway.
> 
> That was true, then, and it meant that every place that used this value
> had to account for it possibly being stale.  But now we have realized
> that the code shouldn't need to account for the possibility of the value
> being stale if it is running within a transaction (with the write lock).
> 
> Now, I think it would be better to change this.  Whenever the code
> updates the min-foo on disk, it should also update the cached value.
> That style of coding would be easier to reason about (to use an in-vogue
> phrase), and then we would be able to remove the re-try in
> svn_fs_fs__path_rev_absolute() without a concern.
> 

Making ffd->min_unpacked_rev stale less often does not mean that we will
have less places to account for it *possibly* being stale: it could
become stale due to other processes who write to the repository.
(except for "pack runs under the write lock" considerations)

To borrow Philip's point: making ffd->min_unpacked_rev stale less often
might hide bugs.

> Does that make sense?
> 
> - Julian
> 
> 

Daniel
(ffd->min_unpacked_rev could be renamed ffd->min_unpacked_rev_cache...)

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Tue, Dec 07, 2010 at 18:13:09 +0000:
> On Tue, 2010-12-07, Daniel Shahaf wrote:
> > Julian Foad wrote on Tue, Dec 07, 2010 at 15:27:05 +0000:
> > > (Quoting and replying to two emails at once.)
> > > 
> > > First: I'm assuming that no process will ever do packing without getting
> > > the exclusive write lock.  Can you confirm that for me?  If that's
> > > false, all my reasoning would be bogus.
> > 
> > Why should I confirm that?  You have the code too. :-)
> 
> Heh.  I meant: "I am assuming that's how FSFS is *intended* to work; is
> that your understanding too?"
> 

I'm not sure how it was "intended" to work.  I know that at some point
after packing was added I pointed out that two concurrent 'svnadmin pack'
operations will cause data loss if not serialized, and in response
glasser said they should run under the write lock:

http://thread.gmane.org/gmane.comp.version-control.subversion.devel/108040/focus=108048

Daniel
(who, at the time, didn't fully appreciate the importance of using the
same write lock as other FSFS operations)

> - Julian
> 
> (I have not read the rest of this email yet.)
> 
> 
> > Seriously: as of the last time I looked, svn_fs_pack() calls ... which
> > takes the write lock and calls pack_body(), which calls pack_shard(),
> > which does
> >         cat db/revs/$shard/* > db/revs/$shard.pack/pack 
> >         echo '($shard + 1) * 1000' > db/min-unpacked-revprop
> >         rm -rf db/revs/$shard/
> > while still holding the write lock.
> > 
> > > 
> > > On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> > > > Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > > > > Julian Foad <ju...@wandisco.com> writes:
> > > > > 
> > > > > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > > > > >> Julian Foad <ju...@wandisco.com> writes:
> > > > > >> 
> > > > > >> > I'm going to drop this "Remove the re-try logic from
> > > > > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > > > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > > > > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > > > > >> 
> > > > > >> I thought the patch was an improvement.  The existing retry in
> > > > > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > > > > >> window smaller.  As the patch makes the window larger we are more likely
> > > > > >> to see and fix any places where the caller doesn't handle the race.
> > > > > >
> > > > > > I *think* the problem is that a caller may use this function within a
> > > > > > transaction and depend on this internal re-try logic simply to update a
> > > > > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > > > > transaction has changed the real value and hasn't updated the cached
> > > > > > value.
> > > > 
> > > > Yes, it's conceivable that a caller might do:
> > > > 
> > > >   for (int i = 0; i < 2; i++)
> > > >     {
> > > >       ...
> > > >       SVN_ERR(svn_fs_fs__path_rev_absolute());
> > > >       ...
> > > >       if (i == 0)
> > > >         /* other process runs svn_fs_pack(); */;
> > > > [...]
> > > 
> > > No, that's not what I mean.  I mean a caller might do:
> > > 
> > >   with a write lock:
> > >     {
> > >       svn_fs_pack();
> > >       SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
> > >       foo(path);
> > >     }
> > > 
> > > where neither this code nor foo() re-tries.  With the current version of
> > > svn_fs_fs__path_rev_absolute(), foo() will get the correct path,
> > 
> > Unless a commit occurs after svn_fs_fs__path_rev_absolute() returns but
> > before foo() is called.  (This is the condition Stefan2 fixed.)
> > 
> > > whereas if we remove the internal re-try then foo() will get the wrong
> > > path.
> > > 
> > > >   If there are any
> > > > > such cases then your patch should expose them and we will fix them.  Any
> > > > > such errors are easy to handle, compared to the difficulty of fixing
> > > > > races.
> > > > > 
> > > > > The real problem with the existing code is that it partially hides
> > > > > races, by making them harder to trigger, but doesn't fix the race.  If
> > > > > we want to fix the race properly making it easier to trigger is the
> > > > > right thing to do.
> > > 
> > > Agreed.
> > > 
> > > What I meant to indicate is that, before making this change, I would
> > > like to see clearly that a caller with the write lock cannot use an
> > > out-of-date value.  One way to ensure that would be:
> > > 
> > >   update the cached value whenever we get the write lock (which we do);
> > >   update the cached value whenever we update the value on disk (which we
> > > don't);
> > >   anything not using a write lock must accept that the cached value may
> > > be stale, and re-try if necessary.
> > > 
> > > Another way would be:
> > > 
> > >   don't update the cached value when we get the write lock;
> > >   don't update the cached value when we update the value on disk;
> > >   every time we use the cached value within a write lock, either update
> > > before using it or try and re-try if necessary;
> > >   every time we use the cached value without a write lock, try and
> > > re-try if necessary.
> > > 
> > > I don't have a strong preference between those, but the current
> > > inconsistency between updating it when we get the write lock and not
> > > updating it when we update the value on disk seems bad.
> > > 
> > 
> > What is the cost of this inconsistency?
> > 
> > The stale cache will result in a failed stat() call on the path where
> > a revision file existed before it was packed.  Then, either we will
> > update the cache and retry (meaning the only cost was a stat() call), or
> > due to a bug we will forget to retry and, instead, send a bogus
> > error up the stack.
> > 
> > So, we pay one stat(), but in exchange we hope that, by letting the
> > cache *actually* become stale, concurrency bugs that reproduce only
> > when the cache is stale will be unearthed earlier.
> > 
> > In fact, how about this patch?
> > 
> > [[[
> > Index: subversion/libsvn_fs_fs/fs_fs.c
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1042961)
> > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > @@ -7553,7 +7553,7 @@ write_revnum_file(const char *fs_path,
> >     remove the pack file and start again. */
> >  static svn_error_t *
> >  pack_shard(const char *revs_dir,
> > -           const char *fs_path,
> > +           svn_fs_t *fs,
> >             apr_int64_t shard,
> >             int max_files_per_dir,
> >             svn_fs_pack_notify_t notify_func,
> > @@ -7562,6 +7562,7 @@ pack_shard(const char *revs_dir,
> >             void *cancel_baton,
> >             apr_pool_t *pool)
> >  {
> > +  const char *fs_path = fs->path;
> >    const char *pack_file_path, *manifest_file_path, *shard_path;
> >    const char *pack_file_dir;
> >    svn_stream_t *pack_stream, *manifest_stream;
> > @@ -7638,6 +7639,9 @@ pack_shard(const char *revs_dir,
> >    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REV,
> >                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
> >                              iterpool));
> > +#ifndef SVN_DEBUG
> > +  SVN_ERR(update_min_unpacked_rev(fs, iterpool));
> > +#endif
> >    svn_pool_destroy(iterpool);
> >  
> >    /* Finally, remove the existing shard directory. */
> > @@ -7701,6 +7705,9 @@ pack_revprop_shard(svn_fs_t *fs,
> >    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REVPROP,
> >                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
> >                              iterpool));
> > +#ifndef SVN_DEBUG
> > +  SVN_ERR(update_min_unpacked_revprop(fs, iterpool));
> > +#endif
> >    svn_pool_destroy(iterpool);
> >  
> >    /* Finally, remove the existing shard directory. */
> > @@ -7786,7 +7793,7 @@ pack_body(void *baton,
> >        if (pb->cancel_func)
> >          SVN_ERR(pb->cancel_func(pb->cancel_baton));
> >  
> > -      SVN_ERR(pack_shard(data_path, pb->fs->path, i, max_files_per_dir,
> > +      SVN_ERR(pack_shard(data_path, pb->fs, i, max_files_per_dir,
> >                           pb->notify_func, pb->notify_baton,
> >                           pb->cancel_func, pb->cancel_baton, iterpool));
> >      }
> > ]]]
> > (still need to add comments explaining why the #ifndef guard)
> > 
> > > This doesn't mean we should keep the re-try loop that we're talking
> > > about removing.  I agree it should go.  It's just that I'd like to
> > > resolve this inconsistency, or at least agree about resolving it, before
> > > feeling comfortable enough to commit this myself.
> > > 
> > > 
> > > > > > Daniel, you wrote:
> > > > > >> But is it strictly *necessary* to do so?  I think not: by not calling
> > > > > >> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> > > > > >> stale --- a situation which the code has to account for anyway.
> > > > > >
> > > > > > That was true, then, and it meant that every place that used this value
> > > > > > had to account for it possibly being stale.  But now we have realized
> > > > > > that the code shouldn't need to account for the possibility of the value
> > > > > > being stale if it is running within a transaction (with the write lock).
> > > > > >
> > > > > > Now, I think it would be better to change this.  Whenever the code
> > > > > > updates the min-foo on disk, it should also update the cached value.
> > > > > > That style of coding would be easier to reason about (to use an in-vogue
> > > > > > phrase), and then we would be able to remove the re-try in
> > > > > > svn_fs_fs__path_rev_absolute() without a concern.
> > > 
> > > > Making ffd->min_unpacked_rev stale less often does not mean that we will
> > > > have less places to account for it *possibly* being stale:
> > > 
> > > I think it *does*, because every piece of code that knows it has a write
> > > lock would not have to bother about that possibility.
> > > 
> > 
> > Code that has the write lock does not have to bother about that
> > possibility anyway, because with_some_lock() updates the ffd->min_*
> > caches after acquiring the lock and before invoking the under-the-lock
> > callback.
> > 
> > > >  it could
> > > > become stale due to other processes who write to the repository.
> > > > (except for "pack runs under the write lock" considerations)
> > > 
> > > It could become stale due to another process writing to the repository,
> > > except if *this* code is running under a write lock.  If *this* code has
> > > a write lock, it doesn't matter whether it's doing packing or not;
> > > another process can't make this process's cache get stale because
> > > another process can't get the write lock.
> > > 
> > 
> > Right: if we have the write lock, then the caches are not stale.
> > 
> > > > To borrow Philip's point: making ffd->min_unpacked_rev stale less often
> > > > might hide bugs.
> > > 
> > > That is one way of looking at it, when thinking about the asynchronous
> > > (no write lock) cases, but another way of looking at the same thing is
> > > it can help to *totally avoid* potential bugs in synchronous cases.
> > > 
> > 
> > Agreed, that's the flip side of the coin.  The #ifndef patch above tries
> > to find a middle ground between hiding and unearthing bugs here.
> > 
> > > - Julian
> > > 
> > > 
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Wed, Dec 08, 2010 at 13:30:42 +0000:
> I (Julian Foad) wrote:
> [...]
> > Studying the FSFS source code for the issues raised in this thread has
> > given me confidence that it seems to be doing the right thing, in
> > practice, at the moment.
> > 
> > In 1043360 I added some comments about how to remove one source of
> > fragility in using the cached value.  Again I'll say this looks
> > perfectly correct in its current usage.
> > 

Agreed, the current uses of get_root_changes_offset() will always use
the same ffd->min_unpacked_rev as open_pack_or_rev_file() used.

> > I'll go ahead and apply the remove-retry-logic-from-path-rev-absolute
> > patch now.
> 
> Committed revision 1043408.
> 

Thank you!

(for not letting the length of this thread bury the patch)

> - Julian
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

I (Julian Foad) wrote:
[...]
> Studying the FSFS source code for the issues raised in this thread has
> given me confidence that it seems to be doing the right thing, in
> practice, at the moment.
> 
> In 1043360 I added some comments about how to remove one source of
> fragility in using the cached value.  Again I'll say this looks
> perfectly correct in its current usage.
> 
> I'll go ahead and apply the remove-retry-logic-from-path-rev-absolute
> patch now.

Committed revision 1043408.

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Wed, 2010-12-08, Daniel Shahaf wrote:
> [ continuing the non-linear replies ]
> 
> Daniel Shahaf wrote on Tue, Dec 07, 2010 at 18:51:12 +0200:
> > Julian Foad wrote on Tue, Dec 07, 2010 at 15:27:05 +0000:
> > > On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> > > > Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > > > > Julian Foad <ju...@wandisco.com> writes:
> > > > > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > > > > >> Julian Foad <ju...@wandisco.com> writes:
> > > > > >> 
> > > > > >> > I'm going to drop this "Remove the re-try logic from
> > > > > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > > > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > > > > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > > > > >> 
> > > > > >> I thought the patch was an improvement.  The existing retry in
> > > > > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > > > > >> window smaller.  As the patch makes the window larger we are more likely
> > > > > >> to see and fix any places where the caller doesn't handle the race.
> > > > > >
> > > > > > I *think* the problem is that a caller may use this function within a
> > > > > > transaction and depend on this internal re-try logic simply to update a
> > > > > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > > > > transaction has changed the real value and hasn't updated the cached
> > > > > > value.
> > > > 
> > > > Yes, it's conceivable that a caller might do:
> > > > 
> > > >   for (int i = 0; i < 2; i++)
> > > >     {
> > > >       ...
> > > >       SVN_ERR(svn_fs_fs__path_rev_absolute());
> > > >       ...
> > > >       if (i == 0)
> > > >         /* other process runs svn_fs_pack(); */;
> > > > [...]
> > > 
> > > No, that's not what I mean.  I mean a caller might do:
> > > 
> > >   with a write lock:
> > >     {
> > >       svn_fs_pack();
> > >       SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
> > >       foo(path);
> > >     }
> > > 
> > > where neither this code nor foo() re-tries.  With the current version of
> > > svn_fs_fs__path_rev_absolute(), foo() will get the correct path,
> > 
> > Unless a commit occurs after svn_fs_fs__path_rev_absolute() returns but
> > before foo() is called.  (This is the condition Stefan2 fixed.)

I've already clarified this on IRC, but my example code was bad.  I
meant to indicate that the whole sequence is inside a write lock, but it
should say pack_body() where I wrote svn_fs_pack(), since I now
understand that svn_fs_fs__pack() takes a write lock itself and
therefore it's wrong to try to call svn_fs_pack() while holding a write
lock.

> > > whereas if we remove the internal re-try then foo() will get the wrong
> > > path.
> > > 
> > > >   If there are any
> > > > > such cases then your patch should expose them and we will fix them.  Any
> > > > > such errors are easy to handle, compared to the difficulty of fixing
> > > > > races.
> > > > > 
> > > > > The real problem with the existing code is that it partially hides
> > > > > races, by making them harder to trigger, but doesn't fix the race.  If
> > > > > we want to fix the race properly making it easier to trigger is the
> > > > > right thing to do.
> > > 
> > > Agreed.
> > > 
> > > What I meant to indicate is that, before making this change, I would
> > > like to see clearly that a caller with the write lock cannot use an
> > > out-of-date value.  One way to ensure that would be:
> > > 
> > >   update the cached value whenever we get the write lock (which we do);
> > >   update the cached value whenever we update the value on disk (which we
> > > don't);
> > >   anything not using a write lock must accept that the cached value may
> > > be stale, and re-try if necessary.
> > > 
> > > Another way would be:
> > > 
> > >   don't update the cached value when we get the write lock;
> > >   don't update the cached value when we update the value on disk;
> > >   every time we use the cached value within a write lock, either update
> > > before using it or try and re-try if necessary;
> > >   every time we use the cached value without a write lock, try and
> > > re-try if necessary.
> > > 
> > > I don't have a strong preference between those, but the current
> > > inconsistency between updating it when we get the write lock and not
> > > updating it when we update the value on disk seems bad.
> > > 
> > 
> > What is the cost of this inconsistency?
> > 
> > The stale cache will result in a failed stat() call on the path where
> > a revision file existed before it was packed.  Then, either we will
> > update the cache and retry (meaning the only cost was a stat() call), or
> > due to a bug we will forget to retry and, instead, send a bogus
> > error up the stack.
> > 
> > So, we pay one stat(), but in exchange we hope that, by letting the
> > cache *actually* become stale, concurrency bugs that reproduce only
> > when the cache is stale will be unearthed earlier.
> 
> In more words: bugs that only trigger under asynchronous conditions are
> more likely to escape our detection.  (For example, many of them weren't
> detected by 'make check'.)  By forcing the code to exercise the 'cache is
> stale' situation, we attempt to cause some of those bugs to be discovered
> during development, rather than during RC testing or after release.
> 
> Makes sense?

One part of me says "yes that sounds logical", another part is still not
sure.

> > 
> > In fact, how about this patch?
> > 
> > @@ -7638,6 +7639,9 @@ pack_shard(const char *revs_dir,
> >    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REV,
> >                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
> >                              iterpool));
> > +#ifndef SVN_DEBUG
> > +  SVN_ERR(update_min_unpacked_rev(fs, iterpool));
> > +#endif
> >    svn_pool_destroy(iterpool);
> >  
> >    /* Finally, remove the existing shard directory. */
> > @@ -7701,6 +7705,9 @@ pack_revprop_shard(svn_fs_t *fs,
> >    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REVPROP,
> >                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
> >                              iterpool));
> > +#ifndef SVN_DEBUG
> > +  SVN_ERR(update_min_unpacked_revprop(fs, iterpool));
> > +#endif
> >    svn_pool_destroy(iterpool);
> >  
> >    /* Finally, remove the existing shard directory. */

My intuition says no, making it differ in debug mode versus release mode
is more likely to cause us pain than gain, overall.


Studying the FSFS source code for the issues raised in this thread has
given me confidence that it seems to be doing the right thing, in
practice, at the moment.

In 1043360 I added some comments about how to remove one source of
fragility in using the cached value.  Again I'll say this looks
perfectly correct in its current usage.

I'll go ahead and apply the remove-retry-logic-from-path-rev-absolute
patch now.

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

[ continuing the non-linear replies ]

Daniel Shahaf wrote on Tue, Dec 07, 2010 at 18:51:12 +0200:
> Julian Foad wrote on Tue, Dec 07, 2010 at 15:27:05 +0000:
> > On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> > > Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > > > Julian Foad <ju...@wandisco.com> writes:
> > > > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > > > >> Julian Foad <ju...@wandisco.com> writes:
> > > > >> 
> > > > >> > I'm going to drop this "Remove the re-try logic from
> > > > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > > > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > > > >> 
> > > > >> I thought the patch was an improvement.  The existing retry in
> > > > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > > > >> window smaller.  As the patch makes the window larger we are more likely
> > > > >> to see and fix any places where the caller doesn't handle the race.
> > > > >
> > > > > I *think* the problem is that a caller may use this function within a
> > > > > transaction and depend on this internal re-try logic simply to update a
> > > > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > > > transaction has changed the real value and hasn't updated the cached
> > > > > value.
> > > 
> > > Yes, it's conceivable that a caller might do:
> > > 
> > >   for (int i = 0; i < 2; i++)
> > >     {
> > >       ...
> > >       SVN_ERR(svn_fs_fs__path_rev_absolute());
> > >       ...
> > >       if (i == 0)
> > >         /* other process runs svn_fs_pack(); */;
> > > [...]
> > 
> > No, that's not what I mean.  I mean a caller might do:
> > 
> >   with a write lock:
> >     {
> >       svn_fs_pack();
> >       SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
> >       foo(path);
> >     }
> > 
> > where neither this code nor foo() re-tries.  With the current version of
> > svn_fs_fs__path_rev_absolute(), foo() will get the correct path,
> 
> Unless a commit occurs after svn_fs_fs__path_rev_absolute() returns but
> before foo() is called.  (This is the condition Stefan2 fixed.)
> 
> > whereas if we remove the internal re-try then foo() will get the wrong
> > path.
> > 
> > >   If there are any
> > > > such cases then your patch should expose them and we will fix them.  Any
> > > > such errors are easy to handle, compared to the difficulty of fixing
> > > > races.
> > > > 
> > > > The real problem with the existing code is that it partially hides
> > > > races, by making them harder to trigger, but doesn't fix the race.  If
> > > > we want to fix the race properly making it easier to trigger is the
> > > > right thing to do.
> > 
> > Agreed.
> > 
> > What I meant to indicate is that, before making this change, I would
> > like to see clearly that a caller with the write lock cannot use an
> > out-of-date value.  One way to ensure that would be:
> > 
> >   update the cached value whenever we get the write lock (which we do);
> >   update the cached value whenever we update the value on disk (which we
> > don't);
> >   anything not using a write lock must accept that the cached value may
> > be stale, and re-try if necessary.
> > 
> > Another way would be:
> > 
> >   don't update the cached value when we get the write lock;
> >   don't update the cached value when we update the value on disk;
> >   every time we use the cached value within a write lock, either update
> > before using it or try and re-try if necessary;
> >   every time we use the cached value without a write lock, try and
> > re-try if necessary.
> > 
> > I don't have a strong preference between those, but the current
> > inconsistency between updating it when we get the write lock and not
> > updating it when we update the value on disk seems bad.
> > 
> 
> What is the cost of this inconsistency?
> 
> The stale cache will result in a failed stat() call on the path where
> a revision file existed before it was packed.  Then, either we will
> update the cache and retry (meaning the only cost was a stat() call), or
> due to a bug we will forget to retry and, instead, send a bogus
> error up the stack.
> 
> So, we pay one stat(), but in exchange we hope that, by letting the
> cache *actually* become stale, concurrency bugs that reproduce only
> when the cache is stale will be unearthed earlier.

In more words: bugs that only trigger under asynchronous conditions are
more likely to escape our detection.  (For example, many of them weren't
detected by 'make check'.)  By forcing the code to exercise the 'cache is
stale' situation, we attempt to cause some of those bugs to be discovered
during development, rather than during RC testing or after release.

Makes sense?

> 
> In fact, how about this patch?
> 
> @@ -7638,6 +7639,9 @@ pack_shard(const char *revs_dir,
>    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REV,
>                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
>                              iterpool));
> +#ifndef SVN_DEBUG
> +  SVN_ERR(update_min_unpacked_rev(fs, iterpool));
> +#endif
>    svn_pool_destroy(iterpool);
>  
>    /* Finally, remove the existing shard directory. */
> @@ -7701,6 +7705,9 @@ pack_revprop_shard(svn_fs_t *fs,
>    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REVPROP,
>                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
>                              iterpool));
> +#ifndef SVN_DEBUG
> +  SVN_ERR(update_min_unpacked_revprop(fs, iterpool));
> +#endif
>    svn_pool_destroy(iterpool);
>  
>    /* Finally, remove the existing shard directory. */

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Tue, 2010-12-07, Daniel Shahaf wrote:
> Julian Foad wrote on Tue, Dec 07, 2010 at 15:27:05 +0000:
> > (Quoting and replying to two emails at once.)
> > 
> > First: I'm assuming that no process will ever do packing without getting
> > the exclusive write lock.  Can you confirm that for me?  If that's
> > false, all my reasoning would be bogus.
> 
> Why should I confirm that?  You have the code too. :-)

Heh.  I meant: "I am assuming that's how FSFS is *intended* to work; is
that your understanding too?"

- Julian

(I have not read the rest of this email yet.)


> Seriously: as of the last time I looked, svn_fs_pack() calls ... which
> takes the write lock and calls pack_body(), which calls pack_shard(),
> which does
>         cat db/revs/$shard/* > db/revs/$shard.pack/pack 
>         echo '($shard + 1) * 1000' > db/min-unpacked-revprop
>         rm -rf db/revs/$shard/
> while still holding the write lock.
> 
> > 
> > On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> > > Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > > > Julian Foad <ju...@wandisco.com> writes:
> > > > 
> > > > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > > > >> Julian Foad <ju...@wandisco.com> writes:
> > > > >> 
> > > > >> > I'm going to drop this "Remove the re-try logic from
> > > > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > > > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > > > >> 
> > > > >> I thought the patch was an improvement.  The existing retry in
> > > > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > > > >> window smaller.  As the patch makes the window larger we are more likely
> > > > >> to see and fix any places where the caller doesn't handle the race.
> > > > >
> > > > > I *think* the problem is that a caller may use this function within a
> > > > > transaction and depend on this internal re-try logic simply to update a
> > > > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > > > transaction has changed the real value and hasn't updated the cached
> > > > > value.
> > > 
> > > Yes, it's conceivable that a caller might do:
> > > 
> > >   for (int i = 0; i < 2; i++)
> > >     {
> > >       ...
> > >       SVN_ERR(svn_fs_fs__path_rev_absolute());
> > >       ...
> > >       if (i == 0)
> > >         /* other process runs svn_fs_pack(); */;
> > > [...]
> > 
> > No, that's not what I mean.  I mean a caller might do:
> > 
> >   with a write lock:
> >     {
> >       svn_fs_pack();
> >       SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
> >       foo(path);
> >     }
> > 
> > where neither this code nor foo() re-tries.  With the current version of
> > svn_fs_fs__path_rev_absolute(), foo() will get the correct path,
> 
> Unless a commit occurs after svn_fs_fs__path_rev_absolute() returns but
> before foo() is called.  (This is the condition Stefan2 fixed.)
> 
> > whereas if we remove the internal re-try then foo() will get the wrong
> > path.
> > 
> > >   If there are any
> > > > such cases then your patch should expose them and we will fix them.  Any
> > > > such errors are easy to handle, compared to the difficulty of fixing
> > > > races.
> > > > 
> > > > The real problem with the existing code is that it partially hides
> > > > races, by making them harder to trigger, but doesn't fix the race.  If
> > > > we want to fix the race properly making it easier to trigger is the
> > > > right thing to do.
> > 
> > Agreed.
> > 
> > What I meant to indicate is that, before making this change, I would
> > like to see clearly that a caller with the write lock cannot use an
> > out-of-date value.  One way to ensure that would be:
> > 
> >   update the cached value whenever we get the write lock (which we do);
> >   update the cached value whenever we update the value on disk (which we
> > don't);
> >   anything not using a write lock must accept that the cached value may
> > be stale, and re-try if necessary.
> > 
> > Another way would be:
> > 
> >   don't update the cached value when we get the write lock;
> >   don't update the cached value when we update the value on disk;
> >   every time we use the cached value within a write lock, either update
> > before using it or try and re-try if necessary;
> >   every time we use the cached value without a write lock, try and
> > re-try if necessary.
> > 
> > I don't have a strong preference between those, but the current
> > inconsistency between updating it when we get the write lock and not
> > updating it when we update the value on disk seems bad.
> > 
> 
> What is the cost of this inconsistency?
> 
> The stale cache will result in a failed stat() call on the path where
> a revision file existed before it was packed.  Then, either we will
> update the cache and retry (meaning the only cost was a stat() call), or
> due to a bug we will forget to retry and, instead, send a bogus
> error up the stack.
> 
> So, we pay one stat(), but in exchange we hope that, by letting the
> cache *actually* become stale, concurrency bugs that reproduce only
> when the cache is stale will be unearthed earlier.
> 
> In fact, how about this patch?
> 
> [[[
> Index: subversion/libsvn_fs_fs/fs_fs.c
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1042961)
> +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> @@ -7553,7 +7553,7 @@ write_revnum_file(const char *fs_path,
>     remove the pack file and start again. */
>  static svn_error_t *
>  pack_shard(const char *revs_dir,
> -           const char *fs_path,
> +           svn_fs_t *fs,
>             apr_int64_t shard,
>             int max_files_per_dir,
>             svn_fs_pack_notify_t notify_func,
> @@ -7562,6 +7562,7 @@ pack_shard(const char *revs_dir,
>             void *cancel_baton,
>             apr_pool_t *pool)
>  {
> +  const char *fs_path = fs->path;
>    const char *pack_file_path, *manifest_file_path, *shard_path;
>    const char *pack_file_dir;
>    svn_stream_t *pack_stream, *manifest_stream;
> @@ -7638,6 +7639,9 @@ pack_shard(const char *revs_dir,
>    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REV,
>                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
>                              iterpool));
> +#ifndef SVN_DEBUG
> +  SVN_ERR(update_min_unpacked_rev(fs, iterpool));
> +#endif
>    svn_pool_destroy(iterpool);
>  
>    /* Finally, remove the existing shard directory. */
> @@ -7701,6 +7705,9 @@ pack_revprop_shard(svn_fs_t *fs,
>    SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REVPROP,
>                              (svn_revnum_t)((shard + 1) * max_files_per_dir),
>                              iterpool));
> +#ifndef SVN_DEBUG
> +  SVN_ERR(update_min_unpacked_revprop(fs, iterpool));
> +#endif
>    svn_pool_destroy(iterpool);
>  
>    /* Finally, remove the existing shard directory. */
> @@ -7786,7 +7793,7 @@ pack_body(void *baton,
>        if (pb->cancel_func)
>          SVN_ERR(pb->cancel_func(pb->cancel_baton));
>  
> -      SVN_ERR(pack_shard(data_path, pb->fs->path, i, max_files_per_dir,
> +      SVN_ERR(pack_shard(data_path, pb->fs, i, max_files_per_dir,
>                           pb->notify_func, pb->notify_baton,
>                           pb->cancel_func, pb->cancel_baton, iterpool));
>      }
> ]]]
> (still need to add comments explaining why the #ifndef guard)
> 
> > This doesn't mean we should keep the re-try loop that we're talking
> > about removing.  I agree it should go.  It's just that I'd like to
> > resolve this inconsistency, or at least agree about resolving it, before
> > feeling comfortable enough to commit this myself.
> > 
> > 
> > > > > Daniel, you wrote:
> > > > >> But is it strictly *necessary* to do so?  I think not: by not calling
> > > > >> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> > > > >> stale --- a situation which the code has to account for anyway.
> > > > >
> > > > > That was true, then, and it meant that every place that used this value
> > > > > had to account for it possibly being stale.  But now we have realized
> > > > > that the code shouldn't need to account for the possibility of the value
> > > > > being stale if it is running within a transaction (with the write lock).
> > > > >
> > > > > Now, I think it would be better to change this.  Whenever the code
> > > > > updates the min-foo on disk, it should also update the cached value.
> > > > > That style of coding would be easier to reason about (to use an in-vogue
> > > > > phrase), and then we would be able to remove the re-try in
> > > > > svn_fs_fs__path_rev_absolute() without a concern.
> > 
> > > Making ffd->min_unpacked_rev stale less often does not mean that we will
> > > have less places to account for it *possibly* being stale:
> > 
> > I think it *does*, because every piece of code that knows it has a write
> > lock would not have to bother about that possibility.
> > 
> 
> Code that has the write lock does not have to bother about that
> possibility anyway, because with_some_lock() updates the ffd->min_*
> caches after acquiring the lock and before invoking the under-the-lock
> callback.
> 
> > >  it could
> > > become stale due to other processes who write to the repository.
> > > (except for "pack runs under the write lock" considerations)
> > 
> > It could become stale due to another process writing to the repository,
> > except if *this* code is running under a write lock.  If *this* code has
> > a write lock, it doesn't matter whether it's doing packing or not;
> > another process can't make this process's cache get stale because
> > another process can't get the write lock.
> > 
> 
> Right: if we have the write lock, then the caches are not stale.
> 
> > > To borrow Philip's point: making ffd->min_unpacked_rev stale less often
> > > might hide bugs.
> > 
> > That is one way of looking at it, when thinking about the asynchronous
> > (no write lock) cases, but another way of looking at the same thing is
> > it can help to *totally avoid* potential bugs in synchronous cases.
> > 
> 
> Agreed, that's the flip side of the coin.  The #ifndef patch above tries
> to find a middle ground between hiding and unearthing bugs here.
> 
> > - Julian
> > 
> >

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Tue, Dec 07, 2010 at 15:27:05 +0000:
> (Quoting and replying to two emails at once.)
> 
> First: I'm assuming that no process will ever do packing without getting
> the exclusive write lock.  Can you confirm that for me?  If that's
> false, all my reasoning would be bogus.
> 

Why should I confirm that?  You have the code too. :-)

Seriously: as of the last time I looked, svn_fs_pack() calls ... which
takes the write lock and calls pack_body(), which calls pack_shard(),
which does
        cat db/revs/$shard/* > db/revs/$shard.pack/pack 
        echo '($shard + 1) * 1000' > db/min-unpacked-revprop
        rm -rf db/revs/$shard/
while still holding the write lock.

> 
> On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> > Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > > Julian Foad <ju...@wandisco.com> writes:
> > > 
> > > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > > >> Julian Foad <ju...@wandisco.com> writes:
> > > >> 
> > > >> > I'm going to drop this "Remove the re-try logic from
> > > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > > >> 
> > > >> I thought the patch was an improvement.  The existing retry in
> > > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > > >> window smaller.  As the patch makes the window larger we are more likely
> > > >> to see and fix any places where the caller doesn't handle the race.
> > > >
> > > > I *think* the problem is that a caller may use this function within a
> > > > transaction and depend on this internal re-try logic simply to update a
> > > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > > transaction has changed the real value and hasn't updated the cached
> > > > value.
> > 
> > Yes, it's conceivable that a caller might do:
> > 
> >   for (int i = 0; i < 2; i++)
> >     {
> >       ...
> >       SVN_ERR(svn_fs_fs__path_rev_absolute());
> >       ...
> >       if (i == 0)
> >         /* other process runs svn_fs_pack(); */;
> > [...]
> 
> No, that's not what I mean.  I mean a caller might do:
> 
>   with a write lock:
>     {
>       svn_fs_pack();
>       SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
>       foo(path);
>     }
> 
> where neither this code nor foo() re-tries.  With the current version of
> svn_fs_fs__path_rev_absolute(), foo() will get the correct path,

Unless a commit occurs after svn_fs_fs__path_rev_absolute() returns but
before foo() is called.  (This is the condition Stefan2 fixed.)

> whereas if we remove the internal re-try then foo() will get the wrong
> path.
> 
> >   If there are any
> > > such cases then your patch should expose them and we will fix them.  Any
> > > such errors are easy to handle, compared to the difficulty of fixing
> > > races.
> > > 
> > > The real problem with the existing code is that it partially hides
> > > races, by making them harder to trigger, but doesn't fix the race.  If
> > > we want to fix the race properly making it easier to trigger is the
> > > right thing to do.
> 
> Agreed.
> 
> What I meant to indicate is that, before making this change, I would
> like to see clearly that a caller with the write lock cannot use an
> out-of-date value.  One way to ensure that would be:
> 
>   update the cached value whenever we get the write lock (which we do);
>   update the cached value whenever we update the value on disk (which we
> don't);
>   anything not using a write lock must accept that the cached value may
> be stale, and re-try if necessary.
> 
> Another way would be:
> 
>   don't update the cached value when we get the write lock;
>   don't update the cached value when we update the value on disk;
>   every time we use the cached value within a write lock, either update
> before using it or try and re-try if necessary;
>   every time we use the cached value without a write lock, try and
> re-try if necessary.
> 
> I don't have a strong preference between those, but the current
> inconsistency between updating it when we get the write lock and not
> updating it when we update the value on disk seems bad.
> 

What is the cost of this inconsistency?

The stale cache will result in a failed stat() call on the path where
a revision file existed before it was packed.  Then, either we will
update the cache and retry (meaning the only cost was a stat() call), or
due to a bug we will forget to retry and, instead, send a bogus
error up the stack.

So, we pay one stat(), but in exchange we hope that, by letting the
cache *actually* become stale, concurrency bugs that reproduce only
when the cache is stale will be unearthed earlier.

In fact, how about this patch?

[[[
Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c	(revision 1042961)
+++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
@@ -7553,7 +7553,7 @@ write_revnum_file(const char *fs_path,
    remove the pack file and start again. */
 static svn_error_t *
 pack_shard(const char *revs_dir,
-           const char *fs_path,
+           svn_fs_t *fs,
            apr_int64_t shard,
            int max_files_per_dir,
            svn_fs_pack_notify_t notify_func,
@@ -7562,6 +7562,7 @@ pack_shard(const char *revs_dir,
            void *cancel_baton,
            apr_pool_t *pool)
 {
+  const char *fs_path = fs->path;
   const char *pack_file_path, *manifest_file_path, *shard_path;
   const char *pack_file_dir;
   svn_stream_t *pack_stream, *manifest_stream;
@@ -7638,6 +7639,9 @@ pack_shard(const char *revs_dir,
   SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REV,
                             (svn_revnum_t)((shard + 1) * max_files_per_dir),
                             iterpool));
+#ifndef SVN_DEBUG
+  SVN_ERR(update_min_unpacked_rev(fs, iterpool));
+#endif
   svn_pool_destroy(iterpool);
 
   /* Finally, remove the existing shard directory. */
@@ -7701,6 +7705,9 @@ pack_revprop_shard(svn_fs_t *fs,
   SVN_ERR(write_revnum_file(fs_path, PATH_MIN_UNPACKED_REVPROP,
                             (svn_revnum_t)((shard + 1) * max_files_per_dir),
                             iterpool));
+#ifndef SVN_DEBUG
+  SVN_ERR(update_min_unpacked_revprop(fs, iterpool));
+#endif
   svn_pool_destroy(iterpool);
 
   /* Finally, remove the existing shard directory. */
@@ -7786,7 +7793,7 @@ pack_body(void *baton,
       if (pb->cancel_func)
         SVN_ERR(pb->cancel_func(pb->cancel_baton));
 
-      SVN_ERR(pack_shard(data_path, pb->fs->path, i, max_files_per_dir,
+      SVN_ERR(pack_shard(data_path, pb->fs, i, max_files_per_dir,
                          pb->notify_func, pb->notify_baton,
                          pb->cancel_func, pb->cancel_baton, iterpool));
     }
]]]
(still need to add comments explaining why the #ifndef guard)

> This doesn't mean we should keep the re-try loop that we're talking
> about removing.  I agree it should go.  It's just that I'd like to
> resolve this inconsistency, or at least agree about resolving it, before
> feeling comfortable enough to commit this myself.
> 
> 
> > > > Daniel, you wrote:
> > > >> But is it strictly *necessary* to do so?  I think not: by not calling
> > > >> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> > > >> stale --- a situation which the code has to account for anyway.
> > > >
> > > > That was true, then, and it meant that every place that used this value
> > > > had to account for it possibly being stale.  But now we have realized
> > > > that the code shouldn't need to account for the possibility of the value
> > > > being stale if it is running within a transaction (with the write lock).
> > > >
> > > > Now, I think it would be better to change this.  Whenever the code
> > > > updates the min-foo on disk, it should also update the cached value.
> > > > That style of coding would be easier to reason about (to use an in-vogue
> > > > phrase), and then we would be able to remove the re-try in
> > > > svn_fs_fs__path_rev_absolute() without a concern.
> 
> > Making ffd->min_unpacked_rev stale less often does not mean that we will
> > have less places to account for it *possibly* being stale:
> 
> I think it *does*, because every piece of code that knows it has a write
> lock would not have to bother about that possibility.
> 

Code that has the write lock does not have to bother about that
possibility anyway, because with_some_lock() updates the ffd->min_*
caches after acquiring the lock and before invoking the under-the-lock
callback.

> >  it could
> > become stale due to other processes who write to the repository.
> > (except for "pack runs under the write lock" considerations)
> 
> It could become stale due to another process writing to the repository,
> except if *this* code is running under a write lock.  If *this* code has
> a write lock, it doesn't matter whether it's doing packing or not;
> another process can't make this process's cache get stale because
> another process can't get the write lock.
> 

Right: if we have the write lock, then the caches are not stale.

> > To borrow Philip's point: making ffd->min_unpacked_rev stale less often
> > might hide bugs.
> 
> That is one way of looking at it, when thinking about the asynchronous
> (no write lock) cases, but another way of looking at the same thing is
> it can help to *totally avoid* potential bugs in synchronous cases.
> 

Agreed, that's the flip side of the coin.  The #ifndef patch above tries
to find a middle ground between hiding and unearthing bugs here.

> - Julian
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

(Quoting and replying to two emails at once.)

First: I'm assuming that no process will ever do packing without getting
the exclusive write lock.  Can you confirm that for me?  If that's
false, all my reasoning would be bogus.

On Tue, 2010-12-07 at 12:13 +0200, Daniel Shahaf wrote:
> Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> > Julian Foad <ju...@wandisco.com> writes:
> > 
> > > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> > >> Julian Foad <ju...@wandisco.com> writes:
> > >> 
> > >> > I'm going to drop this "Remove the re-try logic from
> > >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > >> > into checking or messing with the txn-correctness of FSFS now.  If
> > >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> > >> 
> > >> I thought the patch was an improvement.  The existing retry in
> > >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> > >> window smaller.  As the patch makes the window larger we are more likely
> > >> to see and fix any places where the caller doesn't handle the race.
> > >
> > > I *think* the problem is that a caller may use this function within a
> > > transaction and depend on this internal re-try logic simply to update a
> > > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > > transaction has changed the real value and hasn't updated the cached
> > > value.
> 
> Yes, it's conceivable that a caller might do:
> 
>   for (int i = 0; i < 2; i++)
>     {
>       ...
>       SVN_ERR(svn_fs_fs__path_rev_absolute());
>       ...
>       if (i == 0)
>         /* other process runs svn_fs_pack(); */;
> [...]

No, that's not what I mean.  I mean a caller might do:

  with a write lock:
    {
      svn_fs_pack();
      SVN_ERR(svn_fs_fs__path_rev_absolute(&path, ...));
      foo(path);
    }

where neither this code nor foo() re-tries.  With the current version of
svn_fs_fs__path_rev_absolute(), foo() will get the correct path, whereas
if we remove the internal re-try then foo() will get the wrong path.

> > That's not really a race, it's more of a logic error.

That's right.

>   If there are any
> > such cases then your patch should expose them and we will fix them.  Any
> > such errors are easy to handle, compared to the difficulty of fixing
> > races.
> > 
> > The real problem with the existing code is that it partially hides
> > races, by making them harder to trigger, but doesn't fix the race.  If
> > we want to fix the race properly making it easier to trigger is the
> > right thing to do.

Agreed.

What I meant to indicate is that, before making this change, I would
like to see clearly that a caller with the write lock cannot use an
out-of-date value.  One way to ensure that would be:

  update the cached value whenever we get the write lock (which we do);
  update the cached value whenever we update the value on disk (which we
don't);
  anything not using a write lock must accept that the cached value may
be stale, and re-try if necessary.

Another way would be:

  don't update the cached value when we get the write lock;
  don't update the cached value when we update the value on disk;
  every time we use the cached value within a write lock, either update
before using it or try and re-try if necessary;
  every time we use the cached value without a write lock, try and
re-try if necessary.

I don't have a strong preference between those, but the current
inconsistency between updating it when we get the write lock and not
updating it when we update the value on disk seems bad.

This doesn't mean we should keep the re-try loop that we're talking
about removing.  I agree it should go.  It's just that I'd like to
resolve this inconsistency, or at least agree about resolving it, before
feeling comfortable enough to commit this myself.

> As I said, I think all current callers are safe.  However, I agree that
> applying the patch will make it easier to detect the mistake if in the
> future we add callers to svn_fs_fs__path_rev_absolute() that do not
> retry as they should.

OK, that's great.

> > > Daniel, you wrote:
> > >> But is it strictly *necessary* to do so?  I think not: by not calling
> > >> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> > >> stale --- a situation which the code has to account for anyway.
> > >
> > > That was true, then, and it meant that every place that used this value
> > > had to account for it possibly being stale.  But now we have realized
> > > that the code shouldn't need to account for the possibility of the value
> > > being stale if it is running within a transaction (with the write lock).
> > >
> > > Now, I think it would be better to change this.  Whenever the code
> > > updates the min-foo on disk, it should also update the cached value.
> > > That style of coding would be easier to reason about (to use an in-vogue
> > > phrase), and then we would be able to remove the re-try in
> > > svn_fs_fs__path_rev_absolute() without a concern.

> Making ffd->min_unpacked_rev stale less often does not mean that we will
> have less places to account for it *possibly* being stale:

I think it *does*, because every piece of code that knows it has a write
lock would not have to bother about that possibility.

>  it could
> become stale due to other processes who write to the repository.
> (except for "pack runs under the write lock" considerations)

It could become stale due to another process writing to the repository,
except if *this* code is running under a write lock.  If *this* code has
a write lock, it doesn't matter whether it's doing packing or not;
another process can't make this process's cache get stale because
another process can't get the write lock.

> To borrow Philip's point: making ffd->min_unpacked_rev stale less often
> might hide bugs.

That is one way of looking at it, when thinking about the asynchronous
(no write lock) cases, but another way of looking at the same thing is
it can help to *totally avoid* potential bugs in synchronous cases.

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Philip Martin wrote on Tue, Dec 07, 2010 at 09:49:40 +0000:
> Julian Foad <ju...@wandisco.com> writes:
> 
> > On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> >> Julian Foad <ju...@wandisco.com> writes:
> >> 
> >> > I'm going to drop this "Remove the re-try logic from
> >> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> >> > into checking or messing with the txn-correctness of FSFS now.  If
> >> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> >> 
> >> I thought the patch was an improvement.  The existing retry in
> >> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> >> window smaller.  As the patch makes the window larger we are more likely
> >> to see and fix any places where the caller doesn't handle the race.
> >
> > I *think* the problem is that a caller may use this function within a
> > transaction and depend on this internal re-try logic simply to update a
> > stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> > transaction has changed the real value and hasn't updated the cached
> > value.
> 
> That's not really a race, it's more of a logic error.  If there are any
> such cases then your patch should expose them and we will fix them.  Any
> such errors are easy to handle, compared to the difficulty of fixing
> races.
> 
> The real problem with the existing code is that it partially hides
> races, by making them harder to trigger, but doesn't fix the race.  If
> we want to fix the race properly making it easier to trigger is the
> right thing to do.
> 

As I said, I think all current callers are safe.  However, I agree that
applying the patch will make it easier to detect the mistake if in the
future we add callers to svn_fs_fs__path_rev_absolute() that do not
retry as they should.

> 
> > Daniel, you wrote:
> >> But is it strictly *necessary* to do so?  I think not: by not calling
> >> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> >> stale --- a situation which the code has to account for anyway.
> >
> > That was true, then, and it meant that every place that used this value
> > had to account for it possibly being stale.  But now we have realized
> > that the code shouldn't need to account for the possibility of the value
> > being stale if it is running within a transaction (with the write lock).
> >
> > Now, I think it would be better to change this.  Whenever the code
> > updates the min-foo on disk, it should also update the cached value.
> > That style of coding would be easier to reason about (to use an in-vogue
> > phrase), and then we would be able to remove the re-try in
> > svn_fs_fs__path_rev_absolute() without a concern.
> >
> > Does that make sense?
> >
> > - Julian
> >
> >
> >
> 
> -- 
> Philip

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@wandisco.com> writes:

> On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
>> Julian Foad <ju...@wandisco.com> writes:
>> 
>> > I'm going to drop this "Remove the re-try logic from
>> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
>> > into checking or messing with the txn-correctness of FSFS now.  If
>> > Daniel or anyone else wants to pursue it, I'd be glad to help.
>> 
>> I thought the patch was an improvement.  The existing retry in
>> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
>> window smaller.  As the patch makes the window larger we are more likely
>> to see and fix any places where the caller doesn't handle the race.
>
> I *think* the problem is that a caller may use this function within a
> transaction and depend on this internal re-try logic simply to update a
> stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
> transaction has changed the real value and hasn't updated the cached
> value.

That's not really a race, it's more of a logic error.  If there are any
such cases then your patch should expose them and we will fix them.  Any
such errors are easy to handle, compared to the difficulty of fixing
races.

The real problem with the existing code is that it partially hides
races, by making them harder to trigger, but doesn't fix the race.  If
we want to fix the race properly making it easier to trigger is the
right thing to do.


> Daniel, you wrote:
>> But is it strictly *necessary* to do so?  I think not: by not calling
>> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
>> stale --- a situation which the code has to account for anyway.
>
> That was true, then, and it meant that every place that used this value
> had to account for it possibly being stale.  But now we have realized
> that the code shouldn't need to account for the possibility of the value
> being stale if it is running within a transaction (with the write lock).
>
> Now, I think it would be better to change this.  Whenever the code
> updates the min-foo on disk, it should also update the cached value.
> That style of coding would be easier to reason about (to use an in-vogue
> phrase), and then we would be able to remove the re-try in
> svn_fs_fs__path_rev_absolute() without a concern.
>
> Does that make sense?
>
> - Julian
>
>
>

-- 
Philip

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Mon, 2010-12-06 at 18:44 +0000, Philip Martin wrote:
> Julian Foad <ju...@wandisco.com> writes:
> 
> > I'm going to drop this "Remove the re-try logic from
> > svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> > into checking or messing with the txn-correctness of FSFS now.  If
> > Daniel or anyone else wants to pursue it, I'd be glad to help.
> 
> I thought the patch was an improvement.  The existing retry in
> svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
> window smaller.  As the patch makes the window larger we are more likely
> to see and fix any places where the caller doesn't handle the race.

I *think* the problem is that a caller may use this function within a
transaction and depend on this internal re-try logic simply to update a
stale cached min-unpacked-foo value.  "Stale" in the sense that *this*
transaction has changed the real value and hasn't updated the cached
value.

Daniel, you wrote:
> But is it strictly *necessary* to do so?  I think not: by not calling
> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> stale --- a situation which the code has to account for anyway.

That was true, then, and it meant that every place that used this value
had to account for it possibly being stale.  But now we have realized
that the code shouldn't need to account for the possibility of the value
being stale if it is running within a transaction (with the write lock).

Now, I think it would be better to change this.  Whenever the code
updates the min-foo on disk, it should also update the cached value.
That style of coding would be easier to reason about (to use an in-vogue
phrase), and then we would be able to remove the re-try in
svn_fs_fs__path_rev_absolute() without a concern.

Does that make sense?

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Philip Martin <ph...@wandisco.com>.

Julian Foad <ju...@wandisco.com> writes:

> I'm going to drop this "Remove the re-try logic from
> svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
> into checking or messing with the txn-correctness of FSFS now.  If
> Daniel or anyone else wants to pursue it, I'd be glad to help.

I thought the patch was an improvement.  The existing retry in
svn_fs_fs__path_rev_absolute doesn't fix the race, it simply makes the
window smaller.  As the patch makes the window larger we are more likely
to see and fix any places where the caller doesn't handle the race.

-- 
Philip

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

I'm going to drop this "Remove the re-try logic from
svn_fs_fs__path_rev_absolute()" follow-up patch, as I don't want to get
into checking or messing with the txn-correctness of FSFS now.  If
Daniel or anyone else wants to pursue it, I'd be glad to help.

- Julian


On Thu, 2010-12-02, Daniel Shahaf wrote:
> Julian Foad wrote on Thu, Dec 02, 2010 at 14:33:19 +0000:
> > On Thu, 2010-12-02 at 14:56 +0200, Daniel Shahaf wrote:
> > > Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +0000:
> > > > Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
> > > > r1040832, all its callers correctly account for the possibility of an
> > > > out-of-date result due to a concurrent packing operation.
> > > > 
> > > > The re-try logic was introduced in r875097 and reduced but did not eliminate
> > > > the window of opportunity for the caller to use an out-of-date result.
> > > > 
> > > > See the email thread <http://svn.haxx.se/dev/archive-2010-12/0019.shtml>,
> > > > subject "Re: svn commit: r1040832 - Port a fix for a FSFS packing race".
> > > > 
> > > > * subversion/libsvn_fs_fs/fs_fs.c
> > > >   (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
> > > > 
> > > > * subversion/libsvn_fs_fs/fs_fs.h
> > > >   (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
> > > >  
> > > 
> > > In fact, doesn't the correctness of this change depend on the fact that
> > > svn_fs_fs__with_write_lock() calls update_min_unpacked_rev() before
> > > executing the body() callback?
> > 
> > Yes, it does.  Do you think it shouldn't?
> > 
> 
> I'm not sure whether it "should" or "shouldn't" depend.  The change IS
> correct, but it is only correct because the ffd->min_unpacked_rev is
> known to be up-to-date whenever the write lock is held.
> 
> That's a bit of a spaghetti effect there.  Now, is there another such
> effect that means the patch is wrong?  (I haven't come up with one yet,
> but that doesn't mean there isn't any.)
> 
> > >   (Otherwise we don't know whether there
> > > is a "concurrent" packer, or just ffd->min_unpacked_rev is stale.)
> > 
> > You know, I'm not too clear on the design intention here.  I would have
> > assumed it would be the job of all the fs_fs.c code to ensure that
> > ffd->min_unpacked_rev is never stale (due to its own actions) at any
> > point where it matters.  But the way it does that seems to be to re-read
> > the file in several places, instead of simply updating
> > ffd->min_unpacked_rev when it writes the file.  Maybe the idea was that
> > that method would pick up both internal and concurrent (external)
> > changes in the same way - I don't know.  Seems odd.
> > 
> 
> Yes, that was the idea.  If two processes access the filesystem at the
> same time, and one of them calls svn_fs_pack(), then the other's
> ffd->min_unpacked_rev would be stale.
> 
> It's the same story with ffd->youngest_rev, by the way; compare
> ensure_revision_exists(), which I believe predates the packing logic.
> 
> > Do you have an idea whether something should be done differently?  I
> > don't, not without studying it further.
> > 
> 
> I'm not sure.  Ultimately, ffd->min_unpacked_rev is just a cache, so we
> have to update it when we're about to do something that depends on its
> value.
> 
> In the meantime, let me at least attempt to define the "something" which
> perhaps should be done differently:
> 
> Given the way packing works today, it's possible for a revision file to
> stop existing where you expected it to exist.  Today the code informs
> itself of that situation by re-reading min-packed-rev and retrying.
> (Further up the stack, the code calculating the offset to seek to also
> needs to know whether it's seeking a pack file or a revision file.)
> 
> > I note that the following comment in pack_shard() is not quite right:
> > 
> >   /* Update the min-unpacked-rev file to reflect our newly packed shard.
> >    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > 
> > That may or may not be true, but it doesn't seem like this function has
> > any right to make that assertion.
> > 
> 
> We could remove the comment and have that function call update_min_unpacked_rev().
> (This would also save us a failed disk access later.)
> 
> But is it strictly *necessary* to do so?  I think not: by not calling
> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> stale --- a situation which the code has to account for anyway.
> 
> > And in pack_revprop_shard(), it's the same, verbatim:
> > 
> >   /* Update the min-unpacked-rev file to reflect our newly packed shard.
> > 
> > should say 'the min-unpacked-revprop file';
> > 
> >    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > 
> > and that second line looks completely bogus in this context.
> > 
> 
> Yes, whoever did the copy-paste forgot to update some places.
> 
> > 
> > > > Index: subversion/libsvn_fs_fs/fs_fs.h
> > > > ===================================================================
> > > > --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1041339)
> > > > +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> > > > @@ -411,8 +411,10 @@ svn_error_t *svn_fs_fs__move_into_place(
> > > >     Allocate *PATH in POOL.
> > > >  
> > > >     Note: If the caller does not have the write lock on FS, then the path is
> > > > -   not guaranteed to remain correct after the function returns, because the
> > > > -   revision might become packed just after this call. */
> > > > +   not guaranteed to be correct or to remain correct after the function
> > > > +   returns, because the revision might become packed before or after this
> > > > +   call.  If a file exists at that path, then it is correct; if not, then
> > > > +   the caller should call update_min_unpacked_rev() and re-try once. */
> > > 
> > > +1 semantically.  However, update_min_unpacked_rev() is private to
> > > fs_fs.c, so technically you aren't supposed to mention it in this
> > > context.
> > 
> > Good point.  Really that means any external caller of this API has to
> > hold the write lock.  Calling update_min_unpacked_rev() and re-trying is
> > only an option for internal callers (within fs_fs.c).  I wonder if this
> > should give us a clue about the questions above.
> > 
> 
> First, I still think having the update_min_unpacked_rev() reference
> there is useful.
> 
> Would it be a layering violation for code in libsvn_fs_fs outside of
> fs_fs.c to call update_min_unpacked_rev()?  I'm not sure.
> 
> > - Julian
> > 
> > 
> > > >  svn_error_t *
> > > >  svn_fs_fs__path_rev_absolute(const char **path,
> > > >                               svn_fs_t *fs,
> > > 
> > 
> >

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Thu, Dec 02, 2010 at 15:59:34 +0000:
> On Thu, 2010-12-02 at 17:40 +0200, Daniel Shahaf wrote:
> > Julian Foad wrote on Thu, Dec 02, 2010 at 15:34:48 +0000:
> > > First step: this patch fixes the comments.  Good to commit?
> > > 
> > > [[[
> > > Index: subversion/libsvn_fs_fs/fs_fs.c
> > > ===================================================================
> > > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041350)
> > > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > > @@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
> > >    SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
> > >  
> > >    /* Update the min-unpacked-rev file to reflect our newly packed shard.
> > > -   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > > -   */
> > > +   * (This doesn't update ffd->min_unpacked_rev.  That will be updated by
> > > +   * open_pack_or_rev_file() when necessary.) */
> > 
> > Didn't you mean s/open_pack_or_rev_file/update_min_unpacked_rev/?
> 
> I was just modifying the existing comment and assuming it had some truth
> or meaning in mentioning open_pack_or_rev_file().  Not sure what,
> exactly.

Today, either open_pack_or_rev_file() or with_some_lock() may call
update_min_unpacked_rev().  The latter didn't call it at the time
I wrote the comment.

I'll update the wording in a moment.

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Thu, 2010-12-02 at 17:40 +0200, Daniel Shahaf wrote:
> Julian Foad wrote on Thu, Dec 02, 2010 at 15:34:48 +0000:
> > First step: this patch fixes the comments.  Good to commit?
> > 
> > [[[
> > Index: subversion/libsvn_fs_fs/fs_fs.c
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041350)
> > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > @@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
> >    SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
> >  
> >    /* Update the min-unpacked-rev file to reflect our newly packed shard.
> > -   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > -   */
> > +   * (This doesn't update ffd->min_unpacked_rev.  That will be updated by
> > +   * open_pack_or_rev_file() when necessary.) */
> 
> Didn't you mean s/open_pack_or_rev_file/update_min_unpacked_rev/?

I was just modifying the existing comment and assuming it had some truth
or meaning in mentioning open_pack_or_rev_file().  Not sure what,
exactly.

> >    final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
> >    SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
> >                                     svn_io_file_del_none, iterpool, iterpool));
> > @@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
> >        SVN_ERR(svn_sqlite__insert(NULL, stmt));
> >      }
> >  
> > -  /* Update the min-unpacked-rev file to reflect our newly packed shard.
> > -   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > -   */
> > +  /* Update the min-unpacked-revprop file to reflect our newly packed shard.
> > +   * (This doesn't update ffd->min_unpacked_revprop.) */
> >    final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, iterpool);
> >    SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
> >                                   svn_io_file_del_none, iterpool, iterpool));
> > ]]]
> > 
> 
> +1

r1041422.  If you the point above still needs changing, please do so or
let me know.

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Thu, Dec 02, 2010 at 15:34:48 +0000:
> First step: this patch fixes the comments.  Good to commit?
> 
> [[[
> Index: subversion/libsvn_fs_fs/fs_fs.c
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041350)
> +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> @@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
>    SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
>  
>    /* Update the min-unpacked-rev file to reflect our newly packed shard.
> -   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> -   */
> +   * (This doesn't update ffd->min_unpacked_rev.  That will be updated by
> +   * open_pack_or_rev_file() when necessary.) */

Didn't you mean s/open_pack_or_rev_file/update_min_unpacked_rev/?

>    final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
>    SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
>                                     svn_io_file_del_none, iterpool, iterpool));
> @@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
>        SVN_ERR(svn_sqlite__insert(NULL, stmt));
>      }
>  
> -  /* Update the min-unpacked-rev file to reflect our newly packed shard.
> -   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> -   */
> +  /* Update the min-unpacked-revprop file to reflect our newly packed shard.
> +   * (This doesn't update ffd->min_unpacked_revprop.) */
>    final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, iterpool);
>    SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
>                                   svn_io_file_del_none, iterpool, iterpool));
> ]]]
> 

+1

> - Julian
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

Daniel Shahaf wrote:
> Julian Foad wrote on Thu, Dec 02, 2010 at 14:33:19 +0000:
> > I note that the following comment in pack_shard() is not quite right:
> > 
> >   /* Update the min-unpacked-rev file to reflect our newly packed shard.
> >    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > 
> > That may or may not be true, but it doesn't seem like this function has
> > any right to make that assertion.
> 
> We could remove the comment and have that function call update_min_unpacked_rev().
> (This would also save us a failed disk access later.)
> 
> But is it strictly *necessary* to do so?  I think not: by not calling
> update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
> stale --- a situation which the code has to account for anyway.
> 
> > And in pack_revprop_shard(), it's the same, verbatim:
> > 
> >   /* Update the min-unpacked-rev file to reflect our newly packed shard.
> > 
> > should say 'the min-unpacked-revprop file';
> > 
> >    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> > 
> > and that second line looks completely bogus in this context.
> > 
> 
> Yes, whoever did the copy-paste forgot to update some places.

First step: this patch fixes the comments.  Good to commit?

[[[
Index: subversion/libsvn_fs_fs/fs_fs.c
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041350)
+++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
@@ -7606,8 +7606,8 @@ pack_shard(const char *revs_dir,
   SVN_ERR(svn_io_set_file_read_only(manifest_file_path, FALSE, pool));
 
   /* Update the min-unpacked-rev file to reflect our newly packed shard.
-   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
-   */
+   * (This doesn't update ffd->min_unpacked_rev.  That will be updated by
+   * open_pack_or_rev_file() when necessary.) */
   final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REV, iterpool);
   SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
                                    svn_io_file_del_none, iterpool, iterpool));
@@ -7674,9 +7674,8 @@ pack_revprop_shard(svn_fs_t *fs,
       SVN_ERR(svn_sqlite__insert(NULL, stmt));
     }
 
-  /* Update the min-unpacked-rev file to reflect our newly packed shard.
-   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
-   */
+  /* Update the min-unpacked-revprop file to reflect our newly packed shard.
+   * (This doesn't update ffd->min_unpacked_revprop.) */
   final_path = svn_dirent_join(fs_path, PATH_MIN_UNPACKED_REVPROP, iterpool);
   SVN_ERR(svn_stream_open_unique(&tmp_stream, &tmp_path, fs_path,
                                  svn_io_file_del_none, iterpool, iterpool));
]]]

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Thu, Dec 02, 2010 at 14:33:19 +0000:
> On Thu, 2010-12-02 at 14:56 +0200, Daniel Shahaf wrote:
> > Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +0000:
> > > Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
> > > r1040832, all its callers correctly account for the possibility of an
> > > out-of-date result due to a concurrent packing operation.
> > > 
> > > The re-try logic was introduced in r875097 and reduced but did not eliminate
> > > the window of opportunity for the caller to use an out-of-date result.
> > > 
> > > See the email thread <http://svn.haxx.se/dev/archive-2010-12/0019.shtml>,
> > > subject "Re: svn commit: r1040832 - Port a fix for a FSFS packing race".
> > > 
> > > * subversion/libsvn_fs_fs/fs_fs.c
> > >   (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
> > > 
> > > * subversion/libsvn_fs_fs/fs_fs.h
> > >   (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
> > >  
> > 
> > In fact, doesn't the correctness of this change depend on the fact that
> > svn_fs_fs__with_write_lock() calls update_min_unpacked_rev() before
> > executing the body() callback?
> 
> Yes, it does.  Do you think it shouldn't?
> 

I'm not sure whether it "should" or "shouldn't" depend.  The change IS
correct, but it is only correct because the ffd->min_unpacked_rev is
known to be up-to-date whenever the write lock is held.

That's a bit of a spaghetti effect there.  Now, is there another such
effect that means the patch is wrong?  (I haven't come up with one yet,
but that doesn't mean there isn't any.)

> >   (Otherwise we don't know whether there
> > is a "concurrent" packer, or just ffd->min_unpacked_rev is stale.)
> 
> You know, I'm not too clear on the design intention here.  I would have
> assumed it would be the job of all the fs_fs.c code to ensure that
> ffd->min_unpacked_rev is never stale (due to its own actions) at any
> point where it matters.  But the way it does that seems to be to re-read
> the file in several places, instead of simply updating
> ffd->min_unpacked_rev when it writes the file.  Maybe the idea was that
> that method would pick up both internal and concurrent (external)
> changes in the same way - I don't know.  Seems odd.
> 

Yes, that was the idea.  If two processes access the filesystem at the
same time, and one of them calls svn_fs_pack(), then the other's
ffd->min_unpacked_rev would be stale.

It's the same story with ffd->youngest_rev, by the way; compare
ensure_revision_exists(), which I believe predates the packing logic.

> Do you have an idea whether something should be done differently?  I
> don't, not without studying it further.
> 

I'm not sure.  Ultimately, ffd->min_unpacked_rev is just a cache, so we
have to update it when we're about to do something that depends on its
value.

In the meantime, let me at least attempt to define the "something" which
perhaps should be done differently:

Given the way packing works today, it's possible for a revision file to
stop existing where you expected it to exist.  Today the code informs
itself of that situation by re-reading min-packed-rev and retrying.
(Further up the stack, the code calculating the offset to seek to also
needs to know whether it's seeking a pack file or a revision file.)

> I note that the following comment in pack_shard() is not quite right:
> 
>   /* Update the min-unpacked-rev file to reflect our newly packed shard.
>    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> 
> That may or may not be true, but it doesn't seem like this function has
> any right to make that assertion.
> 

We could remove the comment and have that function call update_min_unpacked_rev().
(This would also save us a failed disk access later.)

But is it strictly *necessary* to do so?  I think not: by not calling
update_min_unpacked_rev(), we cause ffd->min_unpacked_rev to become
stale --- a situation which the code has to account for anyway.

> And in pack_revprop_shard(), it's the same, verbatim:
> 
>   /* Update the min-unpacked-rev file to reflect our newly packed shard.
> 
> should say 'the min-unpacked-revprop file';
> 
>    * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)
> 
> and that second line looks completely bogus in this context.
> 

Yes, whoever did the copy-paste forgot to update some places.

> 
> > > Index: subversion/libsvn_fs_fs/fs_fs.h
> > > ===================================================================
> > > --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1041339)
> > > +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> > > @@ -411,8 +411,10 @@ svn_error_t *svn_fs_fs__move_into_place(
> > >     Allocate *PATH in POOL.
> > >  
> > >     Note: If the caller does not have the write lock on FS, then the path is
> > > -   not guaranteed to remain correct after the function returns, because the
> > > -   revision might become packed just after this call. */
> > > +   not guaranteed to be correct or to remain correct after the function
> > > +   returns, because the revision might become packed before or after this
> > > +   call.  If a file exists at that path, then it is correct; if not, then
> > > +   the caller should call update_min_unpacked_rev() and re-try once. */
> > 
> > +1 semantically.  However, update_min_unpacked_rev() is private to
> > fs_fs.c, so technically you aren't supposed to mention it in this
> > context.
> 
> Good point.  Really that means any external caller of this API has to
> hold the write lock.  Calling update_min_unpacked_rev() and re-trying is
> only an option for internal callers (within fs_fs.c).  I wonder if this
> should give us a clue about the questions above.
> 

First, I still think having the update_min_unpacked_rev() reference
there is useful.

Would it be a layering violation for code in libsvn_fs_fs outside of
fs_fs.c to call update_min_unpacked_rev()?  I'm not sure.

> - Julian
> 
> 
> > >  svn_error_t *
> > >  svn_fs_fs__path_rev_absolute(const char **path,
> > >                               svn_fs_t *fs,
> > 
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Thu, 2010-12-02 at 14:56 +0200, Daniel Shahaf wrote:
> Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +0000:
> > Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
> > r1040832, all its callers correctly account for the possibility of an
> > out-of-date result due to a concurrent packing operation.
> > 
> > The re-try logic was introduced in r875097 and reduced but did not eliminate
> > the window of opportunity for the caller to use an out-of-date result.
> > 
> > See the email thread <http://svn.haxx.se/dev/archive-2010-12/0019.shtml>,
> > subject "Re: svn commit: r1040832 - Port a fix for a FSFS packing race".
> > 
> > * subversion/libsvn_fs_fs/fs_fs.c
> >   (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
> > 
> > * subversion/libsvn_fs_fs/fs_fs.h
> >   (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
> > --This line, and those below, will be ignored--
> > 
> > Index: subversion/libsvn_fs_fs/fs_fs.c
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041339)
> > +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> > @@ -246,8 +246,6 @@ path_rev(svn_fs_t *fs, svn_revnum_t rev,
> >                                apr_psprintf(pool, "%ld", rev), NULL);
> >  }
> >  
> > -/* Returns the path of REV in FS, whether in a pack file or not.
> > -   Allocate in POOL. */
> >  svn_error_t *
> >  svn_fs_fs__path_rev_absolute(const char **path,
> >                               svn_fs_t *fs,
> > @@ -256,45 +254,16 @@ svn_fs_fs__path_rev_absolute(const char 
> >  {
> >    fs_fs_data_t *ffd = fs->fsap_data;
> >  
> > -  if (ffd->format < SVN_FS_FS__MIN_PACKED_FORMAT)
> > +  if (ffd->format < SVN_FS_FS__MIN_PACKED_FORMAT
> > +      || ! is_packed_rev(fs, rev))
> >      {
> >        *path = path_rev(fs, rev, pool);
> > -      return SVN_NO_ERROR;
> >      }
> > -
> > -  if (! is_packed_rev(fs, rev))
> > +  else
> >      {
> > -      svn_node_kind_t kind;
> > -
> > -      /* Initialize the return variable. */
> > -      *path = path_rev(fs, rev, pool);
> > -
> > -      SVN_ERR(svn_io_check_path(*path, &kind, pool));
> > -      if (kind == svn_node_file)
> > -        {
> > -          /* *path is already set correctly. */
> > -          return SVN_NO_ERROR;
> > -        }
> > -      else
> > -        {
> > -          /* Someone must have run 'svnadmin pack' while this fs object
> > -           * was open. */
> > -
> > -          SVN_ERR(update_min_unpacked_rev(fs, pool));
> > -
> > -          /* The rev really should be present now. */
> > -          if (! is_packed_rev(fs, rev))
> > -            return svn_error_createf(APR_ENOENT, NULL,
> > -                                     _("Revision file '%s' does not exist, "
> > -                                       "and r%ld is not packed"),
> > -                                     svn_dirent_local_style(*path, pool),
> > -                                     rev);
> > -          /* Fall through. */
> > -        }
> > +      *path = path_rev_packed(fs, rev, "pack", pool);
> >      }
> >  
> > -  *path = path_rev_packed(fs, rev, "pack", pool);
> > -
> >    return SVN_NO_ERROR;
> >  }
> >  
> 
> This part looks good.  I'm more concerned about a bug in the "All
> callers use the write lock, ..." analysis than about a bug in the
> above hunk.

Right.

> In fact, doesn't the correctness of this change depend on the fact that
> svn_fs_fs__with_write_lock() calls update_min_unpacked_rev() before
> executing the body() callback?

Yes, it does.  Do you think it shouldn't?

>   (Otherwise we don't know whether there
> is a "concurrent" packer, or just ffd->min_unpacked_rev is stale.)

You know, I'm not too clear on the design intention here.  I would have
assumed it would be the job of all the fs_fs.c code to ensure that
ffd->min_unpacked_rev is never stale (due to its own actions) at any
point where it matters.  But the way it does that seems to be to re-read
the file in several places, instead of simply updating
ffd->min_unpacked_rev when it writes the file.  Maybe the idea was that
that method would pick up both internal and concurrent (external)
changes in the same way - I don't know.  Seems odd.

Do you have an idea whether something should be done differently?  I
don't, not without studying it further.

I note that the following comment in pack_shard() is not quite right:

  /* Update the min-unpacked-rev file to reflect our newly packed shard.
   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)

That may or may not be true, but it doesn't seem like this function has
any right to make that assertion.

And in pack_revprop_shard(), it's the same, verbatim:

  /* Update the min-unpacked-rev file to reflect our newly packed shard.

should say 'the min-unpacked-revprop file';

   * (ffd->min_unpacked_rev will be updated by open_pack_or_rev_file().)

and that second line looks completely bogus in this context.


> > Index: subversion/libsvn_fs_fs/fs_fs.h
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1041339)
> > +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> > @@ -411,8 +411,10 @@ svn_error_t *svn_fs_fs__move_into_place(
> >     Allocate *PATH in POOL.
> >  
> >     Note: If the caller does not have the write lock on FS, then the path is
> > -   not guaranteed to remain correct after the function returns, because the
> > -   revision might become packed just after this call. */
> > +   not guaranteed to be correct or to remain correct after the function
> > +   returns, because the revision might become packed before or after this
> > +   call.  If a file exists at that path, then it is correct; if not, then
> > +   the caller should call update_min_unpacked_rev() and re-try once. */
> 
> +1 semantically.  However, update_min_unpacked_rev() is private to
> fs_fs.c, so technically you aren't supposed to mention it in this
> context.

Good point.  Really that means any external caller of this API has to
hold the write lock.  Calling update_min_unpacked_rev() and re-trying is
only an option for internal callers (within fs_fs.c).  I wonder if this
should give us a clue about the questions above.

- Julian


> >  svn_error_t *
> >  svn_fs_fs__path_rev_absolute(const char **path,
> >                               svn_fs_t *fs,
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Thu, Dec 02, 2010 at 12:15:19 +0000:
> On Wed, 2010-12-01, Daniel Shahaf wrote: 
> > Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
> > > (2) Doesn't the exact same race exist in *all* uses of
> > > svn_fs_fs__path_rev_absolute()?  There are five other calls to it,
> > 
> > As far as I can tell, apart from open_pack_or_rev_file(), all callers
> > always[1] run under the write lock.  Since pack operation take the write
> > lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
> > to become outdated for those callers.
> 
> OK, that's good.  No change needed then.
> 
> 
> > > -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
> > > +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
> > > +   The path is not guaranteed to remain correct after the function returns,
> > > +   because it is possible that the revision will become packed just after
> > > +   this call, so the caller should re-try once if the path is not found.
> > >     Allocate in POOL. */
> > 
> > Sounds right.
> > 
> > Bordering on bikeshed, I would suggest some changes:
> > 
> > * Separate describing what the function does ("Sets *PATH to FOO and
> >   allocates in POOL") from describing the conditions the caller should
> >   beware of / check for ("sometimes the return value will be out-of-date
> >   by the time you receive it").
> > 
> > * Mention that the non-guarantee is not in effect if the caller has the
> >   write lock.
> 
> OK, committed in r1041056 with these improvements.
> 

+1, thanks.

> > >  svn_error_t *
> > >  svn_fs_fs__path_rev_absolute(const char **path,
> > >                               svn_fs_t *fs,
> > >                               svn_revnum_t rev,
> > >                               apr_pool_t *pool);
> > 
> > Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
> > repeat loop.  Theoretically this isn't needed any more now (since all
> > callers either run under the write lock or retry)...
> 
> Can you check the attached patch for this, please?
> 
> - Julian
> 
> 

> Remove the re-try logic from svn_fs_fs__path_rev_absolute().  Since
> r1040832, all its callers correctly account for the possibility of an
> out-of-date result due to a concurrent packing operation.
> 
> The re-try logic was introduced in r875097 and reduced but did not eliminate
> the window of opportunity for the caller to use an out-of-date result.
> 
> See the email thread <http://svn.haxx.se/dev/archive-2010-12/0019.shtml>,
> subject "Re: svn commit: r1040832 - Port a fix for a FSFS packing race".
> 
> * subversion/libsvn_fs_fs/fs_fs.c
>   (svn_fs_fs__path_rev_absolute): Remove the re-try logic.
> 
> * subversion/libsvn_fs_fs/fs_fs.h
>   (svn_fs_fs__path_rev_absolute): Update the doc string accordingly.
> --This line, and those below, will be ignored--
> 
> Index: subversion/libsvn_fs_fs/fs_fs.c
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.c	(revision 1041339)
> +++ subversion/libsvn_fs_fs/fs_fs.c	(working copy)
> @@ -246,8 +246,6 @@ path_rev(svn_fs_t *fs, svn_revnum_t rev,
>                                apr_psprintf(pool, "%ld", rev), NULL);
>  }
>  
> -/* Returns the path of REV in FS, whether in a pack file or not.
> -   Allocate in POOL. */
>  svn_error_t *
>  svn_fs_fs__path_rev_absolute(const char **path,
>                               svn_fs_t *fs,
> @@ -256,45 +254,16 @@ svn_fs_fs__path_rev_absolute(const char 
>  {
>    fs_fs_data_t *ffd = fs->fsap_data;
>  
> -  if (ffd->format < SVN_FS_FS__MIN_PACKED_FORMAT)
> +  if (ffd->format < SVN_FS_FS__MIN_PACKED_FORMAT
> +      || ! is_packed_rev(fs, rev))
>      {
>        *path = path_rev(fs, rev, pool);
> -      return SVN_NO_ERROR;
>      }
> -
> -  if (! is_packed_rev(fs, rev))
> +  else
>      {
> -      svn_node_kind_t kind;
> -
> -      /* Initialize the return variable. */
> -      *path = path_rev(fs, rev, pool);
> -
> -      SVN_ERR(svn_io_check_path(*path, &kind, pool));
> -      if (kind == svn_node_file)
> -        {
> -          /* *path is already set correctly. */
> -          return SVN_NO_ERROR;
> -        }
> -      else
> -        {
> -          /* Someone must have run 'svnadmin pack' while this fs object
> -           * was open. */
> -
> -          SVN_ERR(update_min_unpacked_rev(fs, pool));
> -
> -          /* The rev really should be present now. */
> -          if (! is_packed_rev(fs, rev))
> -            return svn_error_createf(APR_ENOENT, NULL,
> -                                     _("Revision file '%s' does not exist, "
> -                                       "and r%ld is not packed"),
> -                                     svn_dirent_local_style(*path, pool),
> -                                     rev);
> -          /* Fall through. */
> -        }
> +      *path = path_rev_packed(fs, rev, "pack", pool);
>      }
>  
> -  *path = path_rev_packed(fs, rev, "pack", pool);
> -
>    return SVN_NO_ERROR;
>  }
>  

This part looks good.  I'm more concerned about a bug in the "All
callers use the write lock, ..." analysis than about a bug in the
above hunk.

In fact, doesn't the correctness of this change depend on the fact that
svn_fs_fs__with_write_lock() calls update_min_unpacked_rev() before
executing the body() callback?  (Otherwise we don't know whether there
is a "concurrent" packer, or just ffd->min_unpacked_rev is stale.)

> Index: subversion/libsvn_fs_fs/fs_fs.h
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1041339)
> +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> @@ -411,8 +411,10 @@ svn_error_t *svn_fs_fs__move_into_place(
>     Allocate *PATH in POOL.
>  
>     Note: If the caller does not have the write lock on FS, then the path is
> -   not guaranteed to remain correct after the function returns, because the
> -   revision might become packed just after this call. */
> +   not guaranteed to be correct or to remain correct after the function
> +   returns, because the revision might become packed before or after this
> +   call.  If a file exists at that path, then it is correct; if not, then
> +   the caller should call update_min_unpacked_rev() and re-try once. */

+1 semantically.  However, update_min_unpacked_rev() is private to
fs_fs.c, so technically you aren't supposed to mention it in this
context.

>  svn_error_t *
>  svn_fs_fs__path_rev_absolute(const char **path,
>                               svn_fs_t *fs,

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Wed, 2010-12-01, Daniel Shahaf wrote: 
> Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
> > On Wed, 2010-12-01, stefan2@apache.org wrote:
> > > Port (not merge) a fix for a FSFS packing race condition from the
> > > performance branch to /trunk: There is a slight time window
> > > between finding the name of a rev file and actually open it. If
> > > the revision in question gets packed just within this window,
> > > we will find the new (and final) file name with just one retry.
> > > 
> > > * subversion/libsvn_fs_fs/fs_fs.c
> > >   (open_pack_or_rev_file): retry once upon "missing file" error
> > 
> > Hi Stefan.
> > 
> > (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
> > that the returned path may no longer be valid by the time you come to
> > use it, and the reason for that.  Does my patch below sound right?
> > 
> 
> Well, yes, and Stefan already added such a comment at the definition at
> my suggestion.  But +1 to move that to the declaration.
> 
> > (2) Doesn't the exact same race exist in *all* uses of
> > svn_fs_fs__path_rev_absolute()?  There are five other calls to it,
> 
> As far as I can tell, apart from open_pack_or_rev_file(), all callers
> always[1] run under the write lock.  Since pack operation take the write
> lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
> to become outdated for those callers.

OK, that's good.  No change needed then.


> [1] with the exception of set_revision_proplist(), which can be called
> either under the write lock or under write_revision_zero() --- but in
> the latter case it's called before the format file has been created.
> 
> > all of them using the result as a permissions reference to set the
> > file permissions on some new file.  It seems to me that those uses can
> > fail in the same way.
> > 
> > The root problem is the existence of a "get the path of this file" API
> > in the presence of semantics whereby the file might be removed from that
> > path at any time.
> > 
> > Perhaps your fix is the best fix for this caller, but for the other
> > callers I think creating and using a "copy file permissions from
> > rev-file of rev R" API would be a better solution than adding re-tries
> > there.  That API can do the re-try internally.
> > 
> > What do you think?
> > 
> > 
> > Patch for (1) above:
> > [[[
> > Index: subversion/libsvn_fs_fs/fs_fs.h
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
> > +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> > @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
> >     PERMS_REFERENCE.  Temporary allocations are from POOL. */
> >  svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
> >                                          const char *new_filename,
> >                                          const char *perms_reference,
> >                                          apr_pool_t *pool);
> >  
> > -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
> > +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
> > +   The path is not guaranteed to remain correct after the function returns,
> > +   because it is possible that the revision will become packed just after
> > +   this call, so the caller should re-try once if the path is not found.
> >     Allocate in POOL. */
> 
> Sounds right.
> 
> Bordering on bikeshed, I would suggest some changes:
> 
> * Separate describing what the function does ("Sets *PATH to FOO and
>   allocates in POOL") from describing the conditions the caller should
>   beware of / check for ("sometimes the return value will be out-of-date
>   by the time you receive it").
> 
> * Mention that the non-guarantee is not in effect if the caller has the
>   write lock.

OK, committed in r1041056 with these improvements.

> >  svn_error_t *
> >  svn_fs_fs__path_rev_absolute(const char **path,
> >                               svn_fs_t *fs,
> >                               svn_revnum_t rev,
> >                               apr_pool_t *pool);
> 
> Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
> repeat loop.  Theoretically this isn't needed any more now (since all
> callers either run under the write lock or retry)...

Can you check the attached patch for this, please?

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Stefan Fuhrmann wrote on Sun, Dec 05, 2010 at 23:45:50 +0100:
> On 01.12.2010 14:16, Daniel Shahaf wrote:
>> Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
>>> On Wed, 2010-12-01, stefan2@apache.org wrote:
>>>> Port (not merge) a fix for a FSFS packing race condition from the
>>>> performance branch to /trunk: There is a slight time window
>>>> between finding the name of a rev file and actually open it. If
>>>> the revision in question gets packed just within this window,
>>>> we will find the new (and final) file name with just one retry.
>>>>
>>>> * subversion/libsvn_fs_fs/fs_fs.c
>>>>    (open_pack_or_rev_file): retry once upon "missing file" error
>>> Hi Stefan.
>>>
>>> (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
>>> that the returned path may no longer be valid by the time you come to
>>> use it, and the reason for that.  Does my patch below sound right?
>>>
>> Well, yes, and Stefan already added such a comment at the definition at
>> my suggestion.  But +1 to move that to the declaration.
>>
>
>>> Patch for (1) above:
>>> [[[
>>> Index: subversion/libsvn_fs_fs/fs_fs.h
>>> ===================================================================
>>> --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
>>> +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
>>> @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
>>>      PERMS_REFERENCE.  Temporary allocations are from POOL. */
>>>   svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
>>>                                           const char *new_filename,
>>>                                           const char *perms_reference,
>>>                                           apr_pool_t *pool);
>>>
>>> -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
>>> +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
>>> +   The path is not guaranteed to remain correct after the function returns,
>>> +   because it is possible that the revision will become packed just after
>>> +   this call, so the caller should re-try once if the path is not found.
>>>      Allocate in POOL. */
>> Sounds right.
>>
>> Bordering on bikeshed, I would suggest some changes:
>>
>> * Separate describing what the function does ("Sets *PATH to FOO and
>>    allocates in POOL") from describing the conditions the caller should
>>    beware of / check for ("sometimes the return value will be out-of-date
>>    by the time you receive it").
>>
>> * Mention that the non-guarantee is not in effect if the caller has the
>>    write lock.
>>
> I took the comment from the performance branch and added the
> info about guarantees in presence of a write lock in r1042479.
>

Thanks.

However, Julian already fixed this in fs_fs.h.  Normally we keep the doc
strings only at the declaration of a function, and have no comments (or
only comments aimed at people patching the function itself --- but not
at people who call the function) at the definition.

> -- Stefan^2.

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Stefan Fuhrmann <eq...@web.de>.

On 01.12.2010 14:16, Daniel Shahaf wrote:
> Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
>> On Wed, 2010-12-01, stefan2@apache.org wrote:
>>> Port (not merge) a fix for a FSFS packing race condition from the
>>> performance branch to /trunk: There is a slight time window
>>> between finding the name of a rev file and actually open it. If
>>> the revision in question gets packed just within this window,
>>> we will find the new (and final) file name with just one retry.
>>>
>>> * subversion/libsvn_fs_fs/fs_fs.c
>>>    (open_pack_or_rev_file): retry once upon "missing file" error
>> Hi Stefan.
>>
>> (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
>> that the returned path may no longer be valid by the time you come to
>> use it, and the reason for that.  Does my patch below sound right?
>>
> Well, yes, and Stefan already added such a comment at the definition at
> my suggestion.  But +1 to move that to the declaration.
>

>> Patch for (1) above:
>> [[[
>> Index: subversion/libsvn_fs_fs/fs_fs.h
>> ===================================================================
>> --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
>> +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
>> @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
>>      PERMS_REFERENCE.  Temporary allocations are from POOL. */
>>   svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
>>                                           const char *new_filename,
>>                                           const char *perms_reference,
>>                                           apr_pool_t *pool);
>>
>> -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
>> +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
>> +   The path is not guaranteed to remain correct after the function returns,
>> +   because it is possible that the revision will become packed just after
>> +   this call, so the caller should re-try once if the path is not found.
>>      Allocate in POOL. */
> Sounds right.
>
> Bordering on bikeshed, I would suggest some changes:
>
> * Separate describing what the function does ("Sets *PATH to FOO and
>    allocates in POOL") from describing the conditions the caller should
>    beware of / check for ("sometimes the return value will be out-of-date
>    by the time you receive it").
>
> * Mention that the non-guarantee is not in effect if the caller has the
>    write lock.
>
I took the comment from the performance branch and added the
info about guarantees in presence of a write lock in r1042479.

-- Stefan^2.

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Wed, 2010-12-01, Daniel Shahaf wrote: 
> Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
> > On Wed, 2010-12-01, stefan2@apache.org wrote:
> > > Port (not merge) a fix for a FSFS packing race condition from the
> > > performance branch to /trunk: There is a slight time window
> > > between finding the name of a rev file and actually open it. If
> > > the revision in question gets packed just within this window,
> > > we will find the new (and final) file name with just one retry.
> > > 
> > > * subversion/libsvn_fs_fs/fs_fs.c
> > >   (open_pack_or_rev_file): retry once upon "missing file" error
> > 
> > Hi Stefan.
> > 
> > (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
> > that the returned path may no longer be valid by the time you come to
> > use it, and the reason for that.  Does my patch below sound right?
> > 
> 
> Well, yes, and Stefan already added such a comment at the definition at
> my suggestion.  But +1 to move that to the declaration.
> 
> > (2) Doesn't the exact same race exist in *all* uses of
> > svn_fs_fs__path_rev_absolute()?  There are five other calls to it,
> 
> As far as I can tell, apart from open_pack_or_rev_file(), all callers
> always[1] run under the write lock.  Since pack operation take the write
> lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
> to become outdated for those callers.

OK, that's good.  No change needed then.


> [1] with the exception of set_revision_proplist(), which can be called
> either under the write lock or under write_revision_zero() --- but in
> the latter case it's called before the format file has been created.
> 
> > all of them using the result as a permissions reference to set the
> > file permissions on some new file.  It seems to me that those uses can
> > fail in the same way.
> > 
> > The root problem is the existence of a "get the path of this file" API
> > in the presence of semantics whereby the file might be removed from that
> > path at any time.
> > 
> > Perhaps your fix is the best fix for this caller, but for the other
> > callers I think creating and using a "copy file permissions from
> > rev-file of rev R" API would be a better solution than adding re-tries
> > there.  That API can do the re-try internally.
> > 
> > What do you think?
> > 
> > 
> > Patch for (1) above:
> > [[[
> > Index: subversion/libsvn_fs_fs/fs_fs.h
> > ===================================================================
> > --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
> > +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> > @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
> >     PERMS_REFERENCE.  Temporary allocations are from POOL. */
> >  svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
> >                                          const char *new_filename,
> >                                          const char *perms_reference,
> >                                          apr_pool_t *pool);
> >  
> > -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
> > +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
> > +   The path is not guaranteed to remain correct after the function returns,
> > +   because it is possible that the revision will become packed just after
> > +   this call, so the caller should re-try once if the path is not found.
> >     Allocate in POOL. */
> 
> Sounds right.
> 
> Bordering on bikeshed, I would suggest some changes:
> 
> * Separate describing what the function does ("Sets *PATH to FOO and
>   allocates in POOL") from describing the conditions the caller should
>   beware of / check for ("sometimes the return value will be out-of-date
>   by the time you receive it").
> 
> * Mention that the non-guarantee is not in effect if the caller has the
>   write lock.

OK, committed in r1041056 with these improvements.

> >  svn_error_t *
> >  svn_fs_fs__path_rev_absolute(const char **path,
> >                               svn_fs_t *fs,
> >                               svn_revnum_t rev,
> >                               apr_pool_t *pool);
> 
> Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
> repeat loop.  Theoretically this isn't needed any more now (since all
> callers either run under the write lock or retry)...

Can you check the attached patch for this, please?

- Julian

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Julian Foad wrote on Wed, Dec 01, 2010 at 12:32:45 +0000:
> On Wed, 2010-12-01, stefan2@apache.org wrote:
> > Port (not merge) a fix for a FSFS packing race condition from the
> > performance branch to /trunk: There is a slight time window
> > between finding the name of a rev file and actually open it. If
> > the revision in question gets packed just within this window,
> > we will find the new (and final) file name with just one retry.
> > 
> > * subversion/libsvn_fs_fs/fs_fs.c
> >   (open_pack_or_rev_file): retry once upon "missing file" error
> 
> Hi Stefan.
> 
> (1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
> that the returned path may no longer be valid by the time you come to
> use it, and the reason for that.  Does my patch below sound right?
> 

Well, yes, and Stefan already added such a comment at the definition at
my suggestion.  But +1 to move that to the declaration.

> (2) Doesn't the exact same race exist in *all* uses of
> svn_fs_fs__path_rev_absolute()?  There are five other calls to it,

As far as I can tell, apart from open_pack_or_rev_file(), all callers
always[1] run under the write lock.  Since pack operation take the write
lock too, it's not possible for the result of svn_fs_fs__path_rev_absolute()
to become outdated for those callers.



[1] with the exception of set_revision_proplist(), which can be called
either under the write lock or under write_revision_zero() --- but in
the latter case it's called before the format file has been created.

> all of them using the result as a permissions reference to set the
> file permissions on some new file.  It seems to me that those uses can
> fail in the same way.
> 
> The root problem is the existence of a "get the path of this file" API
> in the presence of semantics whereby the file might be removed from that
> path at any time.
> 
> Perhaps your fix is the best fix for this caller, but for the other
> callers I think creating and using a "copy file permissions from
> rev-file of rev R" API would be a better solution than adding re-tries
> there.  That API can do the re-try internally.
> 
> What do you think?
> 
> 
> Patch for (1) above:
> [[[
> Index: subversion/libsvn_fs_fs/fs_fs.h
> ===================================================================
> --- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
> +++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
> @@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
>     PERMS_REFERENCE.  Temporary allocations are from POOL. */
>  svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
>                                          const char *new_filename,
>                                          const char *perms_reference,
>                                          apr_pool_t *pool);
>  
> -/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
> +/* Set *PATH to the path of REV in FS, whether in a pack file or not.
> +   The path is not guaranteed to remain correct after the function returns,
> +   because it is possible that the revision will become packed just after
> +   this call, so the caller should re-try once if the path is not found.
>     Allocate in POOL. */

Sounds right.

Bordering on bikeshed, I would suggest some changes:

* Separate describing what the function does ("Sets *PATH to FOO and
  allocates in POOL") from describing the conditions the caller should
  beware of / check for ("sometimes the return value will be out-of-date
  by the time you receive it").

* Mention that the non-guarantee is not in effect if the caller has the
  write lock.

>  svn_error_t *
>  svn_fs_fs__path_rev_absolute(const char **path,
>                               svn_fs_t *fs,
>                               svn_revnum_t rev,
>                               apr_pool_t *pool);

Also, svn_fs_fs__path_rev_absolute() also does, internally, a single
repeat loop.  Theoretically this isn't needed any more now (since all
callers either run under the write lock or retry)...

> ]]]
> 
> - Julian
> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Stefan Fuhrmann <eq...@web.de>.

On 02.12.2010 08:18, Daniel Shahaf wrote:
>> On Wed, 2010-12-01, stefan2@apache.org wrote:
>>> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
>>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
>>> ==============================================================================
>>> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
>>> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 2010
>>> @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
>>>   {
>>>     svn_error_t *err;
>>>     const char *path;
>>>     svn_boolean_t retry = FALSE;
>>>
> I agree the code below is correct, but I found it confusing:
>
>>>     do
>>>       {
>>>         err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
>>>
>>>         /* open the revision file in buffered r/o mode */
>>>         if (! err)
>>>           err = svn_io_file_open(file, path,
>>>                                 APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
>>>
>>>         if (err&&  APR_STATUS_IS_ENOENT(err->apr_err))
>>>           {
>>>             /* Could not open the file. This may happen if the
>>>              * file once existed but got packed later. */
>>>             svn_error_clear(err);
>>>
>>>             /* if that was our 2nd attempt, leave it at that. */
>>>             if (retry)
>>>               return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
>>>                                       _("No such revision %ld"), rev);
>>>
>>>             /* we failed for the first time. Refresh cache&  retry. */
>>>             SVN_ERR(update_min_unpacked_rev(fs, pool));
>>>
> Philip noted that this call should be guarded by a format number check
> (otherwise we would assert on format-3 repositories that are missing
> a rev file).  I've fixed that.
>
Thanks.
>>>             retry = TRUE;
>>>           }
>>>         else
>>>           {
>>>             /* the file exists but something prevented us from opnening it */
>>>             return svn_error_return(err);
> The comment doesn't indicate that the else{} block is also entered in
> the rare case where ERR is SVN_NO_ERROR.
>
> In other words, the "success" case is handled not by the 'return SVN_NO_ERROR'
> below (which in fact is never reached), but by this else{} block.
Yup. The comment is just one of those "last minute improvements" ...
>>>           }
>>>       }
>>>     while (err);
>>>
>>>     return SVN_NO_ERROR;
>>>   }
> The error handling confused me here: it clears ERR but then checks that
> it's non-NULL, and right after that check (which normally means "there
> is an error") it overwrites ERR.  I think the loop would be clearer if
> were just 'while (1)'.
r1042478 should make the logic more obvious.

-- Stefan^2.

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

> On Wed, 2010-12-01, stefan2@apache.org wrote:
> > Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> > URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
> > ==============================================================================
> > --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> > +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 2010
> > @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
> >  {
> >    svn_error_t *err;
> >    const char *path;
> >    svn_boolean_t retry = FALSE;
> >  

I agree the code below is correct, but I found it confusing:

> >    do
> >      {
> >        err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
> >  
> >        /* open the revision file in buffered r/o mode */
> >        if (! err)
> >          err = svn_io_file_open(file, path,
> >                                APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
> >  
> >        if (err && APR_STATUS_IS_ENOENT(err->apr_err))
> >          {
> >            /* Could not open the file. This may happen if the
> >             * file once existed but got packed later. */
> >            svn_error_clear(err);
> >  
> >            /* if that was our 2nd attempt, leave it at that. */
> >            if (retry)
> >              return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
> >                                      _("No such revision %ld"), rev);
> >  
> >            /* we failed for the first time. Refresh cache & retry. */
> >            SVN_ERR(update_min_unpacked_rev(fs, pool));
> >  

Philip noted that this call should be guarded by a format number check
(otherwise we would assert on format-3 repositories that are missing
a rev file).  I've fixed that.

> >            retry = TRUE;
> >          }
> >        else
> >          {
> >            /* the file exists but something prevented us from opnening it */
> >            return svn_error_return(err);

The comment doesn't indicate that the else{} block is also entered in
the rare case where ERR is SVN_NO_ERROR.

In other words, the "success" case is handled not by the 'return SVN_NO_ERROR'
below (which in fact is never reached), but by this else{} block.

> >          }
> >      }
> >    while (err);
> >  
> >    return SVN_NO_ERROR;
> >  }

The error handling confused me here: it clears ERR but then checks that
it's non-NULL, and right after that check (which normally means "there
is an error") it overwrites ERR.  I think the loop would be clearer if
were just 'while (1)'.

> 
>

Re: svn commit: r1040832 - Port a fix for a FSFS packing race

Posted by Julian Foad <ju...@wandisco.com>.

On Wed, 2010-12-01, stefan2@apache.org wrote:
> Port (not merge) a fix for a FSFS packing race condition from the
> performance branch to /trunk: There is a slight time window
> between finding the name of a rev file and actually open it. If
> the revision in question gets packed just within this window,
> we will find the new (and final) file name with just one retry.
> 
> * subversion/libsvn_fs_fs/fs_fs.c
>   (open_pack_or_rev_file): retry once upon "missing file" error

Hi Stefan.

(1) I think the doc string of svn_fs_fs__path_rev_absolute() should note
that the returned path may no longer be valid by the time you come to
use it, and the reason for that.  Does my patch below sound right?

(2) Doesn't the exact same race exist in *all* uses of
svn_fs_fs__path_rev_absolute()?  There are five other calls to it, all
of them using the result as a permissions reference to set the file
permissions on some new file.  It seems to me that those uses can fail
in the same way.

The root problem is the existence of a "get the path of this file" API
in the presence of semantics whereby the file might be removed from that
path at any time.

Perhaps your fix is the best fix for this caller, but for the other
callers I think creating and using a "copy file permissions from
rev-file of rev R" API would be a better solution than adding re-tries
there.  That API can do the re-try internally.

What do you think?


Patch for (1) above:
[[[
Index: subversion/libsvn_fs_fs/fs_fs.h
===================================================================
--- subversion/libsvn_fs_fs/fs_fs.h	(revision 1040662)
+++ subversion/libsvn_fs_fs/fs_fs.h	(working copy)
@@ -404,13 +404,16 @@ svn_error_t *svn_fs_fs__txn_changes_fetc
    PERMS_REFERENCE.  Temporary allocations are from POOL. */
 svn_error_t *svn_fs_fs__move_into_place(const char *old_filename,
                                         const char *new_filename,
                                         const char *perms_reference,
                                         apr_pool_t *pool);
 
-/* Sets *PATH to the path of REV in FS, whether in a pack file or not.
+/* Set *PATH to the path of REV in FS, whether in a pack file or not.
+   The path is not guaranteed to remain correct after the function returns,
+   because it is possible that the revision will become packed just after
+   this call, so the caller should re-try once if the path is not found.
    Allocate in POOL. */
 svn_error_t *
 svn_fs_fs__path_rev_absolute(const char **path,
                              svn_fs_t *fs,
                              svn_revnum_t rev,
                              apr_pool_t *pool);
]]]

- Julian


> Modified: subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c?rev=1040832&r1=1040831&r2=1040832&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c (original)
> +++ subversion/trunk/subversion/libsvn_fs_fs/fs_fs.c Wed Dec  1 00:15:11 2010
> @@ -1903,21 +1903,42 @@ open_pack_or_rev_file(apr_file_t **file,
>  {
>    svn_error_t *err;
>    const char *path;
>    svn_boolean_t retry = FALSE;
>  
>    do
>      {
>        err = svn_fs_fs__path_rev_absolute(&path, fs, rev, pool);
>  
>        /* open the revision file in buffered r/o mode */
>        if (! err)
>          err = svn_io_file_open(file, path,
>                                APR_READ | APR_BUFFERED, APR_OS_DEFAULT, pool);
>  
>        if (err && APR_STATUS_IS_ENOENT(err->apr_err))
>          {
>            /* Could not open the file. This may happen if the
>             * file once existed but got packed later. */
>            svn_error_clear(err);
>  
>            /* if that was our 2nd attempt, leave it at that. */
>            if (retry)
>              return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
>                                      _("No such revision %ld"), rev);
>  
>            /* we failed for the first time. Refresh cache & retry. */
>            SVN_ERR(update_min_unpacked_rev(fs, pool));
>  
>            retry = TRUE;
>          }
>        else
>          {
>            /* the file exists but something prevented us from opnening it */
>            return svn_error_return(err);
>          }
>      }
>    while (err);
>  
>    return SVN_NO_ERROR;
>  }