Posted to dev@subversion.apache.org by John Szakmeister <jo...@szakmeister.net> on 2010/08/31 21:49:03 UTC

Another FSFS bug somewhere?

I can't be sure which version of SVN this occurred with (I believe it
was a very recent version), but I had a user email me the other day
about a broken revision.  After taking a look, it appears that SVN
picked the right offset, the right length, and the right checksums,
but the wrong revision number.  It looks like this was during a tag
operation (or a copy from a previous revision). The revision the
backend chose didn't even have a related file, and the contents
certainly were not the same.

Any ideas where I should look to find the cause of this problem? Also,
I noticed that files seem to have both a SHA-1 and a uniquifier, but
directories don't.  The structure document in libsvn_fs_fs doesn't
speak of this difference between files and directories.  It just says
that newer formats have it while older ones don't.  Should directories
have both the SHA-1 and uniquifier?

Thanks!

-John

Re: Another FSFS bug somewhere?

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Sep 02, 2010 at 06:29:41PM +0300, Daniel Shahaf wrote:
> John Szakmeister wrote on Thu, Sep 02, 2010 at 07:09:43 -0400:
> > On Thu, Sep 2, 2010 at 6:48 AM, Stefan Sperling <st...@elego.de> wrote:
> > > John, if you had time for a quick IRC session where you could explain
> > > the ideas behind fsfsverify.py to me at a high level and answer questions,
> > > I'd be grateful. And I'd very much like to see its functionality inside
> > > of svnadmin verify/recover, partly because I believe that reimplementing
> > > it there would give me great insight into the problem :)
> > 
> > I can do that.  My schedule is a bit wonky though.  Early morning
> > Eastern Standard Time works best for me (I'm usually up 4:30ish).  Let
> > me know what works for you.
> > 
> 
> I'd love to join such an IRC session.

We can have it on #svn-dev, so it will be logged, and you can read
it later even if you happen to miss it.

Stefan

Re: Another FSFS bug somewhere?

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
John Szakmeister wrote on Thu, Sep 02, 2010 at 07:09:43 -0400:
> On Thu, Sep 2, 2010 at 6:48 AM, Stefan Sperling <st...@elego.de> wrote:
> > Given what I know about the scenario at the customer site, there seems to
> > be a correlation between revprop edits and corruption of revisions,

FWIW, r981921 makes sure rev files in new repositories are created
read-only, and is nominated for backport.

(Hmm... Now I realize it does not affect existing repositories, unless
the admin manually chmods -w the youngest rev file.)
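
For an existing repository, the manual step mentioned above amounts to
clearing the write bits on a revision file. A minimal sketch in Python,
assuming the caller already knows the path of the youngest rev file (the
helper name make_read_only is made up for illustration and is not part
of Subversion):

```python
import os
import stat


def make_read_only(path: str) -> None:
    """Clear the owner/group/other write bits on `path` (like `chmod a-w`)."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, current & ~write_bits)
```

This only changes file permissions; it does nothing about whichever code
path rewrote the file in the first place.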

> > John, if you had time for a quick IRC session where you could explain
> > the ideas behind fsfsverify.py to me at a high level and answer questions,
> > I'd be grateful. And I'd very much like to see its functionality inside
> > of svnadmin verify/recover, partly because I believe that reimplementing
> > it there would give me great insight into the problem :)
> 
> I can do that.  My schedule is a bit wonky though.  Early morning
> Eastern Standard Time works best for me (I'm usually up 4:30ish).  Let
> me know what works for you.
> 

I'd love to join such an IRC session.

> TTYL!
> 
> -John

Re: Another FSFS bug somewhere?

Posted by John Szakmeister <jo...@szakmeister.net>.
On Thu, Sep 2, 2010 at 6:48 AM, Stefan Sperling <st...@elego.de> wrote:
[snip]
> Just a possibly related note:
>
> I've been investigating broken FSFS revisions at a customer site,
> which fsfsverify.py was able to fix. fsfsverify.py reported
> "InvalidCompressedStream" and/or "InvalidWindow" errors.
> I haven't found the time yet to fully dig into the problem to figure out
> what really happened. I do have the corrupt and fixed revision files for
> analysis and will try to pin-point the problem based on them.

I think that's the repeated block issue.  When the problem occurs with
a compressed stream, it gets a little more interesting... but it's
usually recoverable.

> Given what I know about the scenario at the customer site, there seems to
> be a correlation between revprop edits and corruption of revisions,
> even of revisions unrelated to the revisions which received the revprop
> edits (though I'm not sure yet if that's really the case). Julian Foad
> says he's seen similar issues also possibly related to revprop edits,
> but it's unclear whether we're seeing the same problem.

I've seen several cases where propedits have gone bad and you end up
with an empty file.  I haven't seen much in the way of corruption on
that front though (no weird looking bytes, or a malformed file...
other than it being empty).  Perhaps there is a correlation between
editing properties and causing a subsequent problem in a revision.  I
haven't personally seen that trend though.  OTOH, a great deal of
people come to me for help, but are unwilling to share many details.
:-(

> I do think there could be a long-standing bug we need to fix.
> In the case I saw, the server was on 1.4, but in Julian's case the
> server was on 1.6. Maybe you're seeing the same or a similar problem,
> with a presumably "very recent" server?

This one was different than most that I had seen.  Almost everything
in the text: line was right except for the revision number.  It
referenced 38904, and it should have referenced 38910.  I know we've
had some caching bugs in the past, and I'm curious if this is another
one.
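
To make the symptom concrete, here is a rough Python sketch of splitting
a text: line into its fields and comparing the revision it names against
the one it should name. The field order (revision, offset, length,
expanded size, MD5, then optionally SHA-1 and uniquifier) is my reading
of the libsvn_fs_fs structure document and should be treated as an
assumption. The names Representation and parse_rep_line are made up for
this example, and the offsets, digests, and uniquifier in the sample
line are placeholders, not data from the broken repository.

```python
from typing import NamedTuple, Optional


class Representation(NamedTuple):
    revision: int
    offset: int
    length: int
    expanded_size: int
    md5: str
    sha1: Optional[str]        # absent in older formats
    uniquifier: Optional[str]  # absent in older formats


def parse_rep_line(line: str) -> Representation:
    """Parse a 'text:' or 'props:' line from a node-rev header."""
    key, sep, value = line.partition(":")
    if not sep or key not in ("text", "props"):
        raise ValueError("not a representation line: %r" % line)
    fields = value.split()
    if len(fields) not in (5, 7):
        raise ValueError("unexpected field count: %d" % len(fields))
    revision, offset, length, expanded_size = (int(f) for f in fields[:4])
    sha1, uniquifier = (fields[5], fields[6]) if len(fields) == 7 else (None, None)
    return Representation(revision, offset, length, expanded_size,
                          fields[4], sha1, uniquifier)


# Placeholder values, for illustration only.
sample = ("text: 38904 1234 567 890 "
          "d41d8cd98f00b204e9800998ecf8427e "
          "da39a3ee5e6b4b0d3255bfef95601890afd80709 "
          "placeholder-uniquifier")
rep = parse_rep_line(sample)
expected_revision = 38910
if rep.revision != expected_revision:
    print("revision mismatch: line says %d, expected %d"
          % (rep.revision, expected_revision))
```

A check like this only works when the expected revision is already known
from some other source. Everything else on the line can be internally
consistent while still pointing into the wrong rev file.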

> John, if you had time for a quick IRC session where you could explain
> the ideas behind fsfsverify.py to me at a high level and answer questions,
> I'd be grateful. And I'd very much like to see its functionality inside
> of svnadmin verify/recover, partly because I believe that reimplementing
> it there would give me great insight into the problem :)

I can do that.  My schedule is a bit wonky though.  Early morning
Eastern Standard Time works best for me (I'm usually up 4:30ish).  Let
me know what works for you.

TTYL!

-John

Re: Another FSFS bug somewhere?

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Aug 31, 2010 at 05:49:03PM -0400, John Szakmeister wrote:
> I can't be sure which version of SVN this occurred with (I believe it
> was a very recent version), but I had a user email me the other day
> about a broken revision.  After taking a look, it appears that SVN
> picked the right offset, the right length, and the right checksums,
> but the wrong revision number.  It looks like this was during a tag
> operation (or a copy from a previous revision). The revision the
> backend chose didn't even have a related file, and the contents
> certainly were not the same.

Just a possibly related note:

I've been investigating broken FSFS revisions at a customer site,
which fsfsverify.py was able to fix. fsfsverify.py reported
"InvalidCompressedStream" and/or "InvalidWindow" errors.
I haven't found the time yet to fully dig into the problem to figure out
what really happened. I do have the corrupt and fixed revision files for
analysis and will try to pin-point the problem based on them.

Given what I know about the scenario at the customer site, there seems to
be a correlation between revprop edits and corruption of revisions,
even of revisions unrelated to the revisions which received the revprop
edits (though I'm not sure yet if that's really the case). Julian Foad
says he's seen similar issues also possibly related to revprop edits,
but it's unclear whether we're seeing the same problem.

I do think there could be a long-standing bug we need to fix.
In the case I saw, the server was on 1.4, but in Julian's case the
server was on 1.6. Maybe you're seeing the same or a similar problem,
with a presumably "very recent" server?
 
John, if you had time for a quick IRC session where you could explain
the ideas behind fsfsverify.py to me at a high level and answer questions,
I'd be grateful. And I'd very much like to see its functionality inside
of svnadmin verify/recover, partly because I believe that reimplementing
it there would give me great insight into the problem :)

Thanks,
Stefan