You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by NextGen$ <ne...@emu.freenetproject.org> on 2007/02/24 22:31:25 UTC

Error: Found malformed header in revision file

Hi,

	Today I have experienced a weird problem with subversion... I googled
	a bit for it and found
	http://svn.haxx.se/users/archive-2004-12/0453.shtml ... So I joined
	the irc channel where I have been told to mail this list.

	Here is what has occured : On the client side I have tried to commit
	and got the following error on the first try :

commit -m "Add the same thing for backgroundFetchers" /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/client/async/USKManager.java /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/clients/http/StatisticsToadlet.java
    Sending        /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/client/async/USKManager.java
    Sending        /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/clients/http/StatisticsToadlet.java
    Transmitting file data ...
    svn: Commit failed (details follow):
    svn: Commit failed (details follow):
svn: Found malformed header in revision file
svn: MERGE of '/svn/trunk/freenet/src/freenet': 500 Internal Server Error (https://emu.freenetproject.org)
    svn: Found malformed header in revision file
svn: PROPFIND of '/svn/trunk/freenet': 500 Internal Server Error (https://emu.freenetproject.org)

	I supposed it was a temporary glitch and retried :

commit -m "Add the same metric for backgroundFetchers" /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/client/async/USKManager.java /home/nextgens/src/eclipse/Freenet 0.7/src/freenet/clients/http/StatisticsToadlet.java
    svn: Commit failed (details follow):
    svn: Commit failed (details follow):
svn: Found malformed header in revision file
svn: PROPFIND of '/svn/trunk/freenet/src/freenet': 500 Internal Server Error (https://emu.freenetproject.org)

	Since then I have no access to the repository whatever I do. I am using subclipse and javaHL on the client side.

	On the server side: I know the postcommit hook has been triggered (It has generated http://archives.freenetproject.org/message/20070224.213047.3025a94e.en.html) in spite of the failure message on the client.

	I can provide both client and server side files, including apache logs if helpfull;
	I am using the FSFS backend and apache as the only "provider".

	nextgens@emu:~$ svn --version
	svn, version 1.4.2 (r22196)
	   compiled Nov 10 2006, 17:39:50
	
	nextgens@emu:~$ svnserve --version
	svnserve, version 1.4.2 (r22196)
	   compiled Nov 10 2006, 17:39:50

	svnadmin verify /path/to/repo says:
	[...]
	* Verified revision 11911.
	svnadmin: Found malformed header in revision file

	I have enclosed the revelant revision files in the mail.
	I hope it is a bug you will catch ... and not a hardware problem.

	Thanks in advance for investigating that bug.

NextGen$
PS: please Cc me when you reply; I am not subscribed to your mailing list.

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Malcolm Rowe <ma...@farside.org.uk> [2007-02-28 06:03:49]:

> Hi nextgens,
> 
> Couple of questions for you below.
> 
> On Sat, Feb 24, 2007 at 10:31:25PM +0000, NextGen$ wrote:
> > 	Today I have experienced a weird problem with subversion... I googled
> > 	a bit for it and found
> > 	http://svn.haxx.se/users/archive-2004-12/0453.shtml ... So I joined
> > 	the irc channel where I have been told to mail this list.
> > 
> > 	Here is what has occured : On the client side I have tried to commit
> > 	svnadmin verify /path/to/repo says:
> > 	[...]
> > 	* Verified revision 11911.
> > 	svnadmin: Found malformed header in revision file
> > 
> 
> So I took a look at this problem on IRC right after nextgens posted this
> message.
> 
> Interestingly, it's not the 'normal' FSFS corruption problem
> ('underlapping' repeated writes while writing a delta rep), though it
> does involve data duplication.  The problem is at the first noderev:
> 
> DELTA...
> ENDREP
> DELTA...
> ENDREP
> id: 7x2.0.r11912/988
> type: file
> pred: 7x2.0.r11909/178
> count: 55
> text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
> cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
> copyroot: 0 /
> 
> id: 7x2.0.r11912/988
> type: file
> pred: 7x2.0.r11909/178
> count: 55
> text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
> cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
> copyroot: 0 /
> 
> PLAIN...
> 
> As you can see, the noderev's been duplicated.  The effect of this is to
> break all the subsequently recorded offsets for the following noderevs
> and the trailer line.  Removing one of those duplicates with dd fixed
> the problem.
> 
> So here's the question - how on earth did that happen?  We only ever
> write these out to the revision file once, in fs_fs.c:write_final_rev(),
> and we get the offset to use just before we write each one.
> 
> Okay, so one explanation of why the file offset is wrong might be
> because, for buffered files, APR doesn't call lseek() to get the current
> file offset, because it can't.  It uses it's own concept of the
> location, so if that gets out of sync, we'd see what we have here.
> 
> How could that have happened?  I'm not entirely sure.  The sequence of
> events here is:
> 
> 1. We get the offset for the noderev, which has the side effect of
>    flushing the APR file buffer to disk.  We know this offset is
>    correct because the offset for the noderev is correct.
> 2. We write out the noderev using a series of apr_file_write()'s.
>    This puts all the data into the APR file buffer (it's smaller than 4K,
>    the APR buffer size).
> 3. We get the offset for the directory rep (PLAIN), which flushes the
>    noderev to disk.  We know this offset is incorrect because of the
>    offset reported in the text part of the directory noderev (not shown).
>    [Hmm, actually I guess we don't know that for sure, but I'm going
>    with that theory for the moment.]
> 
> I'm pretty sure we're not trying to write out the noderev twice (i.e.
> calling write_final_rev() twice for the same file entry), since we
> unconditionally update the offset in the node id, and so the two
> noderevs above would have different offsets.
> 
> So I'm thinking that somewhere in #3, the APR file buffer is written
> twice.  I can think of only a few ways to achieve that effect, and none
> of them seem particularly likely:
> 
> a. There's a journalling filesystem in use, and it's decided to replay
>    the write of the noderev.  I don't think this is how journalling
>    filesystems actually work, but I might be wrong.
>    [nextgens: What OS and filesystem are you using for the repository?]
> 

I am using ext3 on GNU/Linux 2.6.18.2-grsec... I have attached to the
mail results from "tune2fs -l".

> b. There's a bug in APR's apr_file_flush() that's causing the same
>    buffer to be flushed to disk more than once.
> 
> c. There's a bug in APR's apr_file_seek()/setptr() that's causing the same
>    buffer to be flushed to disk more than once.
>    [nextgens: Can you tell us what version of APR you're using? (as
>               precisely as you can)]

I get "1.2.7" using "apr-config --version". I am using the
debian/testing package version 1.2.7-8.2.

> 
> d. Something Else.
> 
> For b., I can't really see where a bug would be, but I wouldn't mind
> someone else taking a look once nextgens confirms which version of APR to
> look at.  I assume that nowadays we can rely on write(2) not returning
> EINTR after a partial write - anyone know if this isn't the case?
> 
> For c., I again can't see anything obvious, but the implementation is
> non-trivial.  Now I know that APR 0.9.7 fixed a problem whereby errors
> encountered while flushing during seek weren't reported, but that's why
> we raised the minimum APR requirement to 0.9.7... and in any case, I
> can't see how that could have this effect.
> 
> Regards,
> Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Christian Unger <cu...@apple.com> [2007-03-22 23:32:51]:

> Hi,
> 
> Is it possible that apache's
> 
> SetOutputFilter DEFLATE directive could be causing this?
> 
> This is just an uneducated guess.
> 
> 
> cu
> christian unger

I wasn't using mod_deflate on the server (I've just enabled it)

NextGen$

> 
> On Mar 22, 2007, at 11:16 PM, Florent Daignière (NextGen$) wrote:
> 
> >* Ph. Marek <ph...@bmlv.gv.at> [2007-03-16 07:54:55]:
> >
> >>On Friday 16 March 2007 00:40, Florent Daignière wrote:
> >>...
> >>>Today I have encountered an other problem with the same repository :
> >>...
> >>>I wonder if I'm really (un)lucky or if the hardware is to blame.
> >>As it seems to be (something like) reproduceable - may I suggest  
> >>using strace?
> >>	strace -f -tt -i -o /tmp/subversion-bug -p <pid of httpd> -p  
> >><other pid> ...
> >>would write a series of logfiles into /tmp/subversion-bug.PID -  
> >>you could keep
> >>only the last 100 or so, remove the other ones to keep storage  
> >>usage low.
> >>
> >>That would tell us whether it's software - and with a copy of
> >>	/proc/<offending pid>/maps
> >>we might see (from the instruction pointer) which library is to  
> >>blame,
> >>although that's more a hope than anything else.
> >>
> >>Regards,
> >>
> >>Phil
> >
> >Today the repository got corrupted twice (no software update has been
> >done in the meantime). I realized it too late to get the first strace
> >log ... but I got the second one. Unfortunately I was unable to get
> >the memory map from /proc (I'm using -grsec on my kernel and it hasn't
> >helped).
> >
> >Each time it has been a "Found malformed header in revision file"  
> >error
> >(like the first two ones I reported).
> >
> >Here are the two revision files:
> >http://emu.freenetproject.org/~nextgens/12276
> >http://emu.freenetproject.org/~nextgens/12281
> >
> >http://emu.freenetproject.org/~nextgens/subversion-httpd-strace.gz
> > - be carefull : 20M - I don't have the exact commit timestamp so I  
> >kept
> >data around... the post-commit hook has been triggered at 20:51 I  
> >think.
> >
> >I am reluctant to publish apache's logs here but may do it on demand.
> >
> >Thanks for your time,
> >	NextGen$
> 

Re: Error: Found malformed header in revision file

Posted by Christian Unger <cu...@apple.com>.
Hi,

Is it possible that apache's

SetOutputFilter DEFLATE directive could be causing this?

This is just an uneducated guess.


cu
christian unger




On Mar 22, 2007, at 11:16 PM, Florent Daignière (NextGen$) wrote:

> * Ph. Marek <ph...@bmlv.gv.at> [2007-03-16 07:54:55]:
>
>> On Friday 16 March 2007 00:40, Florent Daignière wrote:
>> ...
>>> Today I have encountered an other problem with the same repository :
>> ...
>>> I wonder if I'm really (un)lucky or if the hardware is to blame.
>> As it seems to be (something like) reproduceable - may I suggest  
>> using strace?
>> 	strace -f -tt -i -o /tmp/subversion-bug -p <pid of httpd> -p  
>> <other pid> ...
>> would write a series of logfiles into /tmp/subversion-bug.PID -  
>> you could keep
>> only the last 100 or so, remove the other ones to keep storage  
>> usage low.
>>
>> That would tell us whether it's software - and with a copy of
>> 	/proc/<offending pid>/maps
>> we might see (from the instruction pointer) which library is to  
>> blame,
>> although that's more a hope than anything else.
>>
>> Regards,
>>
>> Phil
>
> Today the repository got corrupted twice (no software update has been
> done in the meantime). I realized it too late to get the first strace
> log ... but I got the second one. Unfortunately I was unable to get
> the memory map from /proc (I'm using -grsec on my kernel and it hasn't
> helped).
>
> Each time it has been a "Found malformed header in revision file"  
> error
> (like the first two ones I reported).
>
> Here are the two revision files:
> http://emu.freenetproject.org/~nextgens/12276
> http://emu.freenetproject.org/~nextgens/12281
>
> http://emu.freenetproject.org/~nextgens/subversion-httpd-strace.gz
>  - be carefull : 20M - I don't have the exact commit timestamp so I  
> kept
> data around... the post-commit hook has been triggered at 20:51 I  
> think.
>
> I am reluctant to publish apache's logs here but may do it on demand.
>
> Thanks for your time,
> 	NextGen$


Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Malcolm Rowe <ma...@farside.org.uk> [2007-03-27 17:31:03]:

> On Thu, Mar 22, 2007 at 11:16:22PM +0100, Florent Daignière (NextGen$) wrote:
> > Today the repository got corrupted twice (no software update has been
> > done in the meantime). I realized it too late to get the first strace
> > log ... but I got the second one. Unfortunately I was unable to get
> > the memory map from /proc (I'm using -grsec on my kernel and it hasn't
> > helped).
> > 
> > Each time it has been a "Found malformed header in revision file" error
> > (like the first two ones I reported).
> > 
> 
> So I finally had a chance to look at this briefly last night.  I've
> attached the lines that refer directly to the rev-file; you can see that
> there's an 'obvious' duplication of the noderev.
> 
> Unfortunately, there's also a whole load of suspicious stuff happening
> between the first and second (duplicated) write, so I can't really be
> sure what's going on yet.
> 
> I notice you're using PHP - I wonder if this is another case of PHP
> "surely there can't be any other modules running" disease, and it's
> doing something to trip us up? (I'm thinking of the SIGCHLD reaping that
> it can carry out in some cases - this is obviously more serious).
> 
> Regards,
> Malcolm

Yes, I'm using suPHP.

NextGen$
PS: FYI, the server's hardware has been upgraded.

Re: Error: Found malformed header in revision file

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Mar 22, 2007 at 11:16:22PM +0100, Florent Daignière (NextGen$) wrote:
> Today the repository got corrupted twice (no software update has been
> done in the meantime). I realized it too late to get the first strace
> log ... but I got the second one. Unfortunately I was unable to get
> the memory map from /proc (I'm using -grsec on my kernel and it hasn't
> helped).
> 
> Each time it has been a "Found malformed header in revision file" error
> (like the first two ones I reported).
> 

So I finally had a chance to look at this briefly last night.  I've
attached the lines that refer directly to the rev-file; you can see that
there's an 'obvious' duplication of the noderev.

Unfortunately, there's also a whole load of suspicious stuff happening
between the first and second (duplicated) write, so I can't really be
sure what's going on yet.

I notice you're using PHP - I wonder if this is another case of PHP
"surely there can't be any other modules running" disease, and it's
doing something to trip us up? (I'm thinking of the SIGCHLD reaping that
it can carry out in some cases - this is obviously more serious).

Regards,
Malcolm

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Ph. Marek <ph...@bmlv.gv.at> [2007-03-16 07:54:55]:

> On Friday 16 March 2007 00:40, Florent Daignière wrote:
> ...
> > Today I have encountered an other problem with the same repository :
> ...
> > I wonder if I'm really (un)lucky or if the hardware is to blame.
> As it seems to be (something like) reproduceable - may I suggest using strace?
> 	strace -f -tt -i -o /tmp/subversion-bug -p <pid of httpd> -p <other pid> ...
> would write a series of logfiles into /tmp/subversion-bug.PID - you could keep 
> only the last 100 or so, remove the other ones to keep storage usage low.
> 
> That would tell us whether it's software - and with a copy of 
> 	/proc/<offending pid>/maps
> we might see (from the instruction pointer) which library is to blame, 
> although that's more a hope than anything else.
> 
> Regards,
> 
> Phil

Today the repository got corrupted twice (no software update has been
done in the meantime). I realized it too late to get the first strace
log ... but I got the second one. Unfortunately I was unable to get
the memory map from /proc (I'm using -grsec on my kernel and it hasn't
helped).

Each time it has been a "Found malformed header in revision file" error
(like the first two ones I reported).

Here are the two revision files:
http://emu.freenetproject.org/~nextgens/12276
http://emu.freenetproject.org/~nextgens/12281

http://emu.freenetproject.org/~nextgens/subversion-httpd-strace.gz
 - be carefull : 20M - I don't have the exact commit timestamp so I kept
data around... the post-commit hook has been triggered at 20:51 I think.

I am reluctant to publish apache's logs here but may do it on demand.

Thanks for your time,
	NextGen$

Re: Error: Found malformed header in revision file

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
On Friday 16 March 2007 00:40, Florent Daignière wrote:
...
> Today I have encountered an other problem with the same repository :
...
> I wonder if I'm really (un)lucky or if the hardware is to blame.
As it seems to be (something like) reproduceable - may I suggest using strace?
	strace -f -tt -i -o /tmp/subversion-bug -p <pid of httpd> -p <other pid> ...
would write a series of logfiles into /tmp/subversion-bug.PID - you could keep 
only the last 100 or so, remove the other ones to keep storage usage low.

That would tell us whether it's software - and with a copy of 
	/proc/<offending pid>/maps
we might see (from the instruction pointer) which library is to blame, 
although that's more a hope than anything else.


Regards,

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Error: Found malformed header in revision file

Posted by christian unger <ch...@mac.com>.
Finally!


reported as issue 2726.


The workaround is to server-side remove the file, but  the revisions  
which hold the corrupted data are unusable.

My major concern is that you cannot recover that revision thus are  
not able to dump or sync the repository any longer.

cu
christian unger


On 16.03.2007, at 00:40, Florent Daignière (NextGen$) wrote:

> * Florent Daignière (NextGen$) <ne...@freenetproject.org>  
> [2007-03-07 13:59:28]:
>
>> * Florent Daignière (NextGen$) <ne...@freenetproject.org>  
>> [2007-02-28 14:52:43]:
>>
>>> * John Szakmeister <jo...@szakmeister.net> [2007-02-28 02:56:38]:
>>>
>>>> Ph. Marek wrote:
>>>>> On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
>>>>> ...
>>>>>> So I'm thinking that somewhere in #3, the APR file buffer is  
>>>>>> written
>>>>>> twice.  I can think of only a few ways to achieve that effect,  
>>>>>> and none
>>>>>> of them seem particularly likely:
>>>>> ...
>>>>>
>>>>> If that's repeatable and on a platform allowing for API traces  
>>>>> (see eg.
>>>>> strace on linux -- similar things exist for other OS), that  
>>>>> could help a
>>>>> lot - it would tell us the sequence of events.
>>>>
>>>> That's been a huge problem.  Only 1 user has ever been able to  
>>>> repeat
>>>> the problem (and he no longer can).  Even then it was only after an
>>>> undetermined series of commits.  I've spent hours and hours  
>>>> trying...
>>>> and have had no luck.
>>>>
>>>> -John
>>>
>>> I doubt I will be able to reproduce it : In spite I kept both the  
>>> working
>>> copy and the directory containing the repository, I haven't  
>>> managed to
>>> so far...
>>>
>>> Moreover, I have done a dump/load of the repository up to latest  
>>> working
>>> version and sucessfully commited the *same* patch using the *same*
>>> client software.
>>>
>>> NextGen$
>>
>> Someone reproduced it yesterday on the same repository (same server,
>> same version of apache-mpm-worker, same client software : subclipse
>> 1.0.3 : an old version of javaSVN) attempting to commit a different
>> patch from a different working copy.
>>
>> I have enclosed the revision file to this mail in case it helps  
>> and will
>> ask our commiters to update their client software.
>>
>> NextGen$
>
> Today I have encountered an other problem with the same repository :
> on revision 12150 svnlook (started from the post-commit hook)
> complained about :
> "svnlook: Decompression of svndiff data failed"
> http://archives.freenetproject.org/message/ 
> 20070315.215906.f3661837.en.html
>
> Then I managed to commit r12151 sucessfully, and svnlook managed to do
> its job on it for some time ; a commit mail got genarated !
> http://archives.freenetproject.org/message/ 
> 20070315.221857.0d594b69.en.html
>
> and since then I can't access the repository anymore... errors and  
> symptoms
>  are different from last time... but the result is the same : the  
> repository
> is unavailable. The client software has been updated to subclipse  
> 1.2.0 in
>  the meantime.
>
> Apache's logs report :
> [Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58] Provider  
> encountered an error while streaming a REPORT response.  [500, #0]
> [Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58] A failure  
> occurred while driving the update report editor  [500, #185005]
> [Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58]  
> Decompression of svndiff data failed  [500, #185005]
>
> Here are links to revision files in case it helps:
> http://emu.freenetproject.org/~nextgens/12150
> http://emu.freenetproject.org/~nextgens/12151
>
> Apache and mod_dav_svn are the only frontend configured.
>
> nextgens@emu:~$ svn --version|head -1
> svn, version 1.4.2 (r22196)
> nextgens@emu:~$ svnlook --version|head -1
> svnlook, version 1.4.2 (r22196)
> nextgens@emu:~$ svnadmin --version|head -1
> svnadmin, version 1.4.2 (r22196)
> nextgens@emu:~$ dpkg -l libapache2-svn |grep ii
> ii  libapache2-svn 1.4.2dfsg1-2   Subversion server modules for Apache
> nextgens@emu:~$ dpkg -l apache2-mpm-* |grep ii
> ii  apache2-mpm-worker     2.2.3-3.3      High speed threaded model  
> for Apache HTTPD 2
>
> SMART doesn't report any hard-drive error (I know that's not a  
> proof it's
>  not faulty)  and the server has 98 days of uptime... There is no  
> obvious
>  kernel oops in logs nor complain about filesystem corruption...
>
> I wonder if I'm really (un)lucky or if the hardware is to blame.
>
> NextGen$

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Error: Found malformed header in revision file

Posted by John Szakmeister <jo...@szakmeister.net>.
----- "Christian Unger" <cu...@apple.com> wrote:
> Hi John,
> 
> 
> is there publically available information on how to fix broken revs?

No, not really.  Everything I've learned is by reading and understanding how the revs are formatted and how the backend works.  The single most helpful document is the one describing the structure of the fsfs backend:
  http://svn.collab.net/repos/svn/trunk/subversion/libsvn_fs_fs/structure

fsfsverify can help fix a number of issues as well:
  http://www.szakmeister.net/fsfsverify/

I wish I could say there is more. :-(

-John


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by Christian Unger <cu...@apple.com>.
Hi John,

is there publically available information on how to fix broken revs?

cu
christian unger




On Mar 16, 2007, at 11:20 AM, John Szakmeister wrote:

> Florent Daignière (NextGen$) wrote:
> [snip]
>
>> Here are links to revision files in case it helps:
>> http://emu.freenetproject.org/~nextgens/12150
>> http://emu.freenetproject.org/~nextgens/12151
>
> I fixed all of the posted revs.  Here are the links:
>   http://www.szakmeister.net/11912
>   http://www.szakmeister.net/12006
>   http://www.szakmeister.net/12150
>
> The first two had a repeated node rev (as Malcolm pointed out).   
> The last one didn't appear to actually get the svndiff data written  
> out, and then restarted from the beginning by writing the svndiff  
> header and contents out again.
>
> -John
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>


Re: Error: Found malformed header in revision file

Posted by John Szakmeister <jo...@szakmeister.net>.
Florent Daignière (NextGen$) wrote:
[snip]

> Here are links to revision files in case it helps:
> http://emu.freenetproject.org/~nextgens/12150
> http://emu.freenetproject.org/~nextgens/12151

I fixed all of the posted revs.  Here are the links:
   http://www.szakmeister.net/11912
   http://www.szakmeister.net/12006
   http://www.szakmeister.net/12150

The first two had a repeated node rev (as Malcolm pointed out).  The 
last one didn't appear to actually get the svndiff data written out, and 
then restarted from the beginning by writing the svndiff header and 
contents out again.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Florent Daignière (NextGen$) <ne...@freenetproject.org> [2007-03-07 13:59:28]:

> * Florent Daignière (NextGen$) <ne...@freenetproject.org> [2007-02-28 14:52:43]:
> 
> > * John Szakmeister <jo...@szakmeister.net> [2007-02-28 02:56:38]:
> > 
> > > Ph. Marek wrote:
> > > >On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
> > > >...
> > > >>So I'm thinking that somewhere in #3, the APR file buffer is written
> > > >>twice.  I can think of only a few ways to achieve that effect, and none
> > > >>of them seem particularly likely:
> > > >...
> > > >
> > > >If that's repeatable and on a platform allowing for API traces (see eg. 
> > > >strace on linux -- similar things exist for other OS), that could help a 
> > > >lot - it would tell us the sequence of events.
> > > 
> > > That's been a huge problem.  Only 1 user has ever been able to repeat 
> > > the problem (and he no longer can).  Even then it was only after an 
> > > undetermined series of commits.  I've spent hours and hours trying... 
> > > and have had no luck.
> > > 
> > > -John
> > 
> > I doubt I will be able to reproduce it : In spite I kept both the working
> > copy and the directory containing the repository, I haven't managed to
> > so far...
> > 
> > Moreover, I have done a dump/load of the repository up to latest working
> > version and sucessfully commited the *same* patch using the *same*
> > client software.
> > 
> > NextGen$
> 
> Someone reproduced it yesterday on the same repository (same server,
> same version of apache-mpm-worker, same client software : subclipse
> 1.0.3 : an old version of javaSVN) attempting to commit a different
> patch from a different working copy.
> 
> I have enclosed the revision file to this mail in case it helps and will
> ask our commiters to update their client software.
> 
> NextGen$

Today I have encountered an other problem with the same repository :
on revision 12150 svnlook (started from the post-commit hook)
complained about :
"svnlook: Decompression of svndiff data failed"
http://archives.freenetproject.org/message/20070315.215906.f3661837.en.html

Then I managed to commit r12151 sucessfully, and svnlook managed to do
its job on it for some time ; a commit mail got genarated !
http://archives.freenetproject.org/message/20070315.221857.0d594b69.en.html

and since then I can't access the repository anymore... errors and symptoms
 are different from last time... but the result is the same : the repository
is unavailable. The client software has been updated to subclipse 1.2.0 in
 the meantime.

Apache's logs report :
[Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58] Provider encountered an error while streaming a REPORT response.  [500, #0]
[Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58] A failure occurred while driving the update report editor  [500, #185005]
[Thu Mar 15 22:23:21 2007] [error] [client 208.181.82.58] Decompression of svndiff data failed  [500, #185005]

Here are links to revision files in case it helps:
http://emu.freenetproject.org/~nextgens/12150
http://emu.freenetproject.org/~nextgens/12151

Apache and mod_dav_svn are the only frontend configured.

nextgens@emu:~$ svn --version|head -1
svn, version 1.4.2 (r22196)
nextgens@emu:~$ svnlook --version|head -1
svnlook, version 1.4.2 (r22196)
nextgens@emu:~$ svnadmin --version|head -1
svnadmin, version 1.4.2 (r22196)
nextgens@emu:~$ dpkg -l libapache2-svn |grep ii
ii  libapache2-svn 1.4.2dfsg1-2   Subversion server modules for Apache
nextgens@emu:~$ dpkg -l apache2-mpm-* |grep ii
ii  apache2-mpm-worker     2.2.3-3.3      High speed threaded model for Apache HTTPD 2

SMART doesn't report any hard-drive error (I know that's not a proof it's
 not faulty)  and the server has 98 days of uptime... There is no obvious
 kernel oops in logs nor complain about filesystem corruption...

I wonder if I'm really (un)lucky or if the hardware is to blame.

NextGen$

Re: Error: Found malformed header in revision file

Posted by NextGen$ <ne...@emu.freenetproject.org>.
* John Szakmeister <jo...@szakmeister.net> [2007-03-09 04:52:24]:

> Florent Daignière (NextGen$) wrote:
> [snip]
> >I have enclosed the revision file to this mail in case it helps and will
> >ask our commiters to update their client software.
> 
> I think your mailer may have munged the attached patch (or mine is). 
> Several locations have too many LF characters.
> 
> -John

Here they are :

http://emu.freenetproject.org/~nextgens/12006 and 
http://emu.freenetproject.org/~nextgens/11912 (first corruption)

nextgens@emu:~$ md5sum public_html/11912 public_html/12006
da4018a4f9001363196f486a1d7f7bb1  public_html/11912
40420dc1719d09e1931f195a14e1e0f6  public_html/12006

NextGen$

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by John Szakmeister <jo...@szakmeister.net>.
Florent Daignière (NextGen$) wrote:
[snip]
> I have enclosed the revision file to this mail in case it helps and will
> ask our commiters to update their client software.

I think your mailer may have munged the attached patch (or mine is). 
Several locations have too many LF characters.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* Florent Daignière (NextGen$) <ne...@freenetproject.org> [2007-02-28 14:52:43]:

> * John Szakmeister <jo...@szakmeister.net> [2007-02-28 02:56:38]:
> 
> > Ph. Marek wrote:
> > >On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
> > >...
> > >>So I'm thinking that somewhere in #3, the APR file buffer is written
> > >>twice.  I can think of only a few ways to achieve that effect, and none
> > >>of them seem particularly likely:
> > >...
> > >
> > >If that's repeatable and on a platform allowing for API traces (see eg. 
> > >strace on linux -- similar things exist for other OS), that could help a 
> > >lot - it would tell us the sequence of events.
> > 
> > That's been a huge problem.  Only 1 user has ever been able to repeat 
> > the problem (and he no longer can).  Even then it was only after an 
> > undetermined series of commits.  I've spent hours and hours trying... 
> > and have had no luck.
> > 
> > -John
> 
> I doubt I will be able to reproduce it : In spite I kept both the working
> copy and the directory containing the repository, I haven't managed to
> so far...
> 
> Moreover, I have done a dump/load of the repository up to latest working
> version and sucessfully commited the *same* patch using the *same*
> client software.
> 
> NextGen$

Someone reproduced it yesterday on the same repository (same server,
same version of apache-mpm-worker, same client software : subclipse
1.0.3 : an old version of javaSVN) attempting to commit a different
patch from a different working copy.

I have enclosed the revision file to this mail in case it helps and will
ask our commiters to update their client software.

NextGen$

Re: Error: Found malformed header in revision file

Posted by "Florent Daignière (NextGen$)" <ne...@emu.freenetproject.org>.
* John Szakmeister <jo...@szakmeister.net> [2007-02-28 02:56:38]:

> Ph. Marek wrote:
> >On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
> >...
> >>So I'm thinking that somewhere in #3, the APR file buffer is written
> >>twice.  I can think of only a few ways to achieve that effect, and none
> >>of them seem particularly likely:
> >...
> >
> >If that's repeatable and on a platform allowing for API traces (see eg. 
> >strace on linux -- similar things exist for other OS), that could help a 
> >lot - it would tell us the sequence of events.
> 
> That's been a huge problem.  Only 1 user has ever been able to repeat 
> the problem (and he no longer can).  Even then it was only after an 
> undetermined series of commits.  I've spent hours and hours trying... 
> and have had no luck.
> 
> -John

I doubt I will be able to reproduce it : In spite I kept both the working
copy and the directory containing the repository, I haven't managed to
so far...

Moreover, I have done a dump/load of the repository up to latest working
version and sucessfully commited the *same* patch using the *same*
client software.

NextGen$

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by John Szakmeister <jo...@szakmeister.net>.
Ph. Marek wrote:
> On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
> ...
>> So I'm thinking that somewhere in #3, the APR file buffer is written
>> twice.  I can think of only a few ways to achieve that effect, and none
>> of them seem particularly likely:
> ...
> 
> If that's repeatable and on a platform allowing for API traces (see eg. strace 
> on linux -- similar things exist for other OS), that could help a lot - it 
> would tell us the sequence of events.

That's been a huge problem.  Only 1 user has ever been able to repeat 
the problem (and he no longer can).  Even then it was only after an 
undetermined series of commits.  I've spent hours and hours trying... 
and have had no luck.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by "Ph. Marek" <ph...@bmlv.gv.at>.
On Wednesday 28 February 2007 07:03, Malcolm Rowe wrote:
...
> So I'm thinking that somewhere in #3, the APR file buffer is written
> twice.  I can think of only a few ways to achieve that effect, and none
> of them seem particularly likely:
...

If that's repeatable and on a platform allowing for API traces (see eg. strace 
on linux -- similar things exist for other OS), that could help a lot - it 
would tell us the sequence of events.


Regards,

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Error: Found malformed header in revision file

Posted by Malcolm Rowe <ma...@farside.org.uk>.
Hi nextgens,

Couple of questions for you below.

On Sat, Feb 24, 2007 at 10:31:25PM +0000, NextGen$ wrote:
> 	Today I have experienced a weird problem with subversion... I googled
> 	a bit for it and found
> 	http://svn.haxx.se/users/archive-2004-12/0453.shtml ... So I joined
> 	the irc channel where I have been told to mail this list.
> 
> 	Here is what has occured : On the client side I have tried to commit
> 	svnadmin verify /path/to/repo says:
> 	[...]
> 	* Verified revision 11911.
> 	svnadmin: Found malformed header in revision file
> 

So I took a look at this problem on IRC right after nextgens posted this
message.

Interestingly, it's not the 'normal' FSFS corruption problem
('underlapping' repeated writes while writing a delta rep), though it
does involve data duplication.  The problem is at the first noderev:

DELTA...
ENDREP
DELTA...
ENDREP
id: 7x2.0.r11912/988
type: file
pred: 7x2.0.r11909/178
count: 55
text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
copyroot: 0 /

id: 7x2.0.r11912/988
type: file
pred: 7x2.0.r11909/178
count: 55
text: 11912 831 132 42239 29931bab8843c518b26d9ebef4c9a131
cpath: /trunk/freenet/src/freenet/clients/http/StatisticsToadlet.java
copyroot: 0 /

PLAIN...

As you can see, the noderev's been duplicated.  The effect of this is to
break all the subsequently recorded offsets for the following noderevs
and the trailer line.  Removing one of those duplicates with dd fixed
the problem.

So here's the question - how on earth did that happen?  We only ever
write these out to the revision file once, in fs_fs.c:write_final_rev(),
and we get the offset to use just before we write each one.

Okay, so one explanation of why the file offset is wrong might be
because, for buffered files, APR doesn't call lseek() to get the current
file offset, because it can't.  It uses it's own concept of the
location, so if that gets out of sync, we'd see what we have here.

How could that have happened?  I'm not entirely sure.  The sequence of
events here is:

1. We get the offset for the noderev, which has the side effect of
   flushing the APR file buffer to disk.  We know this offset is
   correct because the offset for the noderev is correct.
2. We write out the noderev using a series of apr_file_write()'s.
   This puts all the data into the APR file buffer (it's smaller than 4K,
   the APR buffer size).
3. We get the offset for the directory rep (PLAIN), which flushes the
   noderev to disk.  We know this offset is incorrect because of the
   offset reported in the text part of the directory noderev (not shown).
   [Hmm, actually I guess we don't know that for sure, but I'm going
   with that theory for the moment.]

I'm pretty sure we're not trying to write out the noderev twice (i.e.
calling write_final_rev() twice for the same file entry), since we
unconditionally update the offset in the node id, and so the two
noderevs above would have different offsets.

So I'm thinking that somewhere in #3, the APR file buffer is written
twice.  I can think of only a few ways to achieve that effect, and none
of them seem particularly likely:

a. There's a journalling filesystem in use, and it's decided to replay
   the write of the noderev.  I don't think this is how journalling
   filesystems actually work, but I might be wrong.
   [nextgens: What OS and filesystem are you using for the repository?]

b. There's a bug in APR's apr_file_flush() that's causing the same
   buffer to be flushed to disk more than once.

c. There's a bug in APR's apr_file_seek()/setptr() that's causing the same
   buffer to be flushed to disk more than once.
   [nextgens: Can you tell us what version of APR you're using? (as
              precisely as you can)]

d. Something Else.

For b., I can't really see where a bug would be, but I wouldn't mind
someone else taking a look once nextgens confirms which version of APR to
look at.  I assume that nowadays we can rely on write(2) not returning
EINTR after a partial write - anyone know if this isn't the case?

For c., I again can't see anything obvious, but the implementation is
non-trivial.  Now I know that APR 0.9.7 fixed a problem whereby errors
encountered while flushing during seek weren't reported, but that's why
we raised the minimum APR requirement to 0.9.7... and in any case, I
can't see how that could have this effect.

Regards,
Malcolm