You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2012/07/07 16:09:38 UTC

modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Problem: svnsync of /repos/asf r1356317 fails with:
svnsync: E160013: File not found: revision 1356316, path '/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html'

Using 1.7.0, but tnelson reports the same problem (and I suspect he uses
a more recent version).

Further investigation points to an added-without-copyfrom tree containing a modified file:

[[[
% svn log -qvr r1356317 https://svn.apache.org/repos/asf/
------------------------------------------------------------------------
r1356317 | rmuir | 2012-07-02 16:13:49 +0000 (Mon, 02 Jul 2012) | 1 line
Changed paths:
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler
   A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader
   M /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
   .. (some 2000 more additions, all without copyfrom;  no changes above api-4_0_0_ALPHA)
------------------------------------------------------------------------

% perl -MAI -e "print the relevant portions of the changed-paths in the revision file"
_5.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
_9.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org
_a.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache
_g.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr
_1u.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler
_2i.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader
_474.o-1158013.t1356285-tm5g modify-file true true /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
]]]

[[[
% dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/ r1356317 
id: a-1154617.o-1158013.r1356317/8041959
type: dir
pred: a-1154617.o-1158013.r1325510/1655
count: 46
text: 1356317 8041025 921 921 0a2b8b51c531ce2bda18c36042f741e9
cpath: /lucene/cms/trunk/content/solr
copyroot: 1158013 /lucene/cms/trunk/content/solr

% dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/ r1356317
id: 5-1356317.o-1158013.r1356317/8040796
type: dir
count: 0
text: 1356317 8039544 1239 1239 ca089f246db49f6379be42e03354d290
cpath: /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
copyroot: 1158013 /lucene/cms/trunk/content/solr

% dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html r1356317 
id: 474-1356317.o-1158013.r1356317/7575027
type: file
count: 0
text: 1356317 6384884 1715 6603 b07ff835f1126c3b4020fad39a4a68fc 9c8ea9a62a4a8c9c5e010395aa3430cfce957e58 1356285-tm5g/_475
props: 1356317 7574980 34 0 25e6c2f7558b7484000d4d090dea5b92
cpath: /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
copyroot: 1158013 /lucene/cms/trunk/content/solr

% xxd -s 6384884 -l 6 < db/revs/1356/1356317
0616cf4: 4445 4c54 410a                           DELTA.
]]]

The repository in question is fully world-readable --- there are no read
restrictions in authz:
https://svn.apache.org/repos/infra/infrastructure/trunk/subversion/authorization/asf-authorization-template
(sorry, committers-only link)

The access and operational logs are at people.apache.org:/x1/logarchive/eris/vc/2012/07/02*.bz2 .  The commit
started at 15:10:56 (rev-file birthtime), with the operational log
'commit' and the access log 'MERGE' entries starting at 16:11:33 and
finishing around 16:24:43.5.  The first references to r1356317 in the
logs are at 16:13:50.  (Perhaps the remaining 10 minutes were spent on
updating rep-cache.db?  We did experience issues (consistent with db
corruption) with rep-cache.db on that day.)

How can we have a 'modify-file' within an added-without-copyfrom tree?
Isn't svnsync correct to complain in this case?

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
(I'll answer in more detail later, just quick points for now)

Robert Muir wrote on Tue, Jul 10, 2012 at 08:24:55 -0400:
> On Tue, Jul 10, 2012 at 4:43 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Robert:
> >
> > How did you commit r1356317?  (the import of the api-4_0_0_ALPHA docs to
> > the CMS)  Did you get any error?  Was
> > api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
> > handled specially in some way?
> >
> > As per the below, that file was committed in a malformed way --- the
> > metadata claim that it was "modified" rather than "added" --- and we're
> > trying to find out how that happened.
> 
> Hi Daniel: I did a normal 'svn add' + 'svn commit' here.
> 
> I did had trouble committing this. In fact Steven Rowe (sarowe@) had
> to help cleanup so we could get the documentation out on the site for
> the release.
> Here is the timeline I remember:
> 
> 1. did a mega-commit (r1356265) before this for the Apache Lucene
> javadocs that took quite some time. This succeeded.
> 2. after that succeeded, I did this one for the Apache Solr javadocs.
> But, I think this commit possibly overlapped with some SVN maintenance
> on July 2 ?
> 
> #asfinfra IRC Bot ‏@infrabot
> svn .apache.org to be restarted briefly for short maintenance -- <danielsh>
> 11:05 AM - 2 Jul 12 via infrabot

I presume that's UTC-0400

> 

Good call.  Your commit was:
r1356317 | rmuir | 2012-07-02 16:13:49 +0000 (Mon, 02 Jul 2012)

The svn maintenance you refer to was at 16:05, related to rep-cache.db
errors.  (We should have them in httpd/irc logs.)

Your earlier commit was:
r1356265 | rmuir | 2012-07-02 14:44:31 +0000 (Mon, 02 Jul 2012)

and it touched ~5700 files.

> 3. during this time I had a frazzled connection in general, it was the
> Monday after the Washington-DC area storm outages.
> 
> I'm honestly having trouble remembering, but I am fairly positive I
> issued the 'svn commit' command before this maintenance. And I believe
> the commit succeeded, but the resulting buildbot action failed!
> 
> -- 
> lucidimagination.com

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Jul 10, 2012 at 4:43 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Robert:
>
> How did you commit r1356317?  (the import of the api-4_0_0_ALPHA docs to
> the CMS)  Did you get any error?  Was
> api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
> handled specially in some way?
>
> As per the below, that file was committed in a malformed way --- the
> metadata claim that it was "modified" rather than "added" --- and we're
> trying to find out how that happened.

Hi Daniel: I did a normal 'svn add' + 'svn commit' here.

I did had trouble committing this. In fact Steven Rowe (sarowe@) had
to help cleanup so we could get the documentation out on the site for
the release.
Here is the timeline I remember:

1. did a mega-commit (r1356265) before this for the Apache Lucene
javadocs that took quite some time. This succeeded.
2. after that succeeded, I did this one for the Apache Solr javadocs.
But, I think this commit possibly overlapped with some SVN maintenance
on July 2 ?

#asfinfra IRC Bot ‏@infrabot
svn .apache.org to be restarted briefly for short maintenance -- <danielsh>
11:05 AM - 2 Jul 12 via infrabot

3. during this time I had a frazzled connection in general, it was the
Monday after the Washington-DC area storm outages.

I'm honestly having trouble remembering, but I am fairly positive I
issued the 'svn commit' command before this maintenance. And I believe
the commit succeeded, but the resulting buildbot action failed!

-- 
lucidimagination.com

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Robert:

How did you commit r1356317?  (the import of the api-4_0_0_ALPHA docs to
the CMS)  Did you get any error?  Was
api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
handled specially in some way?

As per the below, that file was committed in a malformed way --- the
metadata claim that it was "modified" rather than "added" --- and we're
trying to find out how that happened.

Thanks

Daniel
(svn PMC hat + infra hat)


Daniel Shahaf wrote on Sat, Jul 07, 2012 at 15:09:38 +0100:
> Problem: svnsync of /repos/asf r1356317 fails with:
> svnsync: E160013: File not found: revision 1356316, path '/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html'
> 
> Using 1.7.0, but tnelson reports the same problem (and I suspect he uses
> a more recent version).
> 
> Further investigation points to an added-without-copyfrom tree containing a modified file:
> 
> [[[
> % svn log -qvr r1356317 https://svn.apache.org/repos/asf/
> ------------------------------------------------------------------------
> r1356317 | rmuir | 2012-07-02 16:13:49 +0000 (Mon, 02 Jul 2012) | 1 line
> Changed paths:
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader
>    M /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
>    .. (some 2000 more additions, all without copyfrom;  no changes above api-4_0_0_ALPHA)


> ------------------------------------------------------------------------
> 
> % perl -MAI -e "print the relevant portions of the changed-paths in the revision file"
> _5.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
> _9.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org
> _a.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache
> _g.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr
> _1u.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler
> _2i.o-1158013.t1356285-tm5g add-dir false false /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader
> _474.o-1158013.t1356285-tm5g modify-file true true /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
> ]]]
> 
> [[[
> % dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/ r1356317 
> id: a-1154617.o-1158013.r1356317/8041959
> type: dir
> pred: a-1154617.o-1158013.r1325510/1655
> count: 46
> text: 1356317 8041025 921 921 0a2b8b51c531ce2bda18c36042f741e9
> cpath: /lucene/cms/trunk/content/solr
> copyroot: 1158013 /lucene/cms/trunk/content/solr
> 
> % dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/ r1356317
> id: 5-1356317.o-1158013.r1356317/8040796
> type: dir
> count: 0
> text: 1356317 8039544 1239 1239 ca089f246db49f6379be42e03354d290
> cpath: /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
> copyroot: 1158013 /lucene/cms/trunk/content/solr
> 
> % dump-noderev.pl $PWD /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html r1356317 
> id: 474-1356317.o-1158013.r1356317/7575027
> type: file
> count: 0
> text: 1356317 6384884 1715 6603 b07ff835f1126c3b4020fad39a4a68fc 9c8ea9a62a4a8c9c5e010395aa3430cfce957e58 1356285-tm5g/_475
> props: 1356317 7574980 34 0 25e6c2f7558b7484000d4d090dea5b92
> cpath: /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
> copyroot: 1158013 /lucene/cms/trunk/content/solr
> 
> % xxd -s 6384884 -l 6 < db/revs/1356/1356317
> 0616cf4: 4445 4c54 410a                           DELTA.
> ]]]
> 
> The repository in question is fully world-readable --- there are no read
> restrictions in authz:
> https://svn.apache.org/repos/infra/infrastructure/trunk/subversion/authorization/asf-authorization-template
> (sorry, committers-only link)
> 
> The access and operational logs are at people.apache.org:/x1/logarchive/eris/vc/2012/07/02*.bz2 .  The commit
> started at 15:10:56 (rev-file birthtime), with the operational log
> 'commit' and the access log 'MERGE' entries starting at 16:11:33 and
> finishing around 16:24:43.5.  The first references to r1356317 in the
> logs are at 16:13:50.  (Perhaps the remaining 10 minutes were spent on
> updating rep-cache.db?  We did experience issues (consistent with db
> corruption) with rep-cache.db on that day.)
> 
> How can we have a 'modify-file' within an added-without-copyfrom tree?
> Isn't svnsync correct to complain in this case?

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Branko Čibej <br...@wandisco.com>.
On 23.07.2012 21:33, Trent Nelson wrote:
>> CPU 7 1 instruction cache TSC 991bb5281cc5 
>> ADDR 7c3cffe80 
>>   Instruction cache ECC error
>>        bit46 = corrected ecc error
>>        bit62 = error overflow (multiple errors)

Oops.

>> (P.S. Anyone in the market for a cheap Sun Fire v40z?  Very reliable!)

Try the Smithsonian? Nah, not rare enough.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Trent Nelson <tr...@snakebite.org>.
On Mon, Jul 16, 2012 at 10:39:58AM -0700, Trent Nelson wrote:
>
> On Jul 16, 2012, at 12:38 PM, Daniel Shahaf wrote:
>
> > Trent Nelson wrote on Mon, Jul 16, 2012 at 08:58:09 -0700:
> >> Somewhat related: is this a FreeBSD box?
> >
> > Yes, it's eris from http://www.apache.org/dev/machines.
> >
> >> ports/sysutils/mcelog is useful for getting info on any ECC errors
> >> that might have occurred.
> >
> > Thanks for the pointer.  The port description says: "The primary purpose
> > is to provide a way to decode MCE output from the FreeBSD kernel into
> > something more human-readable" --- how to get the "raw" MCE output?  I
> > don't see "mce" mentioned in `sysctl -a` or /var/log/messages.
>
> Yeah it's definitely on the cryptic side.  I'm dubious as to whether or
> not the majority of features mentioned in the man page actually work.
>
> From experience, I simply `pkg_add -r mcelog`'d and then ran `mcelog`
> on a FreeBSD box of mine that looked like it had some wonky DIMMs.  I
> noticed a MCE line in the console log, so I ran `mcelog`, and wallah,
> heaps of info about the error (the exact DIMM/slot was handy).

    Just had a box play up again in the same manner.  For the archives,
    here's the type of message that'll show up on the system console:

MCA: Address 0x7d4ffbcc0
MCA: Bank 1, Status 0xd400400000000853
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x20f12, APIC ID 7
MCA: CPU 7 COR OVER BUSLG Source IRD Memory
MCA: Address 0x7c3cffe80
MCA: Bank 2, Status 0xd000400000000863
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0x20f12, APIC ID 7
MCA: CPU 7 COR OVER BUSLG Source PREFETCH Memory

    Simply running `mcelog` on that box produces this (snipped the first
    123 examples):

STATUS d471c00000000833 MCGSTATUS 0
MCGCAP 105 APICID 7 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 33
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 124
CPU 7 1 instruction cache TSC 991bb5281cc5 
ADDR 7c3cffe80 
  Instruction cache ECC error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
             instruction fetch mem transaction
             memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCGCAP 105 APICID 7 SOCKETID 0 
CPUID Vendor AMD Family 15 Model 33
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 125
CPU 7 2 bus unit TSC 991bb528226c 
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
             prefetch mem transaction
             memory access, level generic'

    If running `mcelog` doesn't produce anything, it's probably not an
    ECC issue.


        Trent.

(P.S. Anyone in the market for a cheap Sun Fire v40z?  Very reliable!)

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Trent Nelson <tr...@snakebite.org>.
On Jul 16, 2012, at 12:38 PM, Daniel Shahaf wrote:

> Trent Nelson wrote on Mon, Jul 16, 2012 at 08:58:09 -0700:
>> Somewhat related: is this a FreeBSD box?
> 
> Yes, it's eris from http://www.apache.org/dev/machines.
> 
>> ports/sysutils/mcelog is useful for getting info on any ECC errors
>> that might have occurred.
> 
> Thanks for the pointer.  The port description says: "The primary purpose
> is to provide a way to decode MCE output from the FreeBSD kernel into
> something more human-readable" --- how to get the "raw" MCE output?  I
> don't see "mce" mentioned in `sysctl -a` or /var/log/messages.

Yeah it's definitely on the cryptic side.  I'm dubious as to whether or
not the majority of features mentioned in the man page actually work.

>From experience, I simply `pkg_add -r mcelog`'d and then ran `mcelog`
on a FreeBSD box of mine that looked like it had some wonky DIMMs.  I
noticed a MCE line in the console log, so I ran `mcelog`, and wallah,
heaps of info about the error (the exact DIMM/slot was handy).

> 
>> Is the repo living on ZFS?
> 
> FreeBSD 8.2, zpool v15, zfs v4

Oh my, v15!  Insert standard recommendation of upgrading to v28 (it
landed in 8-stable around July last year) -- huge improvements in the
latter over v15.

That being said, one of my production boxes (that happens to mirror asf,
incidentally), is still on 8.2-stable w/ v15.  Not ideal, but meh, it's
production, and due for an upgrade in a few months.


> 
>> Don't suppose you've got a non-standard vfs.zfs.txg.timeout (greater
>> than 5 seconds?) set?  That could have exacerbated the situation.
>> 
> 
> We have the default setting, but the default is greater than 5 seconds:
> 
> eris,0:/boot% sysctl -a | grep txg 
> vfs.zfs.txg.write_limit_override: 0
> vfs.zfs.txg.synctime: 5
> vfs.zfs.txg.timeout: 30
> 

Ah, looks like the timeout defaults to 30 on v15.  It's been changed to
5 seconds on v28.

> Is there somewhere documentation of what the txg timeout _is_ (as
> opposed to what are the effects of changing the knob)?  Or is the only
> documentation in the source tree?

It tells ZFS how long it's allowed to postpone writes to stable storage.
(Does Subversion explicitly request synchronous writes?  Or do any sync/
fsync dances?)

The reason I thought of this is because of the timing that would have 
had to play out for the bit flip to be plausible.  It would have had to
have happened after the txn->rev dance, but before the data was written
to storage.

All of ZFS's bad-ass checksumming and data healing measures are useless
if the data becomes corrupted in RAM before it's written to disk; ECC is
the only defense for that.  ECC's good at catching single bit errors; 
not so good at catching anything more, so it's plausible for a bit flip
to go completely undetected, even with ECC -- but you should have at 
least *one* 'corrected' entry in the mcelog (or on the console -- don't 
suppose you log that to a file?).

Might be worth lowering vfs.zfs.txg.timeout down to, say, 5 seconds; at
least until an upgrade to v28 is viable.


	Trent.


Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Trent Nelson wrote on Mon, Jul 16, 2012 at 08:58:09 -0700:
> Somewhat related: is this a FreeBSD box?

Yes, it's eris from http://www.apache.org/dev/machines.

> ports/sysutils/mcelog is useful for getting info on any ECC errors
> that might have occurred.

Thanks for the pointer.  The port description says: "The primary purpose
is to provide a way to decode MCE output from the FreeBSD kernel into
something more human-readable" --- how to get the "raw" MCE output?  I
don't see "mce" mentioned in `sysctl -a` or /var/log/messages.

> Is the repo living on ZFS?

FreeBSD 8.2, zpool v15, zfs v4

> Don't suppose you've got a non-standard vfs.zfs.txg.timeout (greater
> than 5 seconds?) set?  That could have exacerbated the situation.
> 

We have the default setting, but the default is greater than 5 seconds:

eris,0:/boot% sysctl -a | grep txg 
vfs.zfs.txg.write_limit_override: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30

Is there somewhere documentation of what the txg timeout _is_ (as
opposed to what are the effects of changing the knob)?  Or is the only
documentation in the source tree?

Thanks!

Daniel


> 
> 	Trent.

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Trent Nelson <tr...@snakebite.org>.
On Jul 15, 2012, at 9:32 AM, Daniel Shahaf wrote:

> Daniel Shahaf wrote on Tue, Jul 10, 2012 at 09:46:55 +0100:
>> Pending rmuir@'s feedback, then, I'll go ahead and edit the revision
>> file in-place.  No change should be needed to any of the svn mirrors.
> 
> I have now made the following edit:    s/modify-file/ add-file  /
> 
> svnsync has successfully synced r1356317.  (As I write this, the
> post-commit fs processing is still running; 'current' on harmonia (the
> mirror) was updated at 13:24:29 UTC.)  I do not plan to edit
> rep-cache.db to remove references to the rep of the file in question.

My mirrors are syncing fine now, thanks!


> I am not opening an svn bug yet as there is no evidence that the bug was
> in svn (as opposed to, say, a single bit flip in svn_fs_path_change_kind_t).

I'm inclined to agree.  A bit flip is an entirely plausible explanation for something this odd.

Somewhat related: is this a FreeBSD box?  ports/sysutils/mcelog is useful for getting info on any ECC errors that might have occurred.  Is the repo living on ZFS?  Don't suppose you've got a non-standard vfs.zfs.txg.timeout (greater than 5 seconds?) set?  That could have exacerbated the situation.


	Trent.

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Tue, Jul 10, 2012 at 09:46:55 +0100:
> Pending rmuir@'s feedback, then, I'll go ahead and edit the revision
> file in-place.  No change should be needed to any of the svn mirrors.

I have now made the following edit:    s/modify-file/ add-file  /

svnsync has successfully synced r1356317.  (As I write this, the
post-commit fs processing is still running; 'current' on harmonia (the
mirror) was updated at 13:24:29 UTC.)  I do not plan to edit
rep-cache.db to remove references to the rep of the file in question.

I am not opening an svn bug yet as there is no evidence that the bug was
in svn (as opposed to, say, a single bit flip in svn_fs_path_change_kind_t).

Thanks all.

Daniel

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Trent Nelson wrote on Mon, Jul 09, 2012 at 23:17:28 -0700:
> 
> 
> On 7/8/12 11:48 AM, "Daniel Shahaf" <d....@daniel.shahaf.name> wrote:
> 
> >Daniel Shahaf wrote on Sat, Jul 07, 2012 at 15:09:38 +0100:
> >> How can we have a 'modify-file' within an added-without-copyfrom tree?
> 
> That's a pretty impressive invariant violation.  Enversion would have
> caught it, but only because we assert that a modify's previous path must
> match its current path -- not because we specifically look for this
> situation.
> 
> I've also never come across this in the wild.  We should definitely ping
> `rmuir` to find out how he committed that revision; platform, version, etc.
> 

Done.

> 
> >>Isn't svnsync correct to complain in this case?
> 
> Definitely.
> 
> >I'd like to resolve the situation on the live svn.a.o repository.  I see
> >a couple of ways to do so:
> >
> >- Hand-edit the revision file, changing "modify-file" to "add-file   "
> >
> >- Eliminate r1356317 from history: create an svnsync copy of /repos/asf
> >  using an authz file that excludes
> >/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
> >  (or maybe just 
> >/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loa
> >der/package-summary.html)
> >  and have the Lucene PMC recreate that tag later
> >
> >Thoughts?
> 
> I'd lean towards a scripted version of option 1.  You could then
> disseminate the script to svnsync mirror admins (like myself), perhaps via
> an ASF/infrastructure blog post?
> 

There won't be a need for that: recent version of svnsync refuse to sync
r1356317 because of the invariant violation (and I imagine anyone who
mirrors the ASF repos uses a recent svnsync), so only svn-master.a.o has
the corruption.  (Even the EU mirror[1] doesn't have it.)

Pending rmuir@'s feedback, then, I'll go ahead and edit the revision
file in-place.  No change should be needed to any of the svn mirrors.

Thanks,

Daniel

[1] Note that no svn*.apache.org DNS name resolves to the EU mirror
right now --- not even svn.eu.a.o.

> 
> 	Trent.
> 

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Tue, Jul 10, 2012 at 21:54:39 +0200:
> On Tue, Jul 10, 2012 at 2:43 PM, Robert Muir <rc...@gmail.com> wrote:
> > On Tue, Jul 10, 2012 at 2:17 AM, Trent Nelson <tr...@snakebite.org> wrote:
> >>
> >> I've also never come across this in the wild.  We should definitely ping
> >> `rmuir` to find out how he committed that revision; platform, version, etc.
> >>
> >
> > Hi Trent:
> > For this commit, I did a 'cp -r ' of the folder from our release
> > candidate to my checkout, and then 'svn add', followed by 'svn
> > commit'.
> >
> > Here's my system info (let me know if you need more):
> >
> > uname -a:
> > Linux beast 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC
> > 2012 x86_64 x86_64 x86_64 GNU/Linux
> >
> > svn --version:
> > svn, version 1.7.2 (r1207936)
> >    compiled Dec  1 2011, 12:30:57
> 
> I'm wondering if there isn't a client side bug here, which really
> caused it to send a modify-file inside an add-without-copyfrom
> directory.
> 

First of all, even if there is such a bug then it's still a bug in the
server that the commit was allowed (and it's doubly odd that the same
commit is not allowed when done via svnsync rather than via rmuir's
svn).

> I'm specifically thinking in the direction of "file translation" stuff
> and svn:eol-style=native. These files have svn:eol-style=native. But
> that by itself can't be the problem, because a lot of files in this
> commit have eol-style native, and only one file has the "modify"
> problem. Still, I know that if a client fails to "eol-normalize" a
> file with eol-style=native, this can cause weird problems (files
> showing up as modified even though you haven't modified them [1] --
> see also [2]).
> 
> What happens if, for some reason, the client fails to eol-normalize a
> newly added file, within an added-without-copyfrom directory (i.e.

Why would that have happened for only one of the numerous
package-summary.html files in rmuir's commit?

> client sends CRLF line termination to the server instead of LF as it
> should)? Maybe that causes a modify-file?
> 
> Or maybe something like this (a mixup in file translation) happened
> during the server-side MERGE?
> 
> Now, it's unlikely that this is what happened here (the base file of
> the dreaded package-summary.html@1356317 is correctly LF-terminated --
> and the client platform is unix, so should have been LF-terminated
> from the start (except if the 'cp -r' was done from a network drive
> that was also used on a Windows system or something like that)). But
> still, eol-translation can cause weirdness ...
> 
> Oh, and there isn't anything in the pre-commit hook which could
> possibly modify transactions in mid-flight, is there?
> 

No.

https://svn.apache.org/repos/infra/infrastructure/trunk/subversion/hooks/pre-commit
(non-public link, sorry)

> [1] http://svn.haxx.se/users/archive-2011-10/0291.shtml
> [2] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 (server
> should enforce LF normalization for svn:eol-style=native files)
> 
> -- 
> Johan

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Jul 10, 2012 at 2:43 PM, Robert Muir <rc...@gmail.com> wrote:
> On Tue, Jul 10, 2012 at 2:17 AM, Trent Nelson <tr...@snakebite.org> wrote:
>>
>> I've also never come across this in the wild.  We should definitely ping
>> `rmuir` to find out how he committed that revision; platform, version, etc.
>>
>
> Hi Trent:
> For this commit, I did a 'cp -r ' of the folder from our release
> candidate to my checkout, and then 'svn add', followed by 'svn
> commit'.
>
> Here's my system info (let me know if you need more):
>
> uname -a:
> Linux beast 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>
> svn --version:
> svn, version 1.7.2 (r1207936)
>    compiled Dec  1 2011, 12:30:57

I'm wondering if there isn't a client side bug here, which really
caused it to send a modify-file inside an add-without-copyfrom
directory.

I'm specifically thinking in the direction of "file translation" stuff
and svn:eol-style=native. These files have svn:eol-style=native. But
that by itself can't be the problem, because a lot of files in this
commit have eol-style native, and only one file has the "modify"
problem. Still, I know that if a client fails to "eol-normalize" a
file with eol-style=native, this can cause weird problems (files
showing up as modified even though you haven't modified them [1] --
see also [2]).

What happens if, for some reason, the client fails to eol-normalize a
newly added file, within an added-without-copyfrom directory (i.e.
client sends CRLF line termination to the server instead of LF as it
should)? Maybe that causes a modify-file?

Or maybe something like this (a mixup in file translation) happened
during the server-side MERGE?

Now, it's unlikely that this is what happened here (the base file of
the dreaded package-summary.html@1356317 is correctly LF-terminated --
and the client platform is unix, so should have been LF-terminated
from the start (except if the 'cp -r' was done from a network drive
that was also used on a Windows system or something like that)). But
still, eol-translation can cause weirdness ...

Oh, and there isn't anything in the pre-commit hook which could
possibly modify transactions in mid-flight, is there?

[1] http://svn.haxx.se/users/archive-2011-10/0291.shtml
[2] http://subversion.tigris.org/issues/show_bug.cgi?id=4065 (server
should enforce LF normalization for svn:eol-style=native files)

-- 
Johan

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Jul 10, 2012 at 2:17 AM, Trent Nelson <tr...@snakebite.org> wrote:
>
> I've also never come across this in the wild.  We should definitely ping
> `rmuir` to find out how he committed that revision; platform, version, etc.
>

Hi Trent:
For this commit, I did a 'cp -r ' of the folder from our release
candidate to my checkout, and then 'svn add', followed by 'svn
commit'.

Here's my system info (let me know if you need more):

uname -a:
Linux beast 3.2.0-24-generic #37-Ubuntu SMP Wed Apr 25 08:43:22 UTC
2012 x86_64 x86_64 x86_64 GNU/Linux

svn --version:
svn, version 1.7.2 (r1207936)
   compiled Dec  1 2011, 12:30:57

Copyright (C) 2011 The Apache Software Foundation.
This software consists of contributions made by many people; see the NOTICE
file for more information.
Subversion is open source software, see http://subversion.apache.org/

The following repository access (RA) modules are available:

* ra_neon : Module for accessing a repository via WebDAV protocol using Neon.
  - handles 'http' scheme
  - handles 'https' scheme
* ra_svn : Module for accessing a repository using the svn network protocol.
  - with Cyrus SASL authentication
  - handles 'svn' scheme
* ra_local : Module for accessing a repository on local disk.
  - handles 'file' scheme



-- 
lucidimagination.com

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Philip Martin <ph...@wandisco.com>.
Trent Nelson <tr...@snakebite.org> writes:

> On 7/8/12 11:48 AM, "Daniel Shahaf" <d....@daniel.shahaf.name> wrote:
>
>>Daniel Shahaf wrote on Sat, Jul 07, 2012 at 15:09:38 +0100:
>>> How can we have a 'modify-file' within an added-without-copyfrom tree?
>
> That's a pretty impressive invariant violation.  Enversion would have
> caught it, but only because we assert that a modify's previous path must
> match its current path -- not because we specifically look for this
> situation.

"svnadmin verify" detects the problem because the path does not exist in
the previous revision:

svnadmin: E160013: File not found: revision 1356316, path
'/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html

That is really just a side-effect of another check; it's not a reliable
way to detect this sort of corruption since it could be defeated if the
modify-file occurred within a replace-without-copyfrom, rather than an
add-without-copyfrom, and the path existed in the replaced tree.

> I've also never come across this in the wild.  We should definitely ping
> `rmuir` to find out how he committed that revision; platform, version, etc.

The operation log on people.a.o has:

[02/Jul/2012:16:11:33 +0000] xx.xx.xx.xx rmuir commit r1356317 790 -

The access log has:

svn.apache.org xx.xx.xx.xx - rmuir [02/Jul/2012:16:11:33 +0000] "MERGE /repos/
asf/lucene/cms/trunk/content/solr HTTP/1.1" 200 49122 "-" "SVN/1.7.2 neon/0.29.6
" - 443

It's a v2 protocol commit.

These lines are timestamped 16:11:33 but the surrounding lines are
timestamped 16:24:43 so there was a 13min server-side post-commit
period.  Perhaps the SQLite rep-cache update?

>>>Isn't svnsync correct to complain in this case?
>
> Definitely.
>
>>I'd like to resolve the situation on the live svn.a.o repository.  I see
>>a couple of ways to do so:
>>
>>- Hand-edit the revision file, changing "modify-file" to "add-file   "
>>
>>- Eliminate r1356317 from history: create an svnsync copy of /repos/asf
>>  using an authz file that excludes
>>/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
>>  (or maybe just 
>>/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loa
>>der/package-summary.html)
>>  and have the Lucene PMC recreate that tag later
>>
>>Thoughts?
>
> I'd lean towards a scripted version of option 1.  You could then
> disseminate the script to svnsync mirror admins (like myself), perhaps via
> an ASF/infrastructure blog post?

I think it should be possible to edit the revision file.  Changing
"modify-file" to "add-file" would reduce the length of the file but I
think that is OK as all the offsets within the file are to locations
before the change.  Or we could use "add-file   " with blank padding as
svn_cstring_tokenize will skip the extra blanks.

-- 
Cerified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Trent Nelson <tr...@snakebite.org>.

On 7/8/12 11:48 AM, "Daniel Shahaf" <d....@daniel.shahaf.name> wrote:

>Daniel Shahaf wrote on Sat, Jul 07, 2012 at 15:09:38 +0100:
>> How can we have a 'modify-file' within an added-without-copyfrom tree?

That's a pretty impressive invariant violation.  Enversion would have
caught it, but only because we assert that a modify's previous path must
match its current path -- not because we specifically look for this
situation.

I've also never come across this in the wild.  We should definitely ping
`rmuir` to find out how he committed that revision; platform, version, etc.


>>Isn't svnsync correct to complain in this case?

Definitely.

>I'd like to resolve the situation on the live svn.a.o repository.  I see
>a couple of ways to do so:
>
>- Hand-edit the revision file, changing "modify-file" to "add-file   "
>
>- Eliminate r1356317 from history: create an svnsync copy of /repos/asf
>  using an authz file that excludes
>/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
>  (or maybe just 
>/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loa
>der/package-summary.html)
>  and have the Lucene PMC recreate that tag later
>
>Thoughts?

I'd lean towards a scripted version of option 1.  You could then
disseminate the script to svnsync mirror admins (like myself), perhaps via
an ASF/infrastructure blog post?


	Trent.


Re: modify-file within an added tree (r1356317) ; possibly rep-cache.db -related

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Daniel Shahaf wrote on Sat, Jul 07, 2012 at 15:09:38 +0100:
> Problem: svnsync of /repos/asf r1356317 fails with:
> svnsync: E160013: File not found: revision 1356316, path '/lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html'
> 
> Using 1.7.0, but tnelson reports the same problem (and I suspect he uses
> a more recent version).
> 
> Further investigation points to an added-without-copyfrom tree containing a modified file:
> 
> [[[
> % svn log -qvr r1356317 https://svn.apache.org/repos/asf/
> ------------------------------------------------------------------------
> r1356317 | rmuir | 2012-07-02 16:13:49 +0000 (Mon, 02 Jul 2012) | 1 line
> Changed paths:
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler
>    A /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader
>    M /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html
>    .. (some 2000 more additions, all without copyfrom;  no changes above api-4_0_0_ALPHA)
> ------------------------------------------------------------------------
...
> How can we have a 'modify-file' within an added-without-copyfrom tree?
> Isn't svnsync correct to complain in this case?

I'd like to resolve the situation on the live svn.a.o repository.  I see
a couple of ways to do so:

- Hand-edit the revision file, changing "modify-file" to "add-file   "

- Eliminate r1356317 from history: create an svnsync copy of /repos/asf
  using an authz file that excludes /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA
  (or maybe just /lucene/cms/trunk/content/solr/api-4_0_0_ALPHA/org/apache/solr/handler/loader/package-summary.html)
  and have the Lucene PMC recreate that tag later

Thoughts?