You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@subversion.apache.org by mike south <ms...@mcclatchyinteractive.com> on 2005/03/15 07:10:39 UTC

bug in svn diff and related?

Hi,

I posted a couple days ago regarding an issue where I was getting 
unexpected output from svn diff.

Now I have a script to reproduce it, and I think that the behavior is 
either a bug or a serious omission from the documentation.

It looks to my untrained (and uninformed!) eye as though svn diff may be 
checking the mtime of a file against the 
.svn/text-base/[filename].svn-base version (the "pristine copy") of the 
file, and skipping the diff if the mtimes are the same.

I have code that is adding an empty file to the working copy, 
committing, then filling the file with contents, then committing that.

What appears to me to be happening is that some times the commit of the 
empty file (and the writing of its .svn/text-base copy) are happening 
within the same second that the modify of the file happens.  Then when 
you try to commit the modified file subversion doesn't see the 
modifications, and ignores your commit attempt.

If I go into a working copy and save the included script as 'test.pl', 
then run

     perl test.pl 1
     perl test.pl 2
     perl test.pl 3
     perl test.pl 4

I find that sometimes the commit of the changed file is ignored, and
sometimes it isn't.  I think that this is because some times I get
lucky and the commit of the change happens in a different second than
the writing of the .svn/text-base copy (speculative, but seems backed up 
by experimentation that I'm not demonstrating below).

If I manually set the mtime with perl (three commented-out lines in the
script below), I always see the change being ignored.

If I touch a file that has been ignored and commit, I can commit after 
that and it will send the file.

Here is the script:

     #!perl -w
     use strict;
     my $file=shift;
     die "usage:$0 filename" unless $file;
     die "$file exists, try another" if -e $file;
     `touch $file`;
     warn `svn add $file`;
     warn `svn commit -m 'adding empty' $file`;
     `echo "here there be contents" >>$file`;
     # put these lines in to see the problem every time
     #my $base_file="./.svn/text-base/$file.svn-base";
     #my $mtime = (stat $base_file)[9];
     #utime $mtime, $mtime, $file;
     warn ( `svn commit -m 'trying to update' $file` || "commit did not 
do anything for $file");


Here is some sample output with me running it as
described above:

     [msouth@rpub200d svn_diff_issue]$ perl test.pl 1
     A         1
     Adding         1
     Transmitting file data .
     Committed revision 33.
     commit did not do anything for 1 at test.pl line 13.
     [msouth@rpub200d svn_diff_issue]$ perl test.pl 2
     A         2
     Adding         2
     Transmitting file data .
     Committed revision 34.
     Sending        2
     Transmitting file data .
     Committed revision 35.
     [msouth@rpub200d svn_diff_issue]$ perl test.pl 3
     A         3
     Adding         3
     Transmitting file data .
     Committed revision 36.
     Sending        3
     Transmitting file data .
     Committed revision 37.
     [msouth@rpub200d svn_diff_issue]$ perl test.pl 4
     A         4
     Adding         4
     Transmitting file data .
     Committed revision 38.
     commit did not do anything for 4 at test.pl line 13.

In the above tests, 1 and 4 exhibited the problem.  (For 2 and 3 you can 
see that two commits succeeded, where on 1 and 4 you get "commit did not 
do anything" warnings from the script.)  Here is more activity with file 
1 to demonstrate what I was saying about it.  It is different from its 
pristine copy but svn doesn't seem to see that:

     [msouth@rpub200d svn_diff_issue]$ cat 1
     here there be contents
     [msouth@rpub200d]$ svn cat 1
     [msouth@rpub200d]$ svn diff 1
     [msouth@rpub200d]$ diff .svn/text-base/1.svn-base 1
     0a1
     > here there be contents
     [msouth@rpub200d]$ svn commit 1
     [msouth@rpub200d]$ touch 1
     [msouth@rpub200d]$ svn commit -m'this will send the file' 1
     Sending        1
     Transmitting file data .
     Committed revision 40.

please let me know if there is any other information I can provide.

mike

-- 
-----------------------------------------
       Name : Mike South
      Title : Software Developer
    Company : McClatchy Interactive
Work Phone : (919) 861-1259
   Work Fax : (919) 861-1300
Work Email : msouth@mcclatchyinteractive.com
        AIM : msouthmi
   Web Site : http://www.mcclatchyinteractive.com/
-----------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: recognizing unchanged content quickly (was: bug in svn diff and related?)

Posted by Travis P <sv...@castle.fastmail.fm>.

On Mar 16, 2005, at 7:27 AM, Mark Phippard wrote:

> Did you try this with trunk?

Nope, 1.1.3.  I'll have to give trunk a try.

-Travis


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: recognizing unchanged content quickly (was: bug in svn diff and related?)

Posted by Mark Phippard <Ma...@softlanding.com>.

Travis <sv...@castle.fastmail.fm> wrote on 03/16/2005 12:09:45 AM:

> I've got a related question:  lets say that I do change timestamps 
> (this is to fool make into allowing most of us to share hundreds-of-meg 
> of slow-to-build products via symlink to a common place when only 
> working on a subset of the project), can I do any operation that will 
> get Subversion to update the stamps in entries file so it doesn't do 
> the slow content check all the time?
> 
> I just did an experiment with update and clean and I'm not seeing any 
> difference in the text-time in the entries file.  I remember this 
> discussed on dev@ but don't recall any action being taken and my 
> experiment appears to confirm that.  Is that your recollection as well?

I recall a similar discussion on dev@.  I seem to remember that the 
conclusion was to change svn clean to automatically fix the timestamps in 
the .svn/entries file.  Either that or it was just anytime Subversion 
resorted to a byte by byte compare and the results came back equal it 
would automatically fixup the file so that it didn't happen again.

Did you try this with trunk?

Mark


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

recognizing unchanged content quickly (was: bug in svn diff and related?)

Posted by Travis <sv...@castle.fastmail.fm>.

On Mar 15, 2005, at 10:26 PM, Ben Collins-Sussman wrote:

> The speedup is noticeable, and affects everyone all the time.  Try 
> switching the algorithm and doing some timing tests to see for 
> yourself.

I admit surprise.  I'll give that a try if I can, but will take your 
word for it because it sounds like you or someone you know has tried 
it.

> The situation you're worried about -- a process that tweaks a 
> working-file's timestamp into the past -- is incredibly rare.  We 
> simply never hear about it.

I've got a related question:  lets say that I do change timestamps 
(this is to fool make into allowing most of us to share hundreds-of-meg 
of slow-to-build products via symlink to a common place when only 
working on a subset of the project), can I do any operation that will 
get Subversion to update the stamps in entries file so it doesn't do 
the slow content check all the time?

I just did an experiment with update and clean and I'm not seeing any 
difference in the text-time in the entries file.  I remember this 
discussed on dev@ but don't recall any action being taken and my 
experiment appears to confirm that.  Is that your recollection as well?

-Travis

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by David Ripton <dr...@ripton.net>.

On 2005.03.16 20:44:15 +0000, John Szakmeister wrote:
> On Wednesday 16 March 2005 11:58, Travis P wrote:
> > David,
> >
> > Are you aware of any filesystems that maintain timestamps on files with
> > any finer granularity than 1 second?  I believe such would be the
> > exception.
> 
> Theoretically, NTFS can support 100ns second resolution.  I don't believe 
> it does any better than a second in practice though.

XFS and JFS support nanosecond timestamps.  So do filesystems like proc
and tmpfs under Linux 2.6, though they don't really count.  I agree
that these are exceptions, and that most people still use filesystems
with 1s timestamp resolution.

This appears to not really matter much, because svn works around the
potential problem by sleeping (until the next second, plus a tenth) to
force timestamps to change.

svn_sleep_for_timestamps is not quite ideal, because it assumes file
timestamp resolution is always 1s.  FAT has 2s timestamp resolution, so
maybe this isn't quite conservative enough.  (Though I haven't tried
reproducing the original poster's problem on FAT, which I'd want to do
before actually claiming this needs fixing.)

The original poster's problem was related to NFS, and went away when he
put the working copy on a local filesystem.  I suspect clock skew versus
the NFS server, though that's not confirmed.

IMO, the NFS entry in the svn FAQ is too optimistic.  I will attempt to
add some caveats to it and send a doc patch.

-- 
David Ripton    dripton@ripton.net

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by John Szakmeister <jo...@szakmeister.net>.

On Wednesday 16 March 2005 11:58, Travis P wrote:
> On Mar 16, 2005, at 6:54 AM, David Ripton wrote:
> > I don't think the design intent of svn is to only support one-second
> > precision, since it uses apr_time_t with microsecond precision.
> >
> > I suspect that the results you're seeing are due to a svn or apr bug,
> > or
> > a filesystem with low-resolution timestamps.  What filesystem and OS
> > are
> > you using?
>
> David,
>
> Are you aware of any filesystems that maintain timestamps on files with
> any finer granularity than 1 second?  I believe such would be the
> exception.

Theoretically, NTFS can support 100ns second resolution.  I don't believe 
it does any better than a second in practice though.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Travis P <sv...@castle.fastmail.fm>.

On Mar 16, 2005, at 6:54 AM, David Ripton wrote:

> I don't think the design intent of svn is to only support one-second
> precision, since it uses apr_time_t with microsecond precision.
>
> I suspect that the results you're seeing are due to a svn or apr bug, 
> or
> a filesystem with low-resolution timestamps.  What filesystem and OS 
> are
> you using?

David,

Are you aware of any filesystems that maintain timestamps on files with 
any finer granularity than 1 second?  I believe such would be the 
exception.

I see some evidence (in Linux man pages and header files) that suggests 
you'll see no better than to-the-second resolution on POSIX systems.  
Just because apr_time_t might have room for more precision does not 
mean it is used.

As I recall from earlier list discussion, some MS-DOS filesystems, 
still in use on, for example, USB flash drives, only support 2-second 
granularity, which Subversion attempts to handle with carefully 
inserted delays.

-Travis

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Daniel Berlin <db...@dberlin.org>.

On Wed, 2005-03-16 at 12:46 -0500, mike south wrote:
> David Ripton wrote:
> > On 2005.03.16 01:46:12 +0000, mike south wrote:
> >  > You don't have to fiddle with mtimes to cause buggy behavior--all you
> >  > have to do is have a machine making the modifications, and it's going to
> >  > get the commit and then the subsequent modification done inside of the
> >  > same second a high percentage of the time.  The test code I posted gave
> >  > me 2 errors out of four tries.
> > 
> > This assumes that file timestamps only have one-second granularity.
> > 
> > I don't think the design intent of svn is to only support one-second
> > precision, since it uses apr_time_t with microsecond precision.
> > 
> > I suspect that the results you're seeing are due to a svn or apr bug, or
> > a filesystem with low-resolution timestamps.  What filesystem and OS are
> > you using?
> 
> os is redhat enterprise 3.  These are (I think) IBM blades.
> 
> Once people started posting about it I did some more testing and learned 
> that on the local disk (ext3 fs), I can't get it to happen.
> 
> We have shared space on a netapp which is nfs mounted.  That is where I 
> am seeing the problem.  The script I posted will fairly frequently 
> create a file that subersion doesn't recognize as being locally modified.
> 

NFS is known to have "issues" with timestamps.

In addition, you have to be sure your clocks are synchronized with that
of the nfs server, or else you run into fun issues where people can
create files that appear to be in the future on your machine.



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by mike south <ms...@mcclatchyinteractive.com>.

David Ripton wrote:
> On 2005.03.16 01:46:12 +0000, mike south wrote:
>  > You don't have to fiddle with mtimes to cause buggy behavior--all you
>  > have to do is have a machine making the modifications, and it's going to
>  > get the commit and then the subsequent modification done inside of the
>  > same second a high percentage of the time.  The test code I posted gave
>  > me 2 errors out of four tries.
> 
> This assumes that file timestamps only have one-second granularity.
> 
> I don't think the design intent of svn is to only support one-second
> precision, since it uses apr_time_t with microsecond precision.
> 
> I suspect that the results you're seeing are due to a svn or apr bug, or
> a filesystem with low-resolution timestamps.  What filesystem and OS are
> you using?

os is redhat enterprise 3.  These are (I think) IBM blades.

Once people started posting about it I did some more testing and learned 
that on the local disk (ext3 fs), I can't get it to happen.

We have shared space on a netapp which is nfs mounted.  That is where I 
am seeing the problem.  The script I posted will fairly frequently 
create a file that subersion doesn't recognize as being locally modified.

When I run stat on any file anywhere on the system, all the timestamps 
show nine zeroes after the decimal point.  So either I'm one lucky 
bastard or those are truncated :).

When I look in .svn/entries, I see the committed-date with six decimal 
places of precision (I assume it's precision, it's at least non-zero), 
but text-time and prop-time are all six-zeroes after the decimal point.

(Kernel is 2.4.21-20.ELsmp, stat is from coreutils 4.5.3 if that's any 
help.  The depth of my ignorance is such that I have no idea what matters.)

Thanks again everyone for the discussion and suggestions.

mike

> 
> -- 
> David Ripton    dripton@ripton.net
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
> 

-- 
-----------------------------------------
       Name : Mike South
      Title : Software Developer
    Company : McClatchy Interactive
Work Phone : (919) 861-1259
   Work Fax : (919) 861-1300
Work Email : msouth@mcclatchyinteractive.com
        AIM : msouthmi
   Web Site : http://www.mcclatchyinteractive.com/
-----------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by David Ripton <dr...@ripton.net>.

On 2005.03.16 01:46:12 +0000, mike south wrote:
> You don't have to fiddle with mtimes to cause buggy behavior--all you 
> have to do is have a machine making the modifications, and it's going to 
> get the commit and then the subsequent modification done inside of the 
> same second a high percentage of the time.  The test code I posted gave 
> me 2 errors out of four tries.

This assumes that file timestamps only have one-second granularity.

I don't think the design intent of svn is to only support one-second
precision, since it uses apr_time_t with microsecond precision.

I suspect that the results you're seeing are due to a svn or apr bug, or
a filesystem with low-resolution timestamps.  What filesystem and OS are
you using?

-- 
David Ripton    dripton@ripton.net

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by mike south <ms...@mcclatchyinteractive.com>.

Ben Collins-Sussman wrote:
> 
> On Mar 15, 2005, at 10:15 PM, Travis wrote:
>  >
>  > True:  I am considering the cost of the extra integer comparison and
>  > branch to be negligible.  If you are decreasing the accuracy of the
>  > heuristic because of that cost, it seems like an unworthwhile
>  > micro-optimization to me.
> 
> The speedup is noticeable, and affects everyone all the time.  Try
> switching the algorithm and doing some timing tests to see for
> yourself.
> 
> The situation you're worried about -- a process that tweaks a
> working-file's timestamp into the past -- is incredibly rare.  We
> simply never hear about it.
> 
> So in theory, yes, we're not as accurate as we could be.  In practice,
> this just doesn't matter.  The performance tradeoff is well worth it. 
> This is based on 5 years of Subversion usage and (?) 15 years of
> widespread CVS usage.  :-)

In the five years of subversion usage and the 15 years of CVS usage, how 
many of the commits were being done by machine by dopey users like me 
who figure that the subversion perl API "just works"?  We're in a whole 
different era now that there is a perl api, because now machine checkins 
and modifications are feasible for the much-less-intelligent.  That has 
quite possibly never been true in the past.

You don't have to fiddle with mtimes to cause buggy behavior--all you 
have to do is have a machine making the modifications, and it's going to 
get the commit and then the subsequent modification done inside of the 
same second a high percentage of the time.  The test code I posted gave 
me 2 errors out of four tries.

In my case, we are having a user put data into the system, and (they 
think) that first data they put in is committed to the repository, 
always recoverable.  Except that it doesn't get committed.  Then say 
they make a change, save it, and it breaks something.  They then have 
the option of reverting to a previous revision.  So they do that, but 
the first revision in the list is the empty thing that was initially 
committed, and the next one is their broken one.  Data loss.

Now, if there were a force flag* we could pass to commit, we could avoid 
data loss by passing that when we know we want to bypass the mtime 
dance.  That would allow people to have some control over that algorithm 
from outside of the source code.

I'm about to go implement a fix for our situation which--you guessed 
it--modifies the mtime of the file by one second to ensure that our 
users don't lose data.  So those rare programs that fiddle with 
timestamps are now a bit less rare.

I understand that a large majority of svn usage is going to be by humans 
editing source files by hand and very rarely getting anything done in 
sub-second intervals.  The problem is, it's a great tool, and one of the 
hallmarks of a great tool is all the things people start to do with it 
that the original designers never intended.

mike

*I searched the dev list and I see that there is a lot of opposition to 
a --force flag on commit.  The arguments there were all about the fact 
that it would sacrifice accuracy.  However the above argument says that 
the algorithm should stay the way it is in the interest of speed, at the 
expense of accuracy.  One could argue that if you want the speed that 
dependence on mtime gives you, then you should give us --force so that 
we can manually take care of the accuracy that has been sacrificed in 
the name of speed.

One could also argue that I just need to run a (non-svn) diff against 
the .svn base myself because single file operations are the exception. 
But my point is that it would be better for the tool to let me do 
exceptional things with it.

A happy medium solution might be to have a flag --slow-check that goes 
straight to char-by-char comparison or whatever.

You could also make it compile-time configurable whether commit 
supported --force so people that aren't using it in exceptional ways 
don't have to worry about their silly users using --force when what they 
really need to do is fix their clocks.  In our case we use subversion in 
both the traditional, human-edited files mode and machine mode, but we 
do it in different places.  We could (fairly) easily use different binaries.

And I could have hacked the source myself and set it up how I wanted it 
in half the time it took me to write this, I know :).

Anyway, now that I've ranted at length, let me again thank all you folks 
for the wonderful work on subversion, we really love it.  The control 
over directories and symlinks alone saves us a tremendous amount of 
hassle. Keep up the good work!

-- 
-----------------------------------------
       Name : Mike South
      Title : Software Developer
    Company : McClatchy Interactive
Work Phone : (919) 861-1259
   Work Fax : (919) 861-1300
Work Email : msouth@mcclatchyinteractive.com
        AIM : msouthmi
   Web Site : http://www.mcclatchyinteractive.com/
-----------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Ben Collins-Sussman <su...@collab.net>.

On Mar 15, 2005, at 10:15 PM, Travis wrote:
>
> True:  I am considering the cost of the extra integer comparison and 
> branch to be negligible.  If you are decreasing the accuracy of the 
> heuristic because of that cost, it seems like an unworthwhile 
> micro-optimization to me.

The speedup is noticeable, and affects everyone all the time.  Try 
switching the algorithm and doing some timing tests to see for 
yourself.

The situation you're worried about -- a process that tweaks a 
working-file's timestamp into the past -- is incredibly rare.  We 
simply never hear about it.

So in theory, yes, we're not as accurate as we could be.  In practice, 
this just doesn't matter.  The performance tradeoff is well worth it.  
This is based on 5 years of Subversion usage and (?) 15 years of 
widespread CVS usage.  :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Travis <sv...@castle.fastmail.fm>.

On Mar 16, 2005, at 2:12 AM, Peter N. Lundblad wrote:

> On Tue, 15 Mar 2005, Travis wrote:
>
>> On Mar 15, 2005, at 3:23 PM, Travis P wrote:
>>
>>> On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:
>>>
>>>> The algorithm is extremely similar to what CVS does:
>>>>
>
> First: stat working file and compare timestamp to the timestamp stored 
> in
> the entries file.
>
>>>>    stat working and textbase files.
> ...
>> True:  I am considering the cost of the extra integer comparison and
>> branch to be negligible.  If you are decreasing the accuracy of the
>> heuristic because of that cost, it seems like an unworthwhile
>> micro-optimization to me.  Do you really think the cost of the
>> comparison and branch are not negligible on the order of tens or maybe
>> hundreds of thousands of files in a wc?
>>
> Of course, it is negible. But the problem is the first step in the
> algorithm. It doesn't need to stat the base file in a common case, 
> since
> it has the timestamp in .svn/entries.  If this was a real (and
> reasonably common) problem, we
> could of course store the base filesize in the entries file as well.
>
> Just to clarify

Thanks for the clarification.  That makes much more sense to me.

-Travis


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.

On Tue, 15 Mar 2005, Travis wrote:

>
> On Mar 15, 2005, at 9:46 PM, Ben Collins-Sussman wrote:
>
> > On Mar 15, 2005, at 3:23 PM, Travis P wrote:
> >
> >> On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:
> >>
> >>> The algorithm is extremely similar to what CVS does:
> >>>

First: stat working file and compare timestamp to the timestamp stored in
the entries file.

> >>>    stat working and textbase files.
> >>>    if (mtimes of working and textbase are equal):
> >>>         return NOT_CHANGED;
> >>>    else if (filesizes of working and textbase are unequal):
> >>>         return CHANGED;
> >>>    else
> >>>         compare the files byte-by-byte.  /* very slow */
> ...
> >>    if (filesizes of working and textbase are unequal):
> >>         return CHANGED;
> >>    else if (mtimes of working and textbase are equal):
> >>         return NOT_CHANGED;
> >>    else
> >>         compare the files byte-by-byte.  /* very slow */
> >>
> >
...
> > And also, I'd argue this is slower.  99% of the time, almost every
> > file in the tree will be unchanged, and will have identical
> > working/textbase timestamps.  Using the current algorithm, it means
> > that 99% of the time we get a definitive "answer" to the question on
> > the first comparison.  In your algorithm, we'd end up doing two
> > comparsions nearly all the time, instead of one.
>
...
> True:  I am considering the cost of the extra integer comparison and
> branch to be negligible.  If you are decreasing the accuracy of the
> heuristic because of that cost, it seems like an unworthwhile
> micro-optimization to me.  Do you really think the cost of the
> comparison and branch are not negligible on the order of tens or maybe
> hundreds of thousands of files in a wc?
>
Of course,it is negible. But the problem is the first step in the
algorithm. It doesn't need to stat the base file in a common case, since
it has the timestamp in .svn/entries.  If this was a real (and
reasonably common) problem, we
could ofcourse store the base filesize in the entries file as well.

Just to clarify,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Travis <sv...@castle.fastmail.fm>.

On Mar 15, 2005, at 9:46 PM, Ben Collins-Sussman wrote:

> On Mar 15, 2005, at 3:23 PM, Travis P wrote:
>
>> On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:
>>
>>> The algorithm is extremely similar to what CVS does:
>>>
>>>    stat working and textbase files.
>>>    if (mtimes of working and textbase are equal):
>>>         return NOT_CHANGED;
>>>    else if (filesizes of working and textbase are unequal):
>>>         return CHANGED;
>>>    else
>>>         compare the files byte-by-byte.  /* very slow */
...
>>    if (filesizes of working and textbase are unequal):
>>         return CHANGED;
>>    else if (mtimes of working and textbase are equal):
>>         return NOT_CHANGED;
>>    else
>>         compare the files byte-by-byte.  /* very slow */
>>
>
> Why is this less likely to return a false answer?
>
> And also, I'd argue this is slower.  99% of the time, almost every 
> file in the tree will be unchanged, and will have identical 
> working/textbase timestamps.  Using the current algorithm, it means 
> that 99% of the time we get a definitive "answer" to the question on 
> the first comparison.  In your algorithm, we'd end up doing two 
> comparsions nearly all the time, instead of one.

Checking the meta-data as a proxy for the content is a heuristic.  
People's processes mess with timestamps for various reasons.  With the 
comparisons reversed as I suggested, users have to preserve two pieces 
of this meta-information to prevent the algorithm from returning 
NOT_CHANGED when the file has actually changed.  This is less likely to 
happen as an unintended consequence.

True:  I am considering the cost of the extra integer comparison and 
branch to be negligible.  If you are decreasing the accuracy of the 
heuristic because of that cost, it seems like an unworthwhile 
micro-optimization to me.  Do you really think the cost of the 
comparison and branch are not negligible on the order of tens or maybe 
hundreds of thousands of files in a wc?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Josef Wolf <jw...@raven.inka.de>.

On Tue, Mar 15, 2005 at 09:46:49PM -0600, Ben Collins-Sussman wrote:

[ ... ]
> >>The algorithm is extremely similar to what CVS does:
> >>
> >>   stat working and textbase files.
> >>   if (mtimes of working and textbase are equal):
> >>        return NOT_CHANGED;
> >>   else if (filesizes of working and textbase are unequal):
> >>        return CHANGED;
> >>   else
> >>        compare the files byte-by-byte.  /* very slow */
> >>
> >Ben,
> >
> >I'm curious why you don't use this alternative heuristic as less 
> >likely to return a false answer and no slower (I'm making the common 
> >assumption that getting mtime and size are both stat operations that 
> >are fetched in one operation):
> >
> >   if (filesizes of working and textbase are unequal):
> >        return CHANGED;
> >   else if (mtimes of working and textbase are equal):
> >        return NOT_CHANGED;
> >   else
> >        compare the files byte-by-byte.  /* very slow */
> 
> Why is this less likely to return a false answer?

Hmmm, my irony-detector must be broken somehow.  Assume different file
sizes but equal mtimes:  First algorithm will return NOT_CHANGED while
the second returns CHANGED.  Obviously, the first algorithm gives the
wrong answer.  Given that this is the only case where the algorithms
disagree, the second algorithm _is_ less likely to return a false answer.

> And also, I'd argue this is slower.  99% of the time, almost every file 
> in the tree will be unchanged, and will have identical working/textbase 
> timestamps.  Using the current algorithm, it means that 99% of the time 
> we get a definitive "answer" to the question on the first comparison.  

And this "answer" is less likely to be correct with the first algorithm
than with the second one.

> In your algorithm, we'd end up doing two comparsions nearly all the 
> time, instead of one.

Clearly, here is a speed/correctness tradeoff.  It would be very
interesting how big the difference is in reality.

-- 
No software patents!
-- Josef Wolf -- jw@raven.inka.de --

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Ben Collins-Sussman <su...@collab.net>.

On Mar 15, 2005, at 3:23 PM, Travis P wrote:

>
> On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:
>
>> Yes, there is a common function used throughout the svn codebase -- 
>> its only purpose is to return a "yes" or "no" answer to the question, 
>> "has this file been changed by the user?"  I think it's called 
>> svn_wc_text_modified_p().
>>
>> The algorithm is extremely similar to what CVS does:
>>
>>    stat working and textbase files.
>>    if (mtimes of working and textbase are equal):
>>         return NOT_CHANGED;
>>    else if (filesizes of working and textbase are unequal):
>>         return CHANGED;
>>    else
>>         compare the files byte-by-byte.  /* very slow */
>>
>> This function is used by 'svn status', 'svn commit', 'svn update', 
>> and many other commands.  Without the mtime check, these command 
>> commands would be EXTREMELY slow.
>
> Ben,
>
> I'm curious why you don't use this alternative heuristic as less 
> likely to return a false answer and no slower (I'm making the common 
> assumption that getting mtime and size are both stat operations that 
> are fetched in one operation):
>
>    if (filesizes of working and textbase are unequal):
>         return CHANGED;
>    else if (mtimes of working and textbase are equal):
>         return NOT_CHANGED;
>    else
>         compare the files byte-by-byte.  /* very slow */
>

Why is this less likely to return a false answer?

And also, I'd argue this is slower.  99% of the time, almost every file 
in the tree will be unchanged, and will have identical working/textbase 
timestamps.  Using the current algorithm, it means that 99% of the time 
we get a definitive "answer" to the question on the first comparison.  
In your algorithm, we'd end up doing two comparsions nearly all the 
time, instead of one.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Travis P <sv...@castle.fastmail.fm>.

On Mar 15, 2005, at 8:31 AM, Ben Collins-Sussman wrote:

> Yes, there is a common function used throughout the svn codebase -- 
> its only purpose is to return a "yes" or "no" answer to the question, 
> "has this file been changed by the user?"  I think it's called 
> svn_wc_text_modified_p().
>
> The algorithm is extremely similar to what CVS does:
>
>    stat working and textbase files.
>    if (mtimes of working and textbase are equal):
>         return NOT_CHANGED;
>    else if (filesizes of working and textbase are unequal):
>         return CHANGED;
>    else
>         compare the files byte-by-byte.  /* very slow */
>
> This function is used by 'svn status', 'svn commit', 'svn update', and 
> many other commands.  Without the mtime check, these command commands 
> would be EXTREMELY slow.

Ben,

I'm curious why you don't use this alternative heuristic as less likely 
to return a false answer and no slower (I'm making the common 
assumption that getting mtime and size are both stat operations that 
are fetched in one operation):

    if (filesizes of working and textbase are unequal):
         return CHANGED;
    else if (mtimes of working and textbase are equal):
         return NOT_CHANGED;
    else
         compare the files byte-by-byte.  /* very slow */

-Travis


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by Ben Collins-Sussman <su...@collab.net>.

On Mar 15, 2005, at 7:33 AM, David Ripton wrote:

> On 2005.03.15 02:10:39 +0000, mike south wrote:
>> It looks to my untrained (and uninformed!) eye as though svn diff may 
>> be
>> checking the mtime of a file against the
>> .svn/text-base/[filename].svn-base version (the "pristine copy") of 
>> the
>> file, and skipping the diff if the mtimes are the same.
>
> I haven't actually looked closely at the code, but I believe this is 
> the
> case.

Yes, there is a common function used throughout the svn codebase -- its 
only purpose is to return a "yes" or "no" answer to the question, "has 
this file been changed by the user?"  I think it's called 
svn_wc_text_modified_p().

The algorithm is extremely similar to what CVS does:

    stat working and textbase files.
    if (mtimes of working and textbase are equal):
         return NOT_CHANGED;
    else if (filesizes of working and textbase are unequal):
         return CHANGED;
    else
         compare the files byte-by-byte.  /* very slow */

This function is used by 'svn status', 'svn commit', 'svn update', and 
many other commands.  Without the mtime check, these command commands 
would be EXTREMELY slow.

The libsvn_client functions are also very careful to put a 'sleep 1' in 
at the correct places (just like CVS does), to exactly avoid the 
situation you've described in your email.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: bug in svn diff and related?

Posted by David Ripton <dr...@ripton.net>.

On 2005.03.15 02:10:39 +0000, mike south wrote:
> It looks to my untrained (and uninformed!) eye as though svn diff may be 
> checking the mtime of a file against the 
> .svn/text-base/[filename].svn-base version (the "pristine copy") of the 
> file, and skipping the diff if the mtimes are the same.

I haven't actually looked closely at the code, but I believe this is the
case.

I've seen a related issue, where clock skew between the local machine
and the NFS server that hosted the working copy caused svn:keyword
expansions to look like local modifications.

> I have code that is adding an empty file to the working copy, 
> committing, then filling the file with contents, then committing that.
> 
> What appears to me to be happening is that some times the commit of the 
> empty file (and the writing of its .svn/text-base copy) are happening 
> within the same second that the modify of the file happens.  Then when 
> you try to commit the modified file subversion doesn't see the 
> modifications, and ignores your commit attempt.

I don't know if this actually qualifies as a bug, but it's certainly
tricky behavior that could be documented better.

In your case, *some* time certainly elapses between the checkout and the
checkin, so using precise enough timestamp checks to make it impossible
to sneak the two operations into the same tick (and/or a one-tick sleep) 
should avoid the problem.

A quick scan of the code shows that svn_wc__timestamps_equal_p (in
questions.c) uses apr_time_t, which is supposed to have microsecond
resolution, not one-second resolution.  So this shouldn't happen, unless
there's a bug, or you're using an OS or filesystem that doesn't support
very precise timestamps.  (I don't think you said what you were
running.)

I assume that if you put "sleep 1" at the right place in your Perl
script, the problem goes away.  Does it go away if you sleep for 0.01
seconds?  (I think you have to use Time::HiRes, or select, to sleep for
less than a second in Perl.)

If you don't get a better response here, you might want to try the dev
list.

-- 
David Ripton    dripton@ripton.net

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org