Posted to users@subversion.apache.org by Kasia Trapszo <kt...@tickets.com> on 2004/07/15 18:16:28 UTC

Bug with revisions?

I ran into this problem today and it looks to be a nasty bug..

File was changed, but the timestamp on the file is the same as the
previous revision.  Running svn diff shows no difference and cannot
commit the file to the repository (as subversion thinks it hasn't been
edited). Do a normal diff on the file (not svn) and there's a
difference.. Do a 'touch' to change the date and now svn sees the
difference. 

This should be easy to reproduce. 

svn version 1.0.4 (r9844) on solaris 7
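
The failure can be simulated without Subversion. The following is a minimal Python sketch, not Subversion code: `looks_modified_by_mtime` is a hypothetical stand-in for a timestamp-only check, and `os.utime` is used to force both versions of the file onto the same modification time, as a fast script might do by accident.

```python
import os
import tempfile


def looks_modified_by_mtime(path, recorded_mtime):
    """Hypothetical timestamp-only check at whole-second granularity."""
    return int(os.stat(path).st_mtime) != int(recorded_mtime)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file1.txt")

        # First version, pinned to a fixed modification time.
        with open(path, "w") as f:
            f.write("first version\n")
        os.utime(path, (1000000000, 1000000000))
        recorded = os.stat(path).st_mtime  # what a client would remember

        # Second version with different content, forced onto the same mtime.
        with open(path, "w") as f:
            f.write("second version, longer\n")
        os.utime(path, (1000000000, 1000000000))
        before_touch = looks_modified_by_mtime(path, recorded)

        # Moving the timestamp forward (the effect of 'touch') exposes the edit.
        os.utime(path, (1000000060, 1000000060))
        after_touch = looks_modified_by_mtime(path, recorded)
        return before_touch, after_touch
```

`demo()` returns `(False, True)`: the edit is invisible to the timestamp-only check until the modification time moves.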



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Bug with revisions?

Posted by "C.A.T.Magic" <c....@gmx.at>.
C. Michael Pilato wrote:

> Oh, wait.  I just remembered why we don't do this already.  Keyword
> substitution/EOL translation.  I think our full algorithm is:
> 
>    if timestamps not different:
>        return no_diff
>    if keywords or eol-style enabled:
>        de-translate file to tmpfile

I'd suggest writing the file size of the real WC file into entries
(i.e. the size INCLUDING the EOL conversions).
That would save a lot of 'de-translation' time.

:)
c.a.t.



Re: Bug with revisions?

Posted by "C. Michael Pilato" <cm...@collab.net>.
"C. Michael Pilato" <cm...@collab.net> writes:

> Kasia Trapszo <kt...@tickets.com> writes:
> 
> > Wouldn't it be safer to hash the size & timestamp and use that instead?
> > I don't think that would affect performance significantly considering
> > both filesize and timestamp come from the inode.. but then this is
> > hardly my area of expertise.. 
> 
> Hm.  I like this idea.  Why not propose it on the dev@ list?

Oh, wait.  I just remembered why we don't do this already.  Keyword
substitution/EOL translation.  I think our full algorithm is:

   if timestamps not different:
       return no_diff
   if keywords or eol-style enabled:
       de-translate file to tmpfile
       if tmpfile size != basefile size:
           return diff
       if tmpfile and basefile differ bytewise:
       return diff
   if file size != basefile size:
       return diff
   if file and basefile differ bytewise:
       return diff
   return no_diff
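
The outline above can be sketched in Python. This is an illustration under stated assumptions, not Subversion's implementation: `detranslate` is a hypothetical callable standing in for the keyword/EOL de-translation step, timestamps are compared at whole-second granularity, and the translated and untranslated comparisons are treated as alternatives, which the outline does not spell out.

```python
import filecmp
import os


def is_modified(working, base, recorded_mtime, detranslate=None):
    """Return True if `working` appears to differ from `base`.

    recorded_mtime: the timestamp remembered for the working file.
    detranslate:    hypothetical callable (path -> path of a de-translated
                    temp file); None when no keywords/eol-style are set.
    """
    # Cheap first check: an unchanged timestamp is trusted outright.
    if int(os.stat(working).st_mtime) == int(recorded_mtime):
        return False

    # With keywords or eol-style, compare the de-translated copy instead.
    candidate = detranslate(working) if detranslate else working

    # A size mismatch proves a difference without reading the contents.
    if os.path.getsize(candidate) != os.path.getsize(base):
        return True

    # Last resort: byte-for-byte comparison.
    return not filecmp.cmp(candidate, base, shallow=False)
```

Note that the first check is the one that produces the reported behaviour: a changed file with an unchanged timestamp never reaches the size or content comparison.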


Re: Bug with revisions?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Kasia Trapszo <kt...@tickets.com> writes:

> Wouldn't it be safer to hash the size & timestamp and use that instead?
> I don't think that would affect performance significantly considering
> both filesize and timestamp come from the inode.. but then this is
> hardly my area of expertise.. 

Hm.  I like this idea.  Why not propose it on the dev@ list?


Re: Bug with revisions?

Posted by Kasia Trapszo <kt...@tickets.com>.
Wouldn't it be safer to hash the size & timestamp and use that instead?
I don't think that would affect performance significantly considering
both filesize and timestamp come from the inode.. but then this is
hardly my area of expertise.. 

	
It just strikes me as incredibly unsafe to rely on timestamp alone..
particularly for diff output! Perhaps at least provide an option to
force a diff even if the timestamp is the same?  Just for those few
times when we do something that's scripted...
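
A minimal sketch of the size-plus-timestamp idea, with an override for scripted use. It is written in Python and is not Subversion code; `fingerprint`, `is_modified`, and the `force` flag are hypothetical names. Both values come from a single `os.stat()` call, and the recorded fingerprint is assumed to describe the working file as the client last wrote it.

```python
import filecmp
import os


def fingerprint(path):
    """Size and modification time, both taken from one stat() call."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def is_modified(working, base, recorded, force=False):
    """Trust a matching fingerprint unless the caller forces a full check."""
    if not force and fingerprint(working) == recorded:
        return False
    # Fall back to a byte-for-byte comparison against the pristine copy.
    return not filecmp.cmp(working, base, shallow=False)
```

A same-size edit within the same timestamp still slips through the fingerprint, which is the case the `force` flag is meant to cover.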

k.




On Thu, 2004-07-15 at 15:16, C. Michael Pilato wrote:
> Kasia Trapszo <kt...@tickets.com> writes:
> 
> > I'm not mucking with timestamps.. this was something that was scripted
> > and while unlikely, I suppose my script was fast enough to write two
> > versions of a file within the same second.. you are looking at the
> > timestamp up to the second, right?
> 
> Last I knew, Subversion was supposed to sleep() for a second in
> relevant situations so that this exact problem would be avoided (so
> long as the Subversion process wasn't backgrounded, I suppose).
-- 
Kasia Trapszo <kt...@tickets.com> 
Software Engineer - Tickets.com

office: 203-741-3028      cell: 860-916-8179
aim: kasiachick            yim: kasiachick

     Telepathy available upon request



Re: Bug with revisions?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Kasia Trapszo <kt...@tickets.com> writes:

> I'm not mucking with timestamps.. this was something that was scripted
> and while unlikely, I suppose my script was fast enough to write two
> versions of a file within the same second.. you are looking at the
> timestamp up to the second, right?

Last I knew, Subversion was supposed to sleep() for a second in
relevant situations so that this exact problem would be avoided (so
long as the Subversion process wasn't backgrounded, I suppose).
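
One way such a pause could work is to wait until the wall clock reaches the next whole second, so that any later write gets a different whole-second timestamp. The Python sketch below is an assumption about the general technique, not a copy of what Subversion does; `sleep_until_next_second` is a hypothetical name.

```python
import time


def sleep_until_next_second():
    """Block until the wall clock has moved into the next whole second."""
    target = int(time.time()) + 1
    while True:
        remaining = target - time.time()
        if remaining <= 0:
            return
        time.sleep(remaining)
```

A script that rewrites a file right after a checkout or commit could call this between the two writes to avoid landing in the same second.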


Re: Bug with revisions?

Posted by Kasia Trapszo <kt...@tickets.com>.
I'm not mucking with timestamps.. this was something that was scripted
and while unlikely, I suppose my script was fast enough to write two
versions of a file within the same second.. you are looking at the
timestamp up to the second, right?

Btw.. the file size was different, for what it's worth.

k.

On Thu, 2004-07-15 at 14:48, Ben Collins-Sussman wrote:
> When scanning a large tree, 99% of files aren't changed, so it's much
> faster to do the timestamp check first... it allows us to move on to the
> next file ASAP.   If we checked filesize first, it would almost always
> be inconclusive ("filesize is the same"), and force us to check the
> timestamp anyway.
> 
> (And yes, I believe the timestamp/filesize are being grabbed from the
> working file in a single stat() already.  We compare the timestamp to
> the one recorded in the .svn/entries file (which is cached in memory).
> If a filesize check is necessary, we compare the filesize to that of the
> .svn/text-base/ copy, which requires another stat().)
> 
> The basic rule-of-trust here is:  we assume the user isn't mucking
> around and creating false timestamps.  Don't Do That.



Re: Bug with revisions?

Posted by "C.A.T.Magic" <c....@gmx.at>.
Jani Averbach wrote:
> Worse still, if we fall back to the filesize check we first have to call
> "svn_wc_translated_file", which will create a translated copy of
> text-base, and this is a _heavy_ operation. This has to be done because we
> don't have the original filesize of the translated file in the entries
> file. It would be nice to have the original filesize and md5 sum of the
> translated file, but we don't have them...


Is there any problem with extending the XML entries file?

[...]
<entry
    committed-rev="1"
    name="file1.txt"
    text-time="2004-07-12T01:17:50.000000Z"
/>
[...]

could simply get a new attribute

    committed-size="3423423"

Older svn clients would of course ignore (and remove) that
attribute, but even if they do, it's still no worse than it is now.
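
A minimal Python sketch of how a client might read such a value and tolerate its absence. The `committed-size` attribute is the proposal above, not an existing part of the entries format, and `recorded_size` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Example entry carrying the proposed (hypothetical) committed-size attribute.
ENTRY = (
    '<entry committed-rev="1" name="file1.txt" '
    'text-time="2004-07-12T01:17:50.000000Z" '
    'committed-size="3423423"/>'
)


def recorded_size(entry_xml):
    """Return the recorded size as an int, or None if the attribute is absent."""
    value = ET.fromstring(entry_xml).get("committed-size")
    return int(value) if value is not None else None
```

Returning `None` when the attribute is missing lets the caller fall back to the existing, slower comparison, for example after an older client has rewritten the entry.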


:)
======
c.a.t.




Re: Bug with revisions?

Posted by Jani Averbach <ja...@jaa.iki.fi>.
On 2004-07-15 13:48-0500, Ben Collins-Sussman wrote:
>
> (And yes, I believe the timestamp/filesize are being grabbed from the
> working file in a single stat() already.  We compare the timestamp to
> the one recorded in the .svn/entries file (which is cached in memory).
> If a filesize check is necessary, we compare the filesize to that of the
> .svn/text-base/ copy, which requires another stat().)
>

Worse still, if we fall back to the filesize check we first have to call
"svn_wc_translated_file", which will create a translated copy of
text-base, and this is a _heavy_ operation. This has to be done because we
don't have the original filesize of the translated file in the entries
file. It would be nice to have the original filesize and md5 sum of the
translated file, but we don't have them...
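
For illustration, both values could be gathered in a single pass over a file at the moment it is written. This is a Python sketch with a hypothetical helper name (`size_and_md5`), not part of the Subversion API.

```python
import hashlib


def size_and_md5(path, chunk_size=65536):
    """Read the file once, returning (size in bytes, md5 hex digest)."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()
```

With the pair stored alongside the timestamp, a later size check would need only the `stat()` result for the working file rather than a fresh translation.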

BR, Jani

-- 
Jani Averbach



Re: Bug with revisions?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-07-15 at 13:41, Bryan Donlan wrote:

> Wouldn't it be better to check the filesize first? On *nix at least
> both size and time could be checked with one function call, and if
> timestamp resolution (like in FAT) stops the timestamp from changing
> there's at least a chance it'd be detected.

When scanning a large tree, 99% of files aren't changed, so it's much
faster to do the timestamp check first... it allows us to move on to the
next file ASAP.   If we checked filesize first, it would almost always
be inconclusive ("filesize is the same"), and force us to check the
timestamp anyway.

(And yes, I believe the timestamp/filesize are being grabbed from the
working file in a single stat() already.  We compare the timestamp to
the one recorded in the .svn/entries file (which is cached in memory).
If a filesize check is necessary, we compare the filesize to that of the
.svn/text-base/ copy, which requires another stat().)

The basic rule-of-trust here is:  we assume the user isn't mucking
around and creating false timestamps.  Don't Do That.
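
The scanning argument can be sketched as follows, in Python and under stated assumptions: `recorded_mtimes` is a hypothetical in-memory map standing in for the cached .svn/entries data, and timestamps are compared at whole-second granularity. Each file costs one `stat()`; only the files whose timestamps moved go on to the more expensive size and content checks.

```python
import os


def candidates_for_full_check(root, recorded_mtimes):
    """Return the relative paths whose timestamp differs from the recorded one.

    recorded_mtimes: dict mapping relative path -> recorded mtime in seconds.
    """
    suspects = []
    for rel, recorded in recorded_mtimes.items():
        st = os.stat(os.path.join(root, rel))  # one stat() per file
        if int(st.st_mtime) != int(recorded):
            suspects.append(rel)
    return suspects
```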




Re: Bug with revisions?

Posted by Bryan Donlan <bd...@gmail.com>.
On Thu, 15 Jul 2004 13:34:31 -0500, Ben Collins-Sussman
<su...@collab.net> wrote:
> On Thu, 2004-07-15 at 13:16, Kasia Trapszo wrote:
> > I ran into this problem today and it looks to be a nasty bug..
> >
> > File was changed, but the timestamp on the file is the same as the
> > previous revision.  Running svn diff shows no difference and cannot
> > commit the file to the repository (as subversion thinks it hasn't been
> > edited). Do a normal diff on the file (not svn) and there's a
> > difference.. Do a 'touch' to change the date and now svn sees the
> > difference.
> >
> > This should be easy to reproduce.
> 
> This isn't a bug, it's a deliberate SVN design.  SVN uses the
> same algorithm as CVS to decide if a file has changed:
> 
> if (timestamp unchanged)
>  return FILE_UNCHANGED;
> else if (filesize changed)
>  return FILE_CHANGED;
> else
>  return do_brute_force_byte_for_byte_comparison();
> 
> If the timestamp trick weren't the first line of defense, then it would
> take *hours* to run 'svn status', or run 'svn commit'.

Wouldn't it be better to check the filesize first? On *nix at least
both size and time could be checked with one function call, and if
timestamp resolution (like in FAT) stops the timestamp from changing
there's at least a chance it'd be detected.
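
To make the FAT case concrete: FAT stores modification times in 2-second steps, so two writes up to two seconds apart can end up with the same stored timestamp. A small worked example in Python, assuming the stored value is simply truncated to an even second (`fat_mtime` is a hypothetical helper):

```python
def fat_mtime(seconds):
    """Truncate a timestamp to the 2-second granularity FAT can store."""
    return int(seconds) // 2 * 2


# Two writes made 1.7 seconds apart, in different wall-clock seconds.
first_write = fat_mtime(10.2)
second_write = fat_mtime(11.9)
same_stored_time = first_write == second_write
```

Here both writes are stored as second 10, so a timestamp-only check would treat the second write as no change.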

-- 
bd


Re: Bug with revisions?

Posted by Ben Collins-Sussman <su...@collab.net>.
On Thu, 2004-07-15 at 13:16, Kasia Trapszo wrote:
> I ran into this problem today and it looks to be a nasty bug..
> 
> File was changed, but the timestamp on the file is the same as the
> previous revision.  Running svn diff shows no difference and cannot
> commit the file to the repository (as subversion thinks it hasn't been
> edited). Do a normal diff on the file (not svn) and there's a
> difference.. Do a 'touch' to change the date and now svn sees the
> difference. 
> 
> This should be easy to reproduce. 

This isn't a bug, it's a deliberate SVN design.  SVN uses the
same algorithm as CVS to decide if a file has changed:

if (timestamp unchanged)
  return FILE_UNCHANGED;
else if (filesize changed)
  return FILE_CHANGED;
else
  return do_brute_force_byte_for_byte_comparison();

If the timestamp trick weren't the first line of defense, then it would
take *hours* to run 'svn status', or run 'svn commit'.


