Posted to users@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2012/04/20 11:57:52 UTC

script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Hi all,

As some of you may know, SVN uses filesize and last-mod-time of files
to optimize 'svn status' and other commands (if filesize and
last-mod-time of the file match those in the wc-metadata, the file is
assumed to be unchanged, and doesn't have to be read --- only if these
don't match, the file is read and checksummed, and compared with the
pristine file). If last-mod-times in the wc-metadata are out of sync
with those on the filesystem, a working copy becomes significantly
slower, because 'svn status' has to read all files in full to see
which were modified. Such timestamp mismatches can occur for instance
if the user (or some tool) 'touches' files, or if a working copy is
copied to another disk (this happens quite a lot in my company, e.g.
when developers get a new PC), or just recursively copied without
preserving last-mod-times.
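
The check described above can be sketched roughly as follows. This is
only an illustration in Python, not Subversion's actual code: the names
RecordedInfo and is_unmodified are hypothetical, the pristine file is
represented by a SHA-1 digest, and keyword/eol translation is ignored.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class RecordedInfo:
    """Hypothetical stand-in for what the wc-metadata records per file."""
    size: int            # recorded file size in bytes
    mtime_ns: int        # recorded last-mod-time, nanoseconds since the epoch
    pristine_sha1: str   # hex SHA-1 of the pristine text


def is_unmodified(path, recorded):
    """Return (unmodified, had_to_read) for one working file."""
    st = os.stat(path)
    # Fast path: size and last-mod-time both match the metadata, so the
    # file is assumed unchanged without being read.
    if st.st_size == recorded.size and st.st_mtime_ns == recorded.mtime_ns:
        return True, False
    # Slow path: read the whole file and compare it with the pristine.
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return sha1.hexdigest() == recorded.pristine_sha1, True
```

A result of (True, True) is the inefficient case this mail is about: the
file had to be read in full, yet turned out to be identical.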

Now, all is not lost if there are mismatches: 'svn cleanup' corrects
the last-mod-times in the wc-metadata (for all files which are still
identical to the pristine, the last-mod-time in metadata will be
updated). But the problem is that users are lazy, and they won't run
'svn cleanup' unless they know it's necessary / beneficial. So I'd
like to help them a bit by offering an easy way to detect if their
working copy contains a significant amount of timestamp-mismatches. A
bit like the 'Analyze' action in the Windows Defrag utility, which can
quickly report an estimated percentage of fragmentation, and based on
that suggests that you should or shouldn't defrag.

But I hate to reinvent the wheel, so: has anyone already written such
a script? I'm mainly interested in 1.7 working copies, but scripts for
1.6 may be interesting as well (we still have plenty of such working
copies lying around here).

Ideally this could become built-in functionality of svn core, for
instance as an extra option to 'svn status', because 'svn status'
already does all the work required to detect these
identical-files-with-mismatching-timestamps (the files which it had to
read/compare in full because the timestamp didn't match, yet which
turned out to be completely identical). Maybe status could someday grow
an option to report those files (either a full list, or in some
summarized way).

-- 
Johan

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Johan Corveleyn <jc...@gmail.com>.

On Fri, Apr 20, 2012 at 2:57 PM, Stefan Sperling <st...@elego.de> wrote:
> On Fri, Apr 20, 2012 at 02:38:19PM +0200, Stefan Sperling wrote:
>> On Fri, Apr 20, 2012 at 01:16:24PM +0100, Philip Martin wrote:
>> > Note that when status acquires the lock it has to repeat the timestamp
>> > check, it cannot be assumed that the timestamp is still broken, or that
>> > the file still exists, or that it is still a file, etc.
>>
>> Yes, that's why I said the timestamp would need to be checked again.
>> Of course, node kind etc. would need to be verified again as well.
>
> Filed this idea as http://subversion.tigris.org/issues/show_bug.cgi?id=4162
> Let's see if somebody wants to tackle it :)

Thanks.

As much as I'd like to see this implemented one day, I'm not sure if
I'll have the time to tackle this. So if anyone else is interested,
that'd be very welcome :).

In the meantime I'll start writing a script to detect "working copies
in need of cleanup" (unless someone else lets me know that something
like this has been written already). Such a script might do just a
quick check and exit early if, say, more than 10 mismatches (or even
just 1) are found (and have a --full option to keep going). Even if
the above feature gets implemented, such a script might still be
useful for 1.7 working copies etc ...
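
A first rough sketch of such a check, in Python. It rests on assumptions
about the internal 1.7 metadata layout, which is not a public API: that
.svn/wc.db is an SQLite database with a NODES table whose columns
local_relpath, kind, presence, op_depth, translated_size and
last_mod_time (microseconds since the epoch) hold the recorded
information. The function name find_mismatches and its limit parameter
are made up for illustration.

```python
import os
import sqlite3
from pathlib import Path


def find_mismatches(wc_root, limit=10):
    """Yield relpaths whose on-disk size or mtime differs from wc.db.

    Stops after `limit` hits; pass limit=None for a full scan.
    Assumes the 1.7 NODES layout described above (internal, unverified).
    """
    db = Path(wc_root, ".svn", "wc.db").resolve()
    # Open read-only so the check never takes a write lock on wc.db.
    conn = sqlite3.connect(db.as_uri() + "?mode=ro", uri=True)
    try:
        # Only the topmost layer (highest op_depth) of each path is used;
        # rows without a recorded timestamp are skipped.
        rows = conn.execute(
            "SELECT local_relpath, translated_size, last_mod_time"
            " FROM nodes AS n"
            " WHERE kind = 'file' AND presence = 'normal'"
            "   AND last_mod_time IS NOT NULL"
            "   AND op_depth = (SELECT MAX(op_depth) FROM nodes AS m"
            "                   WHERE m.wc_id = n.wc_id"
            "                     AND m.local_relpath = n.local_relpath)"
        )
        found = 0
        for relpath, size, mtime_us in rows:
            try:
                st = os.stat(os.path.join(wc_root, relpath))
            except OSError:
                continue  # missing file: not a timestamp problem
            if st.st_size != size or st.st_mtime_ns // 1000 != mtime_us:
                yield relpath
                found += 1
                if limit is not None and found >= limit:
                    return
    finally:
        conn.close()
```

Calling list(find_mismatches(".", limit=None)) would correspond to the
--full mode. Note that this counts every file 'svn status' would have to
read in full, so genuinely modified files are included; the result is an
upper bound on what 'svn cleanup' could repair.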

-- 
Johan

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Stefan Sperling <st...@elego.de>.

On Fri, Apr 20, 2012 at 02:38:19PM +0200, Stefan Sperling wrote:
> On Fri, Apr 20, 2012 at 01:16:24PM +0100, Philip Martin wrote:
> > Note that when status acquires the lock it has to repeat the timestamp
> > check, it cannot be assumed that the timestamp is still broken, or that
> > the file still exists, or that it is still a file, etc.
> 
> Yes, that's why I said the timestamp would need to be checked again.
> Of course, node kind etc. would need to be verified again as well.

Filed this idea as http://subversion.tigris.org/issues/show_bug.cgi?id=4162
Let's see if somebody wants to tackle it :)

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Stefan Sperling wrote on Fri, Apr 20, 2012 at 14:38:19 +0200:
> Such problems didn't happen during the 1.6->1.7 timeframe as far as I
> can see (people got an error message and understood what was going on).
> 

But in the 1.6->1.7 upgrade people had to go through all working copies
managed by unmanned daemons and manually upgrade all of them.

> If some people want to keep auto-upgrade, let's add a prompt
> ("auto-upgrade this working copy? yes/no") or a knob to the client config
> that advanced users can use to enable silent auto-upgrade. In which case
> 'svn status' would auto-upgrade when updating timestamps.

Another option is not to make format upgrades mandatory --- just like
svn 1.7 can write to format-1 FSFS filesystems.  I don't recall what we
want to bump the format in 1.8 for, so I can't tell how realistic that is.

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Stefan Sperling <st...@elego.de>.

On Fri, Apr 20, 2012 at 01:16:24PM +0100, Philip Martin wrote:
> Stefan Sperling <st...@elego.de> writes:
> 
> > You're not arguing about inter-client effects, right? I.e. a GUI
> > client trying to update its icons while a command line client 'svn
> > status' run is updating timestamps? This situation isn't much different
> > to a command line client performing a concurrent update or commit, which
> > GUIs need to deal with anyway.
> 
> It's probably OK.  What about wc format auto-upgrades?  In the past we
> had read-only operations that retained the old format and write
> operations that auto-upgraded.  Does status auto-upgrade?  Have we
> stopped auto-upgrading altogether?

I think we should stop silent auto-upgrading of working copies forever.

It has caused headaches in environments where multiple clients are installed
and software gets upgraded by admins, software update processes, etc.
I've seen a few situations (during the 1.5->1.6 timeframe) where someone
used TortoiseSVN in a working copy that was also used by Eclipse, and the
Eclipse project suddenly stopped working. It's a nuisance especially for
users who are not even aware that an auto-upgrade happened. Many just
don't know Subversion well enough to be aware of such issues. They know
the minimum of what they need to know to get their day-to-day work done.

Such problems didn't happen during the 1.6->1.7 timeframe as far as I
can see (people got an error message and understood what was going on).

If some people want to keep auto-upgrade, let's add a prompt
("auto-upgrade this working copy? yes/no") or a knob to the client config
that advanced users can use to enable silent auto-upgrade. In which case
'svn status' would auto-upgrade when updating timestamps.

> Note that when status acquires the lock it has to repeat the timestamp
> check, it cannot be assumed that the timestamp is still broken, or that
> the file still exists, or that it is still a file, etc.

Yes, that's why I said the timestamp would need to be checked again.
Of course, node kind etc. would need to be verified again as well.

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Philip Martin <ph...@wandisco.com>.

Stefan Sperling <st...@elego.de> writes:

> You're not arguing about inter-client effects, right? I.e. a GUI
> client trying to update its icons while a command line client 'svn
> status' run is updating timestamps? This situation isn't much different
> to a command line client performing a concurrent update or commit, which
> GUIs need to deal with anyway.

It's probably OK.  What about wc format auto-upgrades?  In the past we
had read-only operations that retained the old format and write
operations that auto-upgraded.  Does status auto-upgrade?  Have we
stopped auto-upgrading altogether?

Note that when status acquires the lock it has to repeat the timestamp
check, it cannot be assumed that the timestamp is still broken, or that
the file still exists, or that it is still a file, etc.

-- 
Philip

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Stefan Sperling <st...@elego.de>.

On Fri, Apr 20, 2012 at 11:59:32AM +0100, Philip Martin wrote:
> Stefan Sperling <st...@elego.de> writes:
> 
> > Of course, this would require 'svn status' to obtain a write lock on wc.db.
> > The status operation must not hold a write lock during regular operation
> > because doing so prevents concurrent read access by other clients.
> > To keep the time window where the write lock is held as small as possible,
> > we could collect a list of affected files during the status run while
> > a read-lock is held and update timestamps after the status run for those
> > affected files which still have the same timestamp they had during the run.
> 
> That might cause problems for multi-threaded GUIs.  If one thread is
> running status to update the display it could cause another thread
> attempting update to fail because status has a lock.  So a GUI might
> want to be able to force status to remain read-only.
>
> Which means the GUI has to repair timestamps some other way: by running
> cleanup at intervals say and being aware that the wc is locked.

We can grant control over this at the API level via, say, a new boolean
parameter 'fix_timestamps' in svn_client_status(). We'd probably need
this anyway to provide the current behaviour in backwards-compat APIs.

If the GUI application has a choice in the matter, by allowing or
disallowing a status run to update timestamps, it can synchronise
its own threads accordingly.

I'm fine with bothering client developers with this little detail.
I object to bothering end users with it unless there is no other way.

You're not arguing about inter-client effects, right? I.e. a GUI
client trying to update its icons while a command line client 'svn
status' run is updating timestamps? This situation isn't much different
to a command line client performing a concurrent update or commit, which
GUIs need to deal with anyway.

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Philip Martin <ph...@wandisco.com>.

Stefan Sperling <st...@elego.de> writes:

> Of course, this would require 'svn status' to obtain a write lock on wc.db.
> The status operation must not hold a write lock during regular operation
> because doing so prevents concurrent read access by other clients.
> To keep the time window where the write lock is held as small as possible,
> we could collect a list of affected files during the status run while
> a read-lock is held and update timestamps after the status run for those
> affected files which still have the same timestamp they had during the run.

That might cause problems for multi-threaded GUIs.  If one thread is
running status to update the display it could cause another thread
attempting update to fail because status has a lock.  So a GUI might
want to be able to force status to remain read-only.  Which means the
GUI has to repair timestamps some other way: by running cleanup at
intervals say and being aware that the wc is locked.

-- 
Philip

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Stefan Sperling <st...@elego.de>.

On Fri, Apr 20, 2012 at 11:16:27AM +0100, Philip Martin wrote:
> Johan Corveleyn <jc...@gmail.com> writes:
> 
> > Now, all is not lost if there are mismatches: 'svn cleanup' corrects
> > the last-mod-times in the wc-metadata (for all files which are still
> > identical to the pristine, the last-mod-time in metadata will be
> > updated). But the problem is that users are lazy, and they won't run
> > 'svn cleanup' unless they know it's necessary / beneficial. So I'd
> > like to help them a bit by offering an easy way to detect if their
> > working copy contains a significant amount of timestamp-mismatches.
> 
> Can that be done more efficiently than 'svn cleanup'?  Perhaps what you
> want is a 'svn cleanup --verbose' that reports on fixed timestamps and
> redundant pristines?

I don't think adding a new command line option for this will be useful.
Fixing up on-disk timestamps is an obscure low-level operation.
I would guess that most users aren't even aware of the timestamp
optimisation in the first place. Many wouldn't notice the new option
even exists, and most others will probably forget about using it.

Couldn't svn status automatically fix the recorded time-stamp in
meta-data, like svn cleanup does? That way, the problem fixes itself
during normal use. There is one slow status run which compares file
contents and fixes up timestamps in one go. Subsequent status runs
are faster again.

Of course, this would require 'svn status' to obtain a write lock on wc.db.
The status operation must not hold a write lock during regular operation
because doing so prevents concurrent read access by other clients.
To keep the time window where the write lock is held as small as possible,
we could collect a list of affected files during the status run while
a read-lock is held and update timestamps after the status run for those
affected files which still have the same timestamp they had during the run.
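
To illustrate the two-phase idea, here is a rough model in Python. It is
not Subversion code: the `recorded` dict stands in for the wc-metadata,
a threading.Lock stands in for the wc.db write lock, and the function
names and the same_as_pristine callback are hypothetical. Files are
assumed to exist during the first phase.

```python
import os
import stat
import threading

write_lock = threading.Lock()  # stand-in for the wc.db write lock


def collect_candidates(recorded, same_as_pristine):
    """Phase 1 (read-only): note files that were read in full yet unchanged.

    `recorded` maps path -> (size, mtime_ns) as stored in the metadata.
    `same_as_pristine(path)` is a caller-supplied full content comparison.
    """
    candidates = []
    for path, info in recorded.items():
        st = os.stat(path)
        seen = (st.st_size, st.st_mtime_ns)
        if seen != info and same_as_pristine(path):
            candidates.append((path, seen))
    return candidates


def repair_timestamps(recorded, candidates):
    """Phase 2: hold the write lock briefly and re-verify before updating."""
    fixed = []
    with write_lock:
        for path, seen in candidates:
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished since the status run
            if not stat.S_ISREG(st.st_mode):
                continue  # no longer a regular file
            if (st.st_size, st.st_mtime_ns) != seen:
                continue  # touched again; contents were not re-verified
            recorded[path] = seen
            fixed.append(path)
    return fixed
```

The write lock is held only for the cheap re-stat and update, never
while file contents are being compared.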

Re: script to detect timestamp-mismatches (inefficiency) in (1.7) working copies?

Posted by Philip Martin <ph...@wandisco.com>.

Johan Corveleyn <jc...@gmail.com> writes:

> Now, all is not lost if there are mismatches: 'svn cleanup' corrects
> the last-mod-times in the wc-metadata (for all files which are still
> identical to the pristine, the last-mod-time in metadata will be
> updated). But the problem is that users are lazy, and they won't run
> 'svn cleanup' unless they know it's necessary / beneficial. So I'd
> like to help them a bit by offering an easy way to detect if their
> working copy contains a significant amount of timestamp-mismatches.

Can that be done more efficiently than 'svn cleanup'?  Perhaps what you
want is a 'svn cleanup --verbose' that reports on fixed timestamps and
redundant pristines?

-- 
Philip