You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@apache.org> on 2022/01/28 17:15:53 UTC

[PATCH] Sketch of per-user/per-wc config for pristines-mode

The attached patch is a WIP/debug patch, against 'pristines-on-demand'
branch, sketching how a 'pristines-mode' setting can be set:

  - in the user config (~/.subversion/config)
  - and/or in a new per-WC config (wc/.svn/config)

each with the same syntax ('[working-copy]' section, 'pristines-mode'
config key).

For interest/comments.

Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode (#4889)

Posted by Julian Foad <ju...@apache.org>.
Filed as #4889, https://subversion.apache.org/issue/4889,
"Pristines-on-demand: per-WC config"


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Karl Fogel <kf...@red-bean.com>.
On 10 Feb 2022, Julian Foad wrote:
>Julian Foad wrote:
>> pristines-on-demand behaviour needs to be made conditional on 
>> WC format.
>> [...]
>> Once that is done, I plan to return to this per-WC config 
>> option storage.
>
>Now that we have (just) decided the default WC format will be the 
>old
>format (31) and upgrading a WC will be optional, that means this 
>per-WC
>config option and storage is not essential for MVP. The feature 
>can just
>be enabled whenever the WC is the new format (32), with no
>pristines-specific config.
>
>I am not saying we will never do pristines-specific config, just 
>that it
>will initially be usable without it.

+1, FWIW.  I agree with you that the "Minimum Viable Product" here 
does not need it, although I do hope (and expect) that eventually 
we'll get it.

Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Julian Foad <ju...@foad.me.uk>.
Julian Foad wrote:
> pristines-on-demand behaviour needs to be made conditional on WC format.
> [...]
> Once that is done, I plan to return to this per-WC config option storage.

Now that we have (just) decided the default WC format will be the old
format (31) and upgrading a WC will be optional, that means this per-WC
config option and storage is not essential for MVP. The feature can just
be enabled whenever the WC is the new format (32), with no
pristines-specific config.

I am not saying we will never do pristines-specific config, just that it
will initially be usable without it.

- Julian


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Julian Foad <ju...@foad.me.uk>.
I haven't heard any further feedback on this settings storage. I am
still favouring the new in-WC config file ('wc/.svn/config'). It seems
straightforward and understandable with hopefully not so much potential
for bikeshedding (no need for new command line options or commands, for example).

On the 'multi-wc-format' branch I have completed the required test
framework modifications to allow testing multiple requested versions
(test suite '--wc-format-version' option; see the thread
"Multi-WC-format branch: preparing for merge to trunk").

Next I am running tests of pristines-on-demand merged with
multi-wc-format. As expected, tests fail when requesting the older
format, because right now it unconditionally expects the new format. The
pristines-on-demand behaviour needs to be made conditional on WC format.
At libsvn_client layer that's simple, while at libsvn_wc layer it is
also simple but requires a little bit of futzing to make two variants of
some existing wc-db queries, one with and one without the extra
'hydrated' column.

Once that is done, I plan to return to this per-WC config option storage.

- Julian


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Evgeny Kotkov <ev...@visualsvn.com>.
Julian Foad <ju...@apache.org> writes:

> Some people said it should be in wc.db. Evgeny wrote, "this sounds like
> a property associated with a specificwc_id in the database.  I would say
> that this pretty much rules out optionsof storing it outside the wc.db."
> But (Brane wrote) "WC_ID is hardcoded to 1 pretty much everywhere."
>
> What do you all make of this now? Is a new WC config file plausible?

I tend to think that using a per-wc config file could potentially introduce
more problems that it solves.

A few examples:

- Even if WC_ID is currently hardcoded to 1, I think that storing the new
  property without associating it with a WC_ID would probably circumvent
  this part of the working copy design.  In other words, if at some point the
  database starts to use more than one WC_ID, we would probably have to do
  something about a config option that works on a per-WC_ID basis but isn't
  persisted as such.

- A user-editable config option implies that it can be changed externally
  and that the working copy has to cope with such a possibility.  This rules
  out a relatively simple approach where the property is set when a working
  copy is created and is kept immutable afterwards.

  (This approach also allows entirely skipping the textbase sync by seeing
   that the stored pristines-on-demand property is false.  Compared to that,
   the approach with a config option implies having to do a more thorough
   check because some of the pristines might be missing.)

- To change this option programmatically, one would have to update an existing
  config file, which could be a non-trivial technical task, considering things
  like persisting comments and whitespace trivia.


Thanks,
Evgeny Kotkov
(On vacation for a couple of weeks, so might not be able to respond promptly.)

Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Julian Foad <ju...@apache.org>.
Some more notes on pristines-mode configuration.

Naming of option values: agreed that we need to choose names carefully,
avoiding ambiguity like 'mode=all'.

On the consensus that:

  (1) There should be a user config (file/registry/cmdline) option to set
the desired pristines-mode that will be applied to a new/upgraded WC on checkout/upgrade.
  (2) The desired pristines-mode should be persisted as a property of
the working copy, stored in the WC.

(1) Once we implement a user config setting, then there automatically is
a cmdline option:

  - svn co --config-option=config:working-copy:pristines-mode=FOO

It would of course be nice to have an abbreviation, such as one of:

  - svn co --wc-option=pristines-mode=FOO
  - svn co --pristines-mode=FOO

I suggest such abbreviation is a nicety to be added if we have the time
and inclination, otherwise no big deal to leave it out.

(2) The in-WC setting could be stored either as:

  (2a) a user-editable setting in (new invention) a 'wc/.svn/config'
file; or
  (2b) a machine-editable setting in wc.db; plus a command for changing
it later.

The former (2a, a wc config file) has the nice property of being plain
text user-editable and in the same format as '~/.subversion/config' and
'repo/conf/svnserve.conf'. No additional commands needed to change it.

Introducing a new config file is, strictly speaking, an orthogonal
change, and should be committed separately. The current patch in this
thread contains the basic implementation. Additional work:

  - Document its existence. (Where?)
  - Generate the file, containing documentation and commented-out
example settings, when creating a new WC.

For the latter (2b, setting in wc.db), we would need:

  - some way to display it;
  - a separate command to change it later.

These might be quite simple at the UI level, such as 'svn info' to
display it and 'svn checkout --force' to change it. Even a simple UI
change requires a non-negligible amount of code churn in the client and
WC layer APIs. I don't see any advantages ("performance"? "locality"?)
of storage in wc.db.

Some people said it should be in wc.db. Evgeny wrote, "this sounds like
a property associated with a specificwc_id in the database.  I would say
that this pretty much rules out optionsof storing it outside the wc.db."
But (Brane wrote) "WC_ID is hardcoded to 1 pretty much everywhere."

What do you all make of this now? Is a new WC config file plausible?

Finally now, Lorenz brought up the idea of basing the mode setting on
wc-props, so the user could use (a variant of) the existing properties
mechanism for per-subdirectories and per-file control. That could turn
into a powerful, more general mechanism for configuring any
client-specific properties of a WC and its paths, not specific to
controlling pristines. However, that seems a substantial and tangential
design project, not something to tackle within pristines-on-demand. We
might like to explore it further as a side project.

- Julian


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Julian Foad <ju...@apache.org>.
Karl Fogel wrote:
> Does the current design involve a new per-wc flag that indicates 
> something about a pristines-(present|absent) mode?
> 
> [...] ideally we would record that fact solely on a 
> per-file basis. [...]

That's a good question to ask, as it's a little complex.

TL;DR:

  - There is consensus to store the *desired behaviour* mode at the WC level.
  - There is no need in principle to store a *current state* indicator
("all present?") at the WC level, but there is a performance overhead
issue to consider and such a state flag is one of the possible ways to
address it, but probably not the best.

I agree it is a Good Thing for the design to rely on per-pristine state
and not add a separate (cached) state flag to duplicate this state
knowledge, and handling all behaviour in a single general case. That is
simpler and more reliable. Such a design is possible in principle.

The current patch (but we can/should change it) adds a "mode" setting at
the WC level which combines two semantics:

  (1) the desired fetching/discarding policy (initially either keep all
pristines or use the current pristines-on-demand behaviour), and also
  (2) one of the modes has the additional meaning of "assume all
present: skip the check and expect to error out if assumption was wrong".

Point (1) is from consensus that we should store, in the WC, the current
"pristines mode" setting, so that in case the WC is portable or used
with different user config settings, we wish to avoid flip-flopping its
pristine store between empty-ish and full. (This mode setting will be
initially set at checkout (or upgrade, etc.), according to config from
the user's config file and/or command-line option.)

I'll explain about point (2) and the alternatives to it.

In practice, the current implementation adds a tree scan with
non-negligible cost. To maintain no overhead until opt-in, it seems we
would need to skip this scan. I discussed other potential optimisations
for it and they are generally difficult and limited, given the overall
shape of this design which wants to populate pristines in bulk before
starting an operation, rather than at the moment of access.

I can see three possible approaches, of which the third looks best.

1. The special "assume all present" semantics in the per-WC mode
setting. It has the disadvantage we both mentioned, basically a less
clean design which could lead to further problems down the road. It
doesn't seem to have any show-stopper problems for a first cut.

2. One alternative would be to change the design such that pristine
fetching is done right at the point of use, rather than fetching
everything we might need before a whole client operation. That would
make it simple to use per-pristine state with no overhead. On the other
hand the client would make authentication callback(s) some time after
starting an operation rather than before the operation begins, and
potential re-connection and re-authentication. Evgeny
and others suggested that would be undesirable, creating a more complex
interaction pattern and network connection pattern for the client to
deal with. I have not tried to evaluate how this might impact clients,
especially GUI-type clients.

3. Another alternative would be to make Subversion detect the "all
pristines present" state on its own, and skip the scans then. There are
two ways to implement that:

  - Augment the current implementation to cache that state in the WC
between runs, separately from the desired mode option; or
  - Switch the order of the DB and file-stat scan phases to do the DB
phase first (low overhead) and skip the file-stat scan phase if the DB
knows all pristines are present.

The second implementation option there has advantages:

  - Simplicity of design: no cached state to maintain.
  - This optimises operations on any all-pristines-present subtree, even
when pristines are missing elsewhere. This might not be common but it's
something. The cached state flag approach either works only for the case
when the whole WC is populated, or would require the additional
complexity of a state flag per subtree.
  - It also optimises out the stats in the dehydrate-only pass after an
operation, for files that are already dehydrated, which may well be common.


Conclusions:

  - This is currently based on my assumption that the overhead
introduced needs to be eliminated until opt-in. We haven't published
measurements to back this up, and should.
  - "switch the scan phases and optimise out the stats" looks worthy of
further investigation, possibly the best thing we can do.
  - A desired mode value that carries an assumption of
all-pristines-present (skip the checks) may be kept in mind as a
fallback if we hit a block with a better solution.


Thoughts?

- Julian


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Karl Fogel <kf...@red-bean.com>.
In an ideal world, I'd have time to study the details closely 
enough that I would already know the answer to the question I'm 
about to ask. 

However, after (quickly) reading the posts so far, I still wasn't 
completely clear on the answer, so... what the heck, I'll just 
ask!

Does the current design involve a new per-wc flag that indicates 
something about a pristines-(present|absent) mode?

Here's why I ask that question:

I'd always assumed that we wouldn't need such a flag.  That is, 
since the presence/absence of a given pristine is inherently a 
per-file fact, ideally we would record that fact solely on a 
per-file basis. 

Now, our current *user interface* to pristinelessness may only 
offer it on a whole-wc basis, and that's fine.  But that doesn't 
imply a per-wc flag.  The best way to keep the feature flexible -- 
that is, to keep the door open to other user interfaces in the 
future -- is to associate pristine status with each file's entry 
and nowhere else.

Maybe that's what we're doing?  But I get the feeling, from 
skimming the posts, that there *is* some kind of per-wc flag being 
stored.  If so, maybe there's a good reason for it (i.e., perhaps 
my reasoning above is missing some key point).  Or maybe there 
isn't a good reason for it, in which case we should discuss :-).

Best regards,
-Karl

Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Julian Foad <ju...@apache.org>.
Daniel Shahaf wrote:
> The value "all" [...] is not handled at all by check_pristines_mode().
> [...]
> Or perhaps this patch is at too early a stage for any of this to
> matter :-)

That's right: all those values in the patch are just pencil sketches at
this stage.

The same answer applies to Johan's observation yesterday that the naming
of those option values is currently confusing.

The point of sending this patch was just to illustrate the rough shape
of passing an option value through from config files to in-memory WC
data structures to the code that decides whether to hydrate and/or
dehydrate pristines.

I'll be re-working it once I've digested the discussion over the last
couple of days about how to configure and where to store the configured mode.

- Julian


Re: [PATCH] Sketch of per-user/per-wc config for pristines-mode

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Julian Foad wrote on Fri, Jan 28, 2022 at 17:15:53 +0000:
> For interest/comments.

Please use text/* MIME type for patches so our MUAs show them inline.
Naming patches *.txt usually achieves this.

> On the 'pristines-on-demand' branch:
> 
>   - Sketch of configurable pristines-mode per WC.
>   - Debug/trace prints.
> 
> The pristines-mode config option can be set in:
> 
>   - user config (e.g. '~/.subversion/config') -- for user default
>   - (new) WC config file ('wc/.svn/config') -- per WC override
> 
> The pristines mode can:
> 
>   - skip the "hydrating" of modified files
>   - skip the "dehydrating" of unmodified files
> 
> It CANNOT yet:
> 
>   - force "hydrating" files that currently have no local modification
> 
> Syntax and currently effective values for the option (in the user
> config file and/or in the per-WC config file):
> 
>   [working-copy]
>   pristines-mode = hydrate=[never|on-demand],dehydrate=[never|on-demand]

The value "all", returned as the default value from svn_wc__db_pristines_mode(),
is not handled at all by check_pristines_mode().

Therefore, the resulting behaviour will be the default behaviour, namely
to fall back to the values of ALLOW_HYDRATE and ALLOW_DEHYDRATE
hardcoded into the svn_client__textbase_sync() callsite.

Should "all" mean ALLOW_HYDRATE and ALLOW_DEHYDRATE would both be forced
to TRUE?  Or perhaps should it be renamed "default"?

Or perhaps this patch is at too early a stage for any of this to matter :-)

Trimmed diff below.

> +++ subversion/libsvn_wc/textbase.c	(working copy)
> @@ -513,12 +516,52 @@ textbase_hydrate_cb(void *baton,
> +/* Check the pristines mode configured for the WC identified by WC_CTX,
> + * LOCAL_ABSPATH.  Modify the values of *ALLOW_HYDRATE, *ALLOW_DEHYDRATE
> + * according to the mode.

When the value is "all", *ALLOW_HYDRATE and *ALLOW_DEHYDRATE are not
modified.  That doesn't matter for the current callers, but a future
caller could assume that passing the address of an uninitialized
variable is allowed, in which case we'd go on using the uninitialized
value.

> + */
> +static svn_error_t *
> +check_pristines_mode(svn_boolean_t *allow_hydrate,
> +                     svn_boolean_t *allow_dehydrate,
> +                     svn_wc_context_t *wc_ctx,
> +                     const char *local_abspath,
> +                     apr_pool_t *scratch_pool)
> +{
> +  const char *mode;
> +
> +  SVN_ERR(svn_wc__db_pristines_mode(wc_ctx->db, local_abspath,
> +                                    &mode, scratch_pool));
> +
> +  /* ### for now, these are just a bunch of experimental modes and
> +   * experimental matching syntax */
> +  if (strcmp(mode, "off") == 0)
> +    {
> +      *allow_hydrate = FALSE;
> +      *allow_dehydrate = FALSE;
> +    }
> +  if (strcmp(mode, "off-line") == 0)
> +    *allow_hydrate = FALSE;
> +  if (strcmp(mode, "fetch-only") == 0)
> +    *allow_dehydrate = FALSE;
> +  if (strcmp(mode, "fetch-all") == 0)
> +    *allow_dehydrate = FALSE;
> +
> +  if (strstr(mode, "hydrate=never"))
> +    *allow_hydrate = FALSE;
> +  if (strstr(mode, "dehydrate=never"))
> +    *allow_dehydrate = FALSE;
> +
> +  SVN_DBG(("pristines mode: %s -> hydrate=%d dehydrate=%d",
> +           mode, *allow_hydrate, *allow_dehydrate));
> +  return SVN_NO_ERROR;
> +}
> +++ subversion/libsvn_wc/wc_db.h	(working copy)
> @@ -1069,12 +1069,21 @@ svn_wc__db_pristine_check(svn_boolean_t
> +/* Read the pristines mode into *MODE from the per-WC config (if set there)
> + * or else from the per-user config (if set there), with a default value of
> + * "all". */
> +svn_error_t *
> +svn_wc__db_pristines_mode(svn_wc__db_t *db,
> +                          const char *local_abspath,
> +                          const char **mode,
> +                          apr_pool_t *scratch_pool);
> +++ subversion/libsvn_wc/wc_db_wcroot.c	(working copy)
> @@ -229,21 +229,23 @@ svn_wc__db_open(svn_wc__db_t **db,
>  {
>    *db = apr_pcalloc(result_pool, sizeof(**db));
>    (*db)->config = config;
>    (*db)->verify_format = !open_without_upgrade;
>    (*db)->enforce_empty_wq = enforce_empty_wq;
>    (*db)->dir_data = apr_hash_make(result_pool);
> +  (*db)->pristines_mode = "all";

Cheers,

Daniel