Posted to dev@subversion.apache.org by Lorenz <lo...@yahoo.com> on 2022/02/01 08:07:43 UTC

Re: A two-part vision for Subversion and large binary objects.

Daniel Shahaf wrote:

>Lorenz wrote on Mon, Jan 31, 2022 at 07:13:46 +0000:
>> Karl Fogel wrote:
>> >Hi, everyone.  I'd like feedback on an idea that I've had for some
>> >years now but never written up before.
>> >
>> >Subversion can already be used to manage large (usually binary) 
>> >files.  In fact, we use SVN for this at my company and it works 
>> >decently.  However, there are two possible features that would 
>> >make Subversion go beyond "decent" all the way to "quite good" at 
>> >this :-).  They are:
>> >
>> >1) Make pristine text-base files optional. [...]
>> 
>> I've been following the optional-pristines debate for a while now, but can't
>> remember a properties-based configuration having been discussed.
>> 
>> So here is what I would like to see:
>> 
>
>*Why* would you like to see a properties-based design?  Could you please
>describe the use-case, constraints, business needs, etc., you're
>designing for?  We shouldn't be discussing concrete solutions/designs
>until we have a common understanding of what use-case they are designed
>to solve.
>
>What does this design achieve or enable that other proposals do not?

And here I thought I was asking whether I had overlooked some discussion
about this variant 8-)

Mainly, a property-based configuration allows storing a default
configuration based on knowledge about the intended use of files in
the repository. This can't be done with a purely client-side (per
workstation or per WC) approach.
Basing configuration on properties uses existing infrastructure (I
think), like inheritance and auto-props, while not preventing purely
client-side usage.
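A minimal Python sketch of the inheritance part, assuming the proposed svn:pristine property and a plain dict that maps repository paths to their explicitly set properties (both hypothetical; this is not how Subversion stores or resolves inherited properties internally): the effective value is the one on the nearest path, the node itself first, then its parents.

```python
import posixpath
from typing import Dict, Optional

Props = Dict[str, Dict[str, str]]  # repo path -> {property name: value}

def inherited_value(props: Props, path: str, name: str) -> Optional[str]:
    """Return `name` as set on `path` or on its nearest ancestor, else None."""
    # Normalise to an absolute path so the upward walk always ends at "/".
    current = posixpath.normpath("/" + path.lstrip("/"))
    while True:
        value = props.get(current, {}).get(name)
        if value is not None:
            return value
        if current == "/":
            return None
        current = posixpath.dirname(current)

# Hypothetical layout: pristines on by default, off for everything under /tags.
props: Props = {
    "/": {"svn:pristine": "on"},
    "/tags": {"svn:pristine": "off"},
}
print(inherited_value(props, "/tags/1.0/app.bin", "svn:pristine"))  # off
print(inherited_value(props, "/trunk/src/main.c", "svn:pristine"))  # on
```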

Our use case is not that tricky: we store binaries in our release
branches and tags.
Some are created when branching, so on the branch only the binaries
would be pristine-less.
Tag WCs, on the other hand, don't need pristines at all (we could use an
export, but a WC maintains the repo connection and makes sure the
files haven't been fiddled with or corrupted).

My original post was not mainly about our use case - we are doing just
fine as it is - but pointing out a more flexible approach for
configuration of the pristine-less WC feature.


>As you may have seen elsethread, Julian has already begun to implement
>a single per-WC toggle design (as a first iteration of the new
>functionality).  If you see any conflict between that work and your
>use-case, please say so sooner rather than later: it's easier to pivot
>before work has been done than after.
>
>> 1) an inheritable svn:pristine (on, off, on-demand) property on files
>> and folders.
>> On folders that could be extended to handle thresholds (size, age).
>
>For files, this could make sense.  "This file will not need to be diffed
>by most users" does sound like information the user might have that we
>can't determine otherwise.  Perhaps it's a generated file (that gets put
>in version control for whatever reason; e.g., our dist/ repository, and
>the Apache CMS websites/ tree).  Perhaps it's a file that only one or
>two users will be modifying.

For a whole folder with such files it's easier to configure on the
folder.
Also inheritance needs folder properties so far as I know, so setting
no-pristine for a whole tag would need them too.


>For folders, however, I don't see how this makes sense.  Size and age
>thresholds are not an intrinsic property of an inode in the versioned
>filesystem; they are time-space trade-offs that each client makes.
>Different clients could make different trade-offs, and clients that
>checkout today's HEAD in the future (using the «svn checkout $URL@$peg»
>syntax) might have different needs than clients that checkout today's
>HEAD today.

That's what the local override is for.
Using a WC-only property as a sibling of the repo-stored svn:pristine
allows both variants to be accessed in the same way.
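The lookup order this implies can be sketched in a few lines of Python. The mode names (on, off, on-demand), the WC-only svn:pristine-wc override and the workstation-wide config default come from the proposal; the exact precedence and the function itself are assumptions for illustration.

```python
from typing import Optional

VALID_MODES = {"on", "off", "on-demand"}

def effective_pristine_mode(repo_value: Optional[str],
                            wc_value: Optional[str],
                            config_default: str = "on") -> str:
    """Choose the pristine mode for one node (hypothetical helper).

    Assumed precedence: local override (svn:pristine-wc), then the
    repository-stored recommendation (svn:pristine), then the config default.
    Both property values are expected to be already resolved by inheritance.
    """
    for value in (wc_value, repo_value, config_default):
        if value is not None:
            if value not in VALID_MODES:
                raise ValueError(f"unknown pristine mode: {value!r}")
            return value
    raise ValueError("no pristine mode configured")

print(effective_pristine_mode("off", None))  # off: repo recommendation applies
print(effective_pristine_mode("off", "on"))  # on: the local override wins
```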


>What do you think of using an r0 revision property for storing
>information about what files typically don't need their pristines?
>
>This could get interesting if some of the files involved are protected
>by "no access" authz.

svn:auto-props come into play here (not sure whether that works for folder
properties too at the moment)


>> 2) an inheritable svn:pristine-wc property for local override. This
>> property would be WC-only, not stored in the repository.
>> 
>
>There is such a thing as WC-only (see SVN_PROP_WC_PREFIX and
>svn_prop_wc_kind).  The existing ones are deliberately not shown in the
>UI.  IIRC, they were used as the precursor of today's NODES.dav_cache
>column, as a place where the RA layer can store per-file information.

can't comment on this, I have almost no knowledge about the inner
workings of subversion 8-)


>In any case, properties are for attributes of the versioned filesystem's
>inodes.  They are not for local configuration.  It wouldn't make sense
>for «svn diff» to show both changes to, say, a file's encoding (in
>svn:mime-type) and to a file's pristinefulness, because those are
>different kinds-of-things: one describes the versioned inode, like its
>charset (which can be stored in svn:mime-type), and another is
>a property of the user's working copy [literally] of that versioned
>inode, like its depth.
>
>If it's not committable, it shouldn't be shown by «svn diff».

No need to show wc-only properties in a diff. They represent no change
to the versioned content.
The wc-only properties would be accessible via the property commands
(which is not to say that a dedicated command for managing
svn:pristine-wc properties wouldn't be helpful).


>> 3) optionally, but not necessarily, a command-line option for svn checkout
>> to set svn:pristine-wc to off.
>> Optional because one can always restrict the initial checkout
>> depth, set the property, then update.
>> 
>> 4) for workstation global settings an entry in the config
>> corresponding to the svn:pristine-wc would be needed.
>
>If there's a "typical" configuration that many clients will want to use,
>having some way for the server to advise them about it would make sense,
>as would letting clients decide whether or not to honour the advice
>(both by default and _ad hoc_ for a particular working copy).
>
>Furthermore, whenever we have some sort of server-recommended
>configuration, having some syntax to show where the wc differs from the
>recommendation will make sense.  For instance, for depth I do
>.
>    svn info -R | grep-dctrl -F Depth '' -s Path,Depth
>.
>to print all local files that have a depth other than the default — but
>having some syntactic sugar (e.g., a tools/ script) to do this query
>would make sense.  This goes not only for server-recommended depth
>configuration but also for server-recommended pristinefulness
>configuration, if we have that.
>
>Looking forward to hearing about your use-case.
>
>Cheers,
>
>Daniel

Sorry, out of spare time for now.
-- 

Lorenz


Re: A two-part vision for Subversion and large binary objects.

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Lorenz wrote on Tue, Feb 01, 2022 at 08:07:43 +0000:
> Daniel Shahaf wrote:
> 
> >Lorenz wrote on Mon, Jan 31, 2022 at 07:13:46 +0000:
> >> Karl Fogel wrote:
> >> >Hi, everyone.  I'd like feedback on an idea that I've had for some
> >> >years now but never written up before.
> >> >
> >> >Subversion can already be used to manage large (usually binary) 
> >> >files.  In fact, we use SVN for this at my company and it works 
> >> >decently.  However, there are two possible features that would 
> >> >make Subversion go beyond "decent" all the way to "quite good" at 
> >> >this :-).  They are:
> >> >
> >> >1) Make pristine text-base files optional. [...]
> >> 
> >> I've been following the optional-pristines debate for a while now, but can't
> >> remember a properties-based configuration having been discussed.
> >> 
> >> So here is what I would like to see:
> >> 
> >
> >*Why* would you like to see a properties-based design?  Could you please
> >describe the use-case, constraints, business needs, etc., you're
> >designing for?  We shouldn't be discussing concrete solutions/designs
> >until we have a common understanding of what use-case they are designed
> >to solve.
> >
> >What does this design achieve or enable that other proposals do not?
> 
> And here I thought I was asking whether I had overlooked some discussion
> about this variant 8-)
> 

We have discussed making it possible to enable/disable pristinelessness
on a finer granularity than per-working-copy.  See, for instance,
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202108.mbox/%3CCAP_GPNgFGSfpjVkr6dAEtxiipgCoOqznwJyCX6ietMDRQH-TAw%40mail.gmail.com%3E>
in the early part of this thread.

We've also discussed passing a file's properties hash to a per-file
callback predicate that decides whether or not the file would be
pristineful; see, e.g.,
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202201.mbox/%3C53D67C6B-82ED-4B4D-9594-187BD6982197%40getmailspring.com%3E>
and <https://mail-archives.apache.org/mod_mbox/subversion-dev/202201.mbox/%3C87sftrorsl.fsf%40red-bean.com%3E>.
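For readers who haven't followed those messages, a minimal Python sketch of what such a predicate might look like. The signature (a properties dict plus the file size), the svn:pristine property and the 64 MiB threshold are all assumptions for illustration; the text/binary test is a rough simplification of Subversion's svn:mime-type convention, where a missing type or a text/* type means text.

```python
from typing import Mapping

MIB = 1024 * 1024

def wants_pristine(props: Mapping[str, str], size: int,
                   max_binary_size: int = 64 * MIB) -> bool:
    """Hypothetical callback: should this file keep a pristine copy?"""
    # An explicit per-file setting (hypothetical svn:pristine) wins; any
    # other value, e.g. "on-demand", falls through to the heuristics below.
    explicit = props.get("svn:pristine")
    if explicit in ("on", "off"):
        return explicit == "on"
    # Files without svn:mime-type, or with a text/* type, are treated as
    # text and keep their pristine so that diffs stay cheap.
    mime_type = props.get("svn:mime-type", "text/plain")
    if mime_type.startswith("text/"):
        return True
    # Large binaries are the ones worth dropping pristines for.
    return size <= max_binary_size

print(wants_pristine({"svn:mime-type": "application/octet-stream"}, 500 * MIB))  # False
```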

And it turns out I actually proposed earlier in this thread to use
properties both for storing the server-recommended configuration and for
storing the client settings.

> Mainly, a property-based configuration allows storing a default
> configuration based on knowledge about the intended use of files in
> the repository. This can't be done with a purely client-side (per
> workstation or per WC) approach.

Let's distinguish the two uses of properties here: as a vehicle for
the server to propose configuration to the client, and as storage for
client settings.

> [Server uses properties to communicate recommended configuration to
> the client]

The server doesn't know the client's use-case.  See, for instance,
<https://mail-archives.apache.org/mod_mbox/subversion-dev/202201.mbox/%3C87sftrorsl.fsf%40red-bean.com%3E>.
In that sense, having some sort of server-proposed default would be
a secondary priority, something less important than implementing some
fully client-side way to select which subset of files should have
pristines.  That's just like how we have depth and even viewspecs but
don't have a way for the server to distribute viewspecs to clients.

If we were to have a way for the server to distribute recommended
viewspecs (whether for depth or the equivalent for pristinefulness),
then:

- We might want the server to offer the client more than one
  viewspec, so the client will be able to choose.  E.g., the server
  could offer a "qa" preset and a "dev" preset, and each client would
  choose what they need.  Or our site/publish/ tree could have a "ja"
  preset that includes index.ja.html and excludes index.zh.html.
  (Besides, if a single preset would work for everyone, we could just
  make it the default and let people use --depth=infinity to override.
  [I'm stating this in terms of depth, but the situation is similar for
  pristines, I think.])

- How _else_ could the recommendations be transferred, if not via node
  properties?  The contents of «svnadmin dump» and locks are virtually
  the only things a client can fetch in-band, and node properties and
  in-tree .svnfoo files are the only things in that set that are subject
  to authz.
  
  Moreover, it can be argued that "How much of the code does a user
  need" is a property of the code that changes with time (and if someone
  wants to change the value retroactively, that's impossible, just as it's
  impossible to fix bugs in old revisions retroactively).
  
  I guess the alternatives to node properties are (1) invent a new RA API
  (that does authz as needed); (2) let admins come up with out-of-band
  solutions (that use svnauthz(1) if necessary).
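To make the "several presets, each client picks one" idea concrete, a minimal Python sketch assuming a preset is just a list of include and exclude glob patterns over repository paths. Subversion has no such mechanism today; PRESETS, apply_preset and the pattern format are hypothetical. Note that fnmatch's "*" also matches "/", so "dist/*" covers the whole dist/ subtree.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical presets a server might advertise; they echo the "qa" and
# "ja" examples in the list above.
PRESETS: Dict[str, Dict[str, List[str]]] = {
    "ja": {"include": ["*"], "exclude": ["*.zh.html"]},
    "qa": {"include": ["dist/*", "tests/*"], "exclude": []},
}

def apply_preset(preset: Dict[str, List[str]], paths: List[str]) -> List[str]:
    """Return the subset of `paths` selected by `preset`, preserving order."""
    selected = []
    for path in paths:
        included = any(fnmatchcase(path, pat) for pat in preset["include"])
        excluded = any(fnmatchcase(path, pat) for pat in preset["exclude"])
        if included and not excluded:
            selected.append(path)
    return selected

print(apply_preset(PRESETS["ja"], ["index.ja.html", "index.zh.html", "style.css"]))
```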

> [Storing client settings in properties]

This means «svn propset» will have two different meanings: both "deal
with versioned data" and "deal with wc configuration".  On the one hand,
I somewhat hesitate to overload «propset» this way; but on the other
hand, perhaps this is actually the direction we should go, like Unix has
its "Everything is a file" design.  E.g., running with the idea, should
we reimagine «svn update --set-depth=exclude foo» as syntactic sugar for
«svn propset svn:wc:depth exclude foo»?  And then have, say, «svn diff
--properties-only --include-wc-props», so we'd have "viewspecs" that
could be applied by «svn patch»?

But then again, why is the "single per-WC toggle" design not sufficient
for your use-case?  And what other ways are there to support your
"branch WC" use-case?

> Basing configuration on properties uses existing infrastructure (I
> think), like inheritance and auto-props, while not preventing purely
> client-side usage.
> 

"infrastructure" is the right word.

> Our use case is not that tricky: we store binaries in our release
> branches and tags.
> Some are created when branching, so on the branch only the binaries
> would be pristine-less.
> Tag WCs, on the other hand, don't need pristines at all (we could use an
> export, but a WC maintains the repo connection and makes sure the
> files haven't been fiddled with or corrupted).

Thanks for explaining your use-case.  It sounds like a good example of
where one would want only some (a few?) files to be pristineless.

A working copy of a tag is a special case, since it doesn't expect to
ever have local modifications; if it did have local mods, it wouldn't
need to keep them or commit them; and in any case, it is fully served by
the single per-WC toggle that's already being worked on.

The first two attributes also make the tags use-case a lot easier to
solve without Subversion; e.g., with plain old rsync(1), or by making
the tree read-only at the OS level.
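For the read-only approach, a minimal Python sketch that clears the write bits on every regular file under a tag checkout while leaving the .svn administrative directory alone. It is illustrative only: the file's owner can still chmod the files back, and directories stay writable.

```python
import os
import stat

def make_tree_read_only(root: str) -> int:
    """Clear all write bits on regular files under `root`; return the count."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into Subversion's administrative area.
        if ".svn" in dirnames:
            dirnames.remove(".svn")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # os.chmod would follow the link to its target
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            count += 1
    return count
```

Calling make_tree_read_only() on the root of a fresh tag checkout returns the number of files it changed.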

For the "branch WC" use-case, see my questions above.

> My original post was not mainly about our use case - we are doing just
> fine as it is - but pointing out a more flexible approach for
> configuration of the pristine-less WC feature.

I understand, but I stand by my previous remarks that we should discuss
use-cases and requirements first and solutions second, not the other way
around.  Are per-file granularity, inheritability, and configuration
that's settable by 1.14 and older clients design goals?  Or are we
making a "When your tool is directory properties, every problem looks
like a nail" mistake?

> >As you may have seen elsethread, Julian has already begun to implement
> >a single per-WC toggle design (as a first iteration of the new
> >functionality).  If you see any conflict between that work and your
> >use-case, please say so sooner rather than later: it's easier to pivot
> >before work has been done than after.
> >
> >> 1) an inheritable svn:pristine (on, off, on-demand) property on files
> >> and folders.
> >> On folders that could be extended to handle thresholds (size, age).
> >
> >For files, this could make sense.  "This file will not need to be diffed
> >by most users" does sound like information the user might have that we
> >can't determine otherwise.  Perhaps it's a generated file (that gets put
> >in version control for whatever reason; e.g., our dist/ repository, and
> >the Apache CMS websites/ tree).  Perhaps it's a file that only one or
> >two users will be modifying.
> 
> For a whole folder with such files it's easier to configure on the
> folder.

If there is such a folder, sure.

That would mean the setting would "inherit" automatically when a branch
(of the entire containing directory) is created.  That makes sense.

What happens if the file is copied to some other folder?  If it's pulled
as a file external somewhere?  Would the semantics of inherited
properties in such cases be appropriate for use-cases of the proposed
inheritable property?
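One way to see the question: under location-based inheritance, a property set on a file itself travels with «svn copy», while an inherited value is recomputed from the new parent. A minimal Python model of that, assuming the hypothetical svn:pristine property and a dict of explicitly set properties per path (single-file copies only; file externals are not modelled):

```python
import posixpath
from typing import Dict, Optional

Props = Dict[str, Dict[str, str]]  # repo path -> explicitly set properties

def effective(props: Props, path: str, name: str = "svn:pristine") -> Optional[str]:
    """Explicit value on `path`, else the value inherited from the nearest ancestor."""
    current = posixpath.normpath("/" + path.lstrip("/"))
    while True:
        if name in props.get(current, {}):
            return props[current][name]
        if current == "/":
            return None
        current = posixpath.dirname(current)

def copy_file(props: Props, src: str, dst: str) -> Props:
    """Model copying one file: its own properties go along, inherited ones don't."""
    result = {path: dict(p) for path, p in props.items()}
    result[dst] = dict(props.get(src, {}))
    return result

props: Props = {
    "/": {"svn:pristine": "on"},
    "/branches/1.x/bin": {"svn:pristine": "off"},
    "/branches/1.x/bin/tool.exe": {},
    "/branches/1.x/bin/pinned.exe": {"svn:pristine": "off"},
}
copied = copy_file(props, "/branches/1.x/bin/tool.exe", "/trunk/tool.exe")
copied = copy_file(copied, "/branches/1.x/bin/pinned.exe", "/trunk/pinned.exe")
print(effective(copied, "/trunk/tool.exe"))    # on: the inherited value changed
print(effective(copied, "/trunk/pinned.exe"))  # off: the explicit value travelled
```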

> Also inheritance needs folder properties so far as I know, so setting
> no-pristine for a whole tag would need them too.

As mentioned above, tag wc's are a special case that has relatively
easy non-Subversion solutions.  Even within Subversion, since it only
undergoes «checkout» and a networked «revert» and has the same semantics
for all its files, it's somewhat _sui generis_: e.g., in this case,
we could add a --option to «svn checkout» that makes the new tag wc
entirely pristineless.

> 
> >For folders, however, I don't see how this makes sense.  Size and age
> >thresholds are not an intrinsic property of an inode in the versioned
> >filesystem; they are time-space trade-offs that each client makes.
> >Different clients could make different trade-offs, and clients that
> >checkout today's HEAD in the future (using the «svn checkout $URL@$peg»
> >syntax) might have different needs than clients that checkout today's
> >HEAD today.
> 
> That's what the local override is for.
> Using a WC-only property as a sibling of the repo-stored svn:pristine
> allows both variants to be accessed in the same way.
> 

*nod*

> >What do you think of using an r0 revision property for storing
> >information about what files typically don't need their pristines?
> >
> >This could get interesting if some of the files involved are protected
> >by "no access" authz.
> 
> svn:auto-props come into play here (not sure whether that works for folder
> properties too at the moment)

It doesn't:

[[[
% svnadmin create r 
% svn co -q file://$PWD/r wc
% cd wc
% svn ps -q svn:auto-props '* = k=v' . 
% mkdir foo 
% svn add -q foo
% svn mkdir -q bar 
% svn pl -v */
% 
]]]

This might be a bug.
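A minimal Python model of the behaviour the transcript shows: svn:auto-props rules are applied to newly added files, while the directories created with «svn add» and «svn mkdir» got none. The rule syntax handled is simplified, case-insensitive matching is an assumption, and the function names are hypothetical; this is not Subversion's implementation.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

Rule = Tuple[str, Dict[str, str]]

def parse_auto_props(value: str) -> List[Rule]:
    """Parse svn:auto-props text into (pattern, properties) rules.

    Simplified: each non-empty line is "PATTERN = NAME[=VALUE][;NAME[=VALUE]...]";
    a NAME without "=VALUE" gets an empty value here, and the ";;" escape
    for a literal semicolon is not handled.
    """
    rules: List[Rule] = []
    for line in value.splitlines():
        if not line.strip():
            continue
        pattern, _, assignments = line.partition("=")
        props: Dict[str, str] = {}
        for item in assignments.split(";"):
            name, _, val = item.strip().partition("=")
            if name.strip():
                props[name.strip()] = val.strip()
        rules.append((pattern.strip(), props))
    return rules

def auto_props_for(rules: List[Rule], basename: str, is_dir: bool) -> Dict[str, str]:
    """Properties a newly added node would get, per the transcript above."""
    if is_dir:
        return {}  # neither svn add nor svn mkdir applied them to directories
    result: Dict[str, str] = {}
    for pattern, props in rules:
        # Case-insensitive matching is an assumption of this sketch.
        if fnmatchcase(basename.lower(), pattern.lower()):
            result.update(props)
    return result

rules = parse_auto_props("* = k=v")
print(auto_props_for(rules, "foo", is_dir=True))     # {}
print(auto_props_for(rules, "a.txt", is_dir=False))  # {'k': 'v'}
```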

> >> 2) an inheritable svn:pristine-wc property for local override. This
> >> property would be WC-only, not stored in the repository.
> >> 
> >
> >There is such a thing as WC-only (see SVN_PROP_WC_PREFIX and
> >svn_prop_wc_kind).  The existing ones are deliberately not shown in the
> >UI.  IIRC, they were used as the precursor of today's NODES.dav_cache
> >column, as a place where the RA layer can store per-file information.
> 
> can't comment on this, I have almost no knowledge about the inner
> workings of subversion 8-)
> 

Sure.  I was just providing context, both for you and for other readers.

> >In any case, properties are for attributes of the versioned filesystem's
> >inodes.  They are not for local configuration.  It wouldn't make sense
> >for «svn diff» to show both changes to, say, a file's encoding (in
> >svn:mime-type) and to a file's pristinefulness, because those are
> >different kinds-of-things: one describes the versioned inode, like its
> >charset (which can be stored in svn:mime-type), and another is
> >a property of the user's working copy [literally] of that versioned
> >inode, like its depth.
> >
> >If it's not committable, it shouldn't be shown by «svn diff».
> 
> No need to show wc-only properties in a diff. They represent no change
> to the versioned content.

+1

> The wc-only properties would be accessible via the property commands
> (which is not to say that a dedicated command for managing
> svn:pristine-wc properties wouldn't be helpful).

*nod*  That'd be the moral equivalent of «svn edit-log-message -r 42»…
(I'd probably have added this to my shell's dotfiles' «svn() { … ;
command svn "$@" }» wrapper function if I hadn't known the right syntax
by heart.)

> >> 3) optionally, but not necessarily, a command-line option for svn checkout
> >> to set svn:pristine-wc to off.
> >> Optional because one can always restrict the initial checkout
> >> depth, set the property, then update.
> >> 
> >> 4) for workstation global settings an entry in the config
> >> corresponding to the svn:pristine-wc would be needed.
> >
> >If there's a "typical" configuration that many clients will want to use,
> >having some way for the server to advise them about it would make sense,
> >as would letting clients decide whether or not to honour the advice
> >(both by default and _ad hoc_ for a particular working copy).
> >
> >Furthermore, whenever we have some sort of server-recommended
> >configuration, having some syntax to show where the wc differs from the
> >recommendation will make sense.  For instance, for depth I do
> >.
> >    svn info -R | grep-dctrl -F Depth '' -s Path,Depth
> >.
> >to print all local files that have a depth other than the default — but
> >having some syntactic sugar (e.g., a tools/ script) to do this query
> >would make sense.  This goes not only for server-recommended depth
> >configuration but also for server-recommended pristinefulness
> >configuration, if we have that.
> >
> >Looking forward to hearing about your use-case.
> >
> >Cheers,
> >
> >Daniel
> 
> Sorry, out of spare time for now.

No worries.

Cheers,

Daniel