Posted to dev@subversion.apache.org by Paul Libbrecht <pa...@activemath.org> on 2004/07/26 14:27:45 UTC

Pluggeable diff...

Hello List,


We recently played with the diff-cmd and diff3-cmd settings in the hope
of getting "pluggable diffs" as we wished... and... well... it works,
but it is not at all what we understood as pluggable diffs.

Namely, we had expected such a setting to be the diff used to compute, 
at commit time, the difference to be stored in the database (more or 
less equivalent to what's stored as fragments of the ",v" files of
CVS), and, at update time, the merge algorithm to modify the files.

We have an XML-diff and would like to put it to use inside such a tool 
as subversion. The latter should provide us the storage and transport 
mechanism, being agnostic of the data of the diffs and updates... Maybe 
I should call this the "patch" format.

Did I misunderstand?
How hard would it be to modify Subversion so that the patch format is
pluggable?

paul


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Pluggeable diff...

Posted by Branko Čibej <br...@xbc.nu>.
Paul Libbrecht wrote:

> On 26 Jul 2004, at 16:32, C. Michael Pilato wrote:
>
>>> Did I misunderstand?  How hard would it be to modify
>>> Subversion so that the patch format is pluggable?
>>
>> Why would you actually want this?  I think some real use-cases --
>> including where Subversion is failing you now -- would help out here.
>
>
> One use case is to build on a more descriptive
> representation of change, to allow, for example:
> - source-respecting updates (e.g. respecting one's "own" indentation scheme)
> - more explicit merging, including the ability to show merging within
> a user interface (the other person has added this element and you have
> added this element as well; what should we do? Currently, such
> conflict resolution is done in the source!)
> - more explicit merging may mean a better computation of the update
> operations' commutativity, hence less frequent conflicts.
> - more explicit merging may also mean better "management of change",
> where a tool may analyze the incoming changes and warn about their
> impact on things that depend on them (or do the same at commit time,
> so that you know whose content you may affect by publishing such a change)
>
> Hope that sheds some light. I do think there are more, but these are a
> few important ones.

But none of these should touch the way the server and client exchange 
data. I think there's a misconception at work here again. You can 
already plug in your own diff on the client, using the diff-cmd and 
diff3-cmd config options (granted, it would take a bit of work to make 
their behaviour depend on the type of file), and you can maintain 
invariants (e.g., enforce an indentation scheme) on the server in
pre-commit hooks. But changing the _delta_ algorithm -- the one that 
determines how the server and client communicate changes, and how 
changes are stored in the repository -- doesn't make sense. Yes, you 
could get slightly more efficient storage for certain kinds of files, 
but you wouldn't achieve any of the goals you mention above.
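
To make the pre-commit idea concrete, here is a minimal sketch of such a hook in Python. It assumes a hypothetical policy (no tab characters in the indentation of `.xml` files) and uses `svnlook changed` and `svnlook cat` to inspect the pending transaction. The function names are illustrative and not part of Subversion.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook enforcing an (assumed) indentation policy."""
import subprocess
import sys


def tab_indented_lines(text):
    """Return 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


def parse_changed(output):
    """Pick added/modified file paths out of `svnlook changed` output."""
    paths = []
    for line in output.splitlines():
        # The first four columns hold status flags, e.g. "U   trunk/a.xml".
        status, path = line[:4].strip(), line[4:]
        if status and status[0] in "AU" and not path.endswith("/"):
            paths.append(path)
    return paths


def svnlook(*args):
    """Run svnlook and return its standard output as text."""
    result = subprocess.run(["svnlook", *args], check=True, capture_output=True)
    return result.stdout.decode("utf-8", errors="replace")


def main(repos, txn):
    """Return 0 to accept the commit, 1 to reject it."""
    rejected = False
    for path in parse_changed(svnlook("changed", "-t", txn, repos)):
        if not path.endswith(".xml"):  # assumed policy: only XML is checked
            continue
        bad = tab_indented_lines(svnlook("cat", "-t", txn, repos, path))
        if bad:
            print(f"{path}: tab indentation on line(s) {bad[:5]}", file=sys.stderr)
            rejected = True
    return 1 if rejected else 0


# Subversion invokes the hook as: pre-commit REPOS-PATH TXN-NAME
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

A non-zero exit status makes the server refuse the commit, and whatever the hook wrote to standard error is relayed to the committer. None of this touches how deltas are stored or transmitted.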

-- Brane



Re: Pluggeable diff...

Posted by Paul Libbrecht <pa...@activemath.org>.
On 26 Jul 2004, at 16:32, C. Michael Pilato wrote:
>> We have an XML-diff and would like to put it to use inside such a tool
>> as subversion. The latter should provide us the storage and transport
>> mechanism, being agnostic of the data of the diffs and
>> updates... Maybe I should call this the "patch" format.
> Subversion is already agnostic -- it treats all files as binary when
> transferring changes between repos and client.

"changes", that's what we want to affect.

>> Did I misunderstand?  How hard would it be to modify
>> Subversion so that the patch format is pluggable?
> Why would you actually want this?  I think some real use-cases --
> including where Subversion is failing you now -- would help out here.

One use case is to build on a more descriptive representation
of change, to allow, for example:
- source-respecting updates (e.g. respecting one's "own" indentation scheme)
- more explicit merging, including the ability to show merging within a
user interface (the other person has added this element and you have
added this element as well; what should we do? Currently, such conflict
resolution is done in the source!)
- more explicit merging may mean a better computation of the update
operations' commutativity, hence less frequent conflicts.
- more explicit merging may also mean better "management of change",
where a tool may analyze the incoming changes and warn about their
impact on things that depend on them (or do the same at commit time, so
that you know whose content you may affect by publishing such a change)

Hope that sheds some light. I do think there are more, but these are a
few important ones.

paul



Re: Pluggeable diff...

Posted by Daniel Berlin <db...@dberlin.org>.
On Jul 26, 2004, at 3:46 PM, Paul Libbrecht wrote:

> Well, even this would be interesting, I think... that would be a half
> solution... but only half (as it would need to know the exact byte
> position of an XPath coordinate and thus wouldn't tolerate, e.g., a
> whitespace difference).

I'm not sure you are understanding (it's hard to tell), so just in case
you aren't:
svndiff is the name of the format that is actually stored in the 
database.
It is a set of encoded instructions for how to produce some data (in 
our case, a revision), given some other data (in our case, a previous 
revision).
The binary diff algorithm svn uses, vdelta, produces svndiff 
instructions.
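
As a rough illustration of "instructions for how to produce some data, given some other data", the sketch below applies a list of copy and insert operations to a source revision. It is a deliberately simplified model: the tuple layout and the function name are assumptions made for this example, and the real svndiff encoding is binary and more involved.

```python
def apply_delta(source, ops):
    """Rebuild a target byte string from `source` plus a list of instructions.

    Each instruction is either
      ("copy", offset, length)  -> reuse bytes already present in the source
      ("insert", data)          -> add bytes that only exist in the new revision
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


# Previous revision, and the instructions that turn it into the next one.
old_revision = b"<a>hello</a>\n"
delta = [("copy", 0, 3), ("insert", b"goodbye"), ("copy", 8, 5)]
new_revision = apply_delta(old_revision, delta)
```

Note that the instructions refer to byte offsets only. Nothing in them knows that the data happens to be XML.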

You have two options for changing the binary diff algorithm:
1. Either make yours produce svndiff instructions (which I don't see
the real benefit in, since it will still be just a set of svndiff
instructions on binary data)
or
2. change svndiff so it can accommodate what you want to do.

svndiff is actually versioned now with the current version number being 
0 (i produced an svndiff version 1, with better encoding/compression, 
and in the process, added a version number to svndiff), and the version 
number is stored in the svndiff stream.

You could produce an "svndiff version 2" that was nothing like the 
existing svndiff at all (you could make it an xml based diff that works 
in a completely different way, whatever), and as long as you teach 
libsvn_delta to produce/read it, everything should work.
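
A sketch of what reading the version from the stream and picking a decoder could look like, in Python rather than the C of libsvn_delta. It assumes the stream begins with the bytes `SVN` followed by a single version byte. The decoder registry, the placeholder decoders, and the "version 2" XML format are all hypothetical.

```python
MAGIC = b"SVN"   # assumed header: three magic bytes, then one version byte
DECODERS = {}    # hypothetical registry: version number -> decoder function


def register(version):
    """Decorator that records a decoder for one format version."""
    def wrap(func):
        DECODERS[version] = func
        return func
    return wrap


def split_header(stream):
    """Return (version, body) or raise if the header is missing."""
    if len(stream) < 4 or stream[:3] != MAGIC:
        raise ValueError("missing SVN header")
    return stream[3], stream[4:]


@register(0)
def decode_v0(body):
    # Placeholder standing in for the standard byte-oriented decoder.
    return ("byte-delta v0", body)


@register(2)
def decode_v2(body):
    # Placeholder standing in for a hypothetical XML-aware "version 2".
    return ("xml-delta", body)


def decode(stream):
    """Dispatch on the version byte stored in the stream itself."""
    version, body = split_header(stream)
    if version not in DECODERS:
        raise ValueError(f"unsupported version {version}")
    return DECODERS[version](body)
```

Because the version travels inside the stream, a reader can always tell how the data was encoded. Deciding which version to write for which file is the part that would need information passed in from outside.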

I consider these easy, but then again, i've done this before (As i 
said, i produced an svndiff version 1 whose encoding is completely 
different than the current svndiff version 0).  It took me roughly 3-4 
weeks to produce working code that did this.

What you can't currently do without going outside libsvn_delta is
tell it to use svndiff version 0 for some files (i.e., use the standard
binary diff algorithm for some files), and tell it to use "svndiff
version 2" (your xml diff format) for other files.  That would require
passing that information down from somewhere on high.


Of course, none of this is "pluggable" diffs, in the sense that they 
are all hard-coded diff algorithms.
You could theoretically plug in any program you want to do the 
encoding/decoding (as long as it can handle arbitrary data, and produce 
something that we can store in the database), but it needs to always be 
available, and always work, or else you'd seriously be f*cked.

The long and short of it is that adding hardcoded *new* diff algorithms
for certain types of files isn't actually all that hard.  It's probably a
month or two of subversion hacking for an experienced subversion
hacker.
You can do special xml diffing that way if you wanted.

Plugging in random diff and merge programs at diff and merge time is a 
completely intractable idea.
If your program changed its format, or failed to work with some 
arbitrary data, it would make your revision database worthless (unless 
you encode the way to decode the data, into the encoding format, which 
is actually what svndiff does).



> Is the application of such patch really scattered around the source 
> code ?

Theoretically, it should only touch libsvn_delta.


>
> paul
>
>
> On 26 Jul 2004, at 19:59, Daniel Berlin wrote:
>
>> In fact, you *could* plug in your xmldiff, though it would likely be 
>> pointless, since you'd have to take the results and turn it into 
>> byte-oriented insert + copy instructions, which svndiff uses.
>>
>> :)
>
>




Re: Pluggeable diff...

Posted by Paul Libbrecht <pa...@activemath.org>.
Well, even this would be interesting, I think... that would be a half
solution... but only half (as it would need to know the exact byte
position of an XPath coordinate and thus wouldn't tolerate, e.g., a
whitespace difference).

Is the application of such patch really scattered around the source 
code ?

paul


On 26 Jul 2004, at 19:59, Daniel Berlin wrote:

> In fact, you *could* plug in your xmldiff, though it would likely be 
> pointless, since you'd have to take the results and turn it into 
> byte-oriented insert + copy instructions, which svndiff uses.
>
> :)




Re: Pluggeable diff...

Posted by Daniel Berlin <db...@dberlin.org>.
On Jul 26, 2004, at 10:32 AM, C. Michael Pilato wrote:

> Paul Libbrecht <pa...@activemath.org> writes:
>
>> We recently played with the diff-cmd and diff3-cmd settings in the
>> hope of getting "pluggable diffs" as we wished... and... well... it
>> works, but it is not at all what we understood as pluggable diffs.
>>
>> Namely, we had expected such a setting to be the diff used to compute,
>> at commit time, the difference to be stored in the database (more or
>> less equivalent to what's stored as fragments of the ",v" files of
>> CVS), and, at update time, the merge algorithm to modify the files.
>
> Er, no, that's not at all what it is.  Subversion uses a binary
> differencing algorithm, very *not* pluggable,

I think you meant to say: You can plug in any diff algorithm you want 
into the source code, as long as it generates the internal svndiff 
format in the end, which is very non-pluggable ;)

In fact, you *could* plug in your xmldiff, though it would likely be 
pointless, since you'd have to take the results and turn it into 
byte-oriented insert + copy instructions, which svndiff uses.
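
To show what "turning the results into byte-oriented insert + copy instructions" amounts to, here is a small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in matcher; that is neither vdelta nor an XML diff, and the instruction tuples are an assumption made for this example.

```python
import difflib


def make_delta(source, target):
    """Describe `target` as copies from `source` plus inserted new bytes."""
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))         # bytes shared with source
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))     # bytes new in target
        # "delete": bytes only in the source are simply never copied
    return ops


def apply_delta(source, ops):
    """Replay the instructions to rebuild the target."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += source[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


old = b"<doc><item>one</item></doc>"
new = b"<doc>\n  <item>one</item>\n  <item>two</item>\n</doc>"
delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
```

Whatever structure an XML-aware diff discovers, once it is flattened to offsets and byte runs like these, a reindented element is just another run of inserted bytes.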

:)



Re: Pluggeable diff...

Posted by "C. Michael Pilato" <cm...@collab.net>.
Paul Libbrecht <pa...@activemath.org> writes:

> We recently played with the diff-cmd and diff3-cmd settings in the
> hope of getting "pluggable diffs" as we wished... and... well... it
> works, but it is not at all what we understood as pluggable diffs.
>
> Namely, we had expected such a setting to be the diff used to compute,
> at commit time, the difference to be stored in the database (more or
> less equivalent to what's stored as fragments of the ",v" files of
> CVS), and, at update time, the merge algorithm to modify the files.

Er, no, that's not at all what it is.  Subversion uses a binary
differencing algorithm, very *not* pluggable, for getting information
to and from the repository.  The --diff-cmd and --diff3-cmd options
are just for setting the programs used to generate contextual diffs,
client-side only, for the purposes of display and merging.
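
For the client-side case, a wrapper script given to --diff-cmd could choose a tool per file type. The Python sketch below assumes the wrapper is invoked GNU-diff style, with `-L label` options and the two file paths as the last two arguments. The program name `my-xmldiff` is hypothetical, as is the rule of matching on a `.xml` suffix.

```python
#!/usr/bin/env python3
"""Sketch of a --diff-cmd wrapper that picks a diff tool by file type."""
import re
import subprocess
import sys

XML_DIFF = ["my-xmldiff"]   # hypothetical XML-aware diff program
DEFAULT_DIFF = ["diff"]     # ordinary GNU-style diff


def display_name(text):
    """Strip a trailing '(revision N)'-style suffix from a label, if any."""
    return re.sub(r"\s+\(.*\)$", "", text)


def looks_like_xml(args):
    """Check the -L labels and the two trailing paths for a .xml suffix."""
    labels = [args[i + 1] for i in range(len(args) - 1) if args[i] == "-L"]
    names = labels + list(args[-2:])
    return any(display_name(name).lower().endswith(".xml") for name in names)


def build_command(args):
    """Return the command line to run for this pair of files."""
    if looks_like_xml(args):
        # Assumption: the XML tool only wants the two file paths.
        return XML_DIFF + list(args[-2:])
    return DEFAULT_DIFF + list(args)


def main(argv):
    # Pass the chosen tool's exit status back to the caller.
    return subprocess.call(build_command(argv[1:]))


if __name__ == "__main__" and len(sys.argv) >= 3:
    sys.exit(main(sys.argv))
```

This only changes what the user sees and how merges are computed in the working copy. The data exchanged with the repository is unaffected.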

> We have an XML-diff and would like to put it to use inside such a tool
> as subversion. The latter should provide us the storage and transport
> mechanism, being agnostic of the data of the diffs and
> updates... Maybe I should call this the "patch" format.

Subversion is already agnostic -- it treats all files as binary when
transferring changes between repos and client.

> Did I misunderstand?  How hard would it be to modify
> Subversion so that the patch format is pluggable?

Why would you actually want this?  I think some real use-cases --
including where Subversion is failing you now -- would help out here.
