Posted to dev@subversion.apache.org by anatoly techtonik <te...@gmail.com> on 2010/08/26 09:57:47 UTC

Extensible changeset format proposal

Hello,

Don't you think it is time to design an extensible changeset format
for exchanging information about changesets between systems?

Right now I am struggling to extract full information from an
uncommitted Subversion changeset in order to upload it for review (in
the Rietveld project). The Rietveld code review tool was initially
designed to work with Subversion, but so far it is still impossible to
get a complete diff of changes from SVN that a reviewer can apply to
their working copy and commit after review. The problem with getting a
complete diff is twofold:

1. Subversion data for an uncommitted changeset is scattered, and it
is hard to say whether it is ever complete.
2. The "svn diff" format is too limited.

For the first part, I can give an example of the problem I am
currently trying to solve: 'Rietveld code review data is missing files
that were created as a result of "svn copy" or "svn move" operation'.
If a text file is added with "svn add", its contents will appear in
"svn diff" output, but text files created as a result of an "svn move"
or "svn copy" operation will not. To get this missing information one
needs to run "svn status", check for the presence of copied or moved
files (marked with "A  +"), check that these files are not binary,
manually reconstruct the change chunk for them and append the missing
data to the output of "svn diff". But even after that the reviewer
still won't be able to exactly reproduce the changeset, because the
"svn diff" format does not contain information about the source of a
copied or moved file. And here comes the second part.
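
A minimal Python sketch of the first manual step described above:
finding the paths that "svn status" marks as added with history
("A  +"). The function name and the sample output are made up for
illustration, and the column layout assumed in the comments (status in
column 1, history flag in column 4, path after the flag columns) should
be checked against the client version in use.

```python
def copied_paths(status_output):
    """Return paths that `svn status` reports as added with history."""
    paths = []
    for line in status_output.splitlines():
        # Assumption: column 1 is the item status, column 4 the "+" flag.
        if len(line) > 7 and line[0] == "A" and line[3] == "+":
            # Assumption: the flag columns end by index 7; the path follows.
            paths.append(line[7:].strip())
    return paths


# Hypothetical `svn status` output for a working copy with one moved file.
sample = (
    "A  +    src/new_name.py\n"
    "M       src/main.py\n"
    "A       docs/notes.txt\n"
    "D       src/old_name.py\n"
)

print(copied_paths(sample))  # ['src/new_name.py']
```

Each returned path would still need the binary check and a hand-built
diff chunk, which is the scattered part of the process.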

"svn diff" format doesn't record enough information to reproduce
a committed changeset. For example, it doesn't have data about the
source of copied and moved files. This is believed to be solved by the
"git diff" format, but it won't be a panacea either, because Subversion
changesets also contain information about properties, mime types etc.
It is also impossible to include binary files (if needed) or original
author info (can be useful for contribulyzer), or any other information
that a given VCS (Subversion in this case) needs to completely
reconstruct its own changeset.

For code reviews, ideally, a code review system such as Rietveld
should grab the changeset, parse it and extract the relevant
information for the reviewer (skipping or filtering uninteresting parts
and warning about unknown parts). It should also save the original or
filtered changeset file so it can be imported and committed if the
review is successful.


That's why an extensible changeset format is required. It will not only
be useful for sending changesets for review, but also for
synchronizing changes with other VCSes. With a new changeset format, a
mirroring tool could automatically analyze incoming data to find
Subversion-related attributes and save them into the repository
directly, and automatically save all other attributes as properties.

I see this format as an XML format that resembles an Atom feed, with a
logical order of events (e.g. a file removed after it was copied).
Subversion already uses XML formats internally, so I logically assume
that folks here possess the required experience and may even have some
ready pieces to work out an initial draft of such a format.
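
To make the idea concrete, here is a rough Python sketch of how a
reader of such a format might behave: it walks the events in document
order, keeps the operations it understands, and warns about the rest
instead of failing. Every element and attribute name below (changeset,
copy, delete, mime-type, from, path) is hypothetical and invented for
this illustration; none of it is an existing Subversion or Atom format.

```python
import xml.etree.ElementTree as ET

# Operations this hypothetical reader understands.
KNOWN_OPS = {"add", "copy", "delete", "modify"}


def read_changeset(xml_text):
    """Return (operations, warnings) for a hypothetical XML changeset."""
    operations, warnings = [], []
    # Children are visited in document order, preserving event order.
    for entry in ET.fromstring(xml_text):
        if entry.tag in KNOWN_OPS:
            operations.append((entry.tag, dict(entry.attrib)))
        else:
            # Unknown extensions are skipped, but reported.
            warnings.append("skipping unknown element: " + entry.tag)
    return operations, warnings


# Hypothetical changeset: a move expressed as copy followed by delete.
sample = """<changeset>
  <copy from="src/old_name.py" path="src/new_name.py"/>
  <delete path="src/old_name.py"/>
  <mime-type path="src/new_name.py" value="text/x-python"/>
</changeset>"""

ops, warns = read_changeset(sample)
print(ops)
print(warns)
```

A mirroring tool could treat the skipped elements differently, for
example by storing them as properties rather than discarding them.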

Please, CC.
--
anatoly t.

Re: Extensible changeset format proposal

Posted by Greg Hudson <gh...@MIT.EDU>.
On Thu, 2010-08-26 at 05:57 -0400, anatoly techtonik wrote:
> Don't you think it is time to design an extensible changeset format
> for exchanging information about changesets between systems?

Mostly for your entertainment, see:

http://www.red-bean.com/pipermail/changesets/2003-April/thread.html

There was an attempt to create a unified cross-system changeset format
seven years ago, but it didn't get very far.  However, the principals
are different today and more is known about the space of successful DVCS
tools.


Re: Extensible changeset format proposal

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Aug 26, 2010 at 12:57:47PM +0300, anatoly techtonik wrote:
> Hello,
> 
> Don't you think it is time to design an extensible changeset format
> for exchanging information about changesets between systems?
> 
> Right now I am struggling to extract full information from an
> uncommitted Subversion changeset in order to upload it for review (in
> the Rietveld project). The Rietveld code review tool was initially
> designed to work with Subversion, but so far it is still impossible to
> get a complete diff of changes from SVN that a reviewer can apply to
> their working copy and commit after review. The problem with getting a
> complete diff is twofold:
> 
> 1. Subversion data for an uncommitted changeset is scattered, and it
> is hard to say whether it is ever complete.
> 2. The "svn diff" format is too limited.
> 
> For the first part, I can give an example of the problem I am
> currently trying to solve: 'Rietveld code review data is missing files
> that were created as a result of "svn copy" or "svn move" operation'.
> If a text file is added with "svn add", its contents will appear in
> "svn diff" output, but text files created as a result of an "svn move"
> or "svn copy" operation will not.

In trunk, svn diff has a --show-copies-as-adds option, which causes copied
and moved files to be displayed even if they weren't modified after being
copied/moved. This will be released in 1.7.

> To get this missing information one needs to
> run "svn status", check for the presence of copied or moved files
> (marked with "A  +"), check that these files are not binary, manually
> reconstruct the change chunk for them and append the missing data to
> the output of "svn diff". But even after that the reviewer still won't
> be able to exactly reproduce the changeset, because the "svn diff"
> format does not contain information about the source of a copied or
> moved file. And here comes the second part.

svn diff does show deleted files, so it also shows the delete half
of each move. With the new --git option of svn diff, you get headers
which tell you where something was copied from.
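
As a rough illustration of how a review tool might use those headers,
here is a Python sketch that pulls copy and rename sources out of a
git-style diff. It assumes the git extended header lines "copy from",
"copy to", "rename from" and "rename to"; the function name and sample
diff are made up, and the exact headers svn emits (for instance,
whether a revision is attached to the source path) should be checked
against real output.

```python
def copy_sources(diff_text):
    """Map each copied or renamed target path to its source path."""
    sources = {}
    pending = None  # source path waiting for its matching "to" line
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            pending = None  # a new file section begins
        elif line.startswith(("copy from ", "rename from ")):
            pending = line.split(" from ", 1)[1]
        elif line.startswith(("copy to ", "rename to ")) and pending is not None:
            sources[line.split(" to ", 1)[1]] = pending
            pending = None
    return sources


# Hypothetical git-style diff for one copied and modified file.
sample = (
    "diff --git a/src/old_name.py b/src/new_name.py\n"
    "copy from src/old_name.py\n"
    "copy to src/new_name.py\n"
    "--- a/src/old_name.py\n"
    "+++ b/src/new_name.py\n"
    "@@ -1 +1 @@\n"
    "-print('old')\n"
    "+print('new')\n"
)

print(copy_sources(sample))  # {'src/new_name.py': 'src/old_name.py'}
```

Hunk lines always start with a space, "+" or "-", so only header lines
can match the prefixes checked here.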

> "svn diff" format doesn't record enough information to reproduce
> a committed changeset. For example, it doesn't have data about the
> source of copied and moved files. This is believed to be solved by the
> "git diff" format, but it won't be a panacea either, because Subversion
> changesets also contain information about properties, mime types etc.

svn diff and svn patch in trunk can show and apply property diffs,
respectively. This will be released in 1.7.

> It is also impossible to include binary files (if needed) or original
> author info (can be useful for contribulyzer), or any other information
> that a given VCS (Subversion in this case) needs to completely
> reconstruct its own changeset.

Support for binary data is on the todo list for svn diff / svn patch.
Nothing has been implemented yet.

Showing author information is interesting, though in the general case
where a diff spans multiple revisions it may not be very useful.
But note also that in Subversion trunk, svn log has a --diff option which
shows the committed diff beneath the log message (which includes author
and date information). This will also be released in Subversion 1.7.

> For code reviews, ideally, a code review system such as Rietveld
> should grab the changeset, parse it and extract the relevant
> information for the reviewer (skipping or filtering uninteresting parts
> and warning about unknown parts). It should also save the original or
> filtered changeset file so it can be imported and committed if the
> review is successful.
> 
> 
> That's why an extensible changeset format is required. It will not only
> be useful for sending changesets for review, but also for
> synchronizing changes with other VCSes. With a new changeset format, a
> mirroring tool could automatically analyze incoming data to find
> Subversion-related attributes and save them into the repository
> directly, and automatically save all other attributes as properties.

You realise that it's often impossible to represent data generated
by one version control tool in another version control tool?
If that was an easy problem, the company I work for would be out of
business because nobody would need our help. We're often migrating data
between version control systems, and there is always compromise involved.

Some things, like add/delete, and maybe even copy (unless you count older
systems like CVS), are virtually universal.
But renames are already represented very differently in virtually every tool.
Directories are another example -- some tools version them, some don't.
And most metadata, like EOL-style and character set of files, commit author
information, list of files touched by a changeset, etc., is represented in very
different and sometimes incompatible ways, and sometimes not at all.

There is no single data format that can really solve this problem.
Version control tools differ. In general, you cannot magically mirror every
aspect of a change made in one tool to another tool.

I'm not saying that a common changeset exchange data format would be useless.
It would certainly help if all tools had a unified way of exporting and
importing changesets. But it will always be limited to handling the lowest
common denominator, which often isn't enough. The svn diff --git format is
the best we've got so far. It's not perfect, but it's a good step forward.

> I see this format as an XML format that resembles an Atom feed, with a
> logical order of events (e.g. a file removed after it was copied).
> Subversion already uses XML formats internally,

Subversion uses virtually no XML internally.
It can produce some XML for presentation, but data isn't being stored
as XML inside of Subversion.

> so I logically assume
> that folks here possess the required experience and may even have some
> ready pieces to work out an initial draft of such a format.

We've added the --git option to svn diff, which produces output compatible
with Mercurial and git for some common operations (add, delete, copy).
That's a common denominator, and the format is nice because it is readable.

svn diff also has an --xml option which makes it produce XML output.
Currently that only works in --summarize mode, and only for repository to
repository diffs. You cannot use it to show changes in a working copy.
I guess if there really is a need we could extend the XML output.
But I think the --git diff format is nicer, because it contains more
information and is already usable by at least two other tools. Maybe
more tools will start to support it, now that Subversion also supports it.
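
For anyone who does want to consume the existing XML output, a small
Python sketch of reading it follows. The element and attribute names
(diff, paths, path, item, kind, props) are assumed from the sample
output in the Subversion documentation and should be verified against a
real run of svn diff --summarize --xml; the URLs are made up.

```python
import xml.etree.ElementTree as ET


def summarize_changes(xml_text):
    """Return (item, kind, target) tuples from summarize-style XML."""
    root = ET.fromstring(xml_text)
    return [
        (p.get("item"), p.get("kind"), (p.text or "").strip())
        for p in root.iter("path")
    ]


# Hypothetical output in the assumed summarize XML shape.
sample = (
    '<?xml version="1.0"?>\n'
    "<diff>\n<paths>\n"
    '<path props="none" kind="file" item="modified">'
    "http://svn.example.org/repos/test/sandwich.txt</path>\n"
    '<path props="none" kind="file" item="added">'
    "http://svn.example.org/repos/test/snacks.txt</path>\n"
    "</paths>\n</diff>\n"
)

for item, kind, target in summarize_changes(sample):
    print(item, kind, target)
```

Note that this summary only says what changed and how; it carries no
file contents and no copy source, which is why the --git format holds
more information.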

I hope the new features I've listed above will help you solve the problems
you're trying to solve. If you have further ideas about how they can be
improved, please share them.

Thanks,
Stefan