You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Näslund <da...@longitudo.com> on 2010/04/05 08:02:56 UTC

[RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

Hi!

I'm supposed to send this proposal to the Google Summer of Code
machinery and let it be forwarded to the interrested mentor of the
Subversion community, in this case Stefan. In the interrest of openess
I'm posting it here before sending it off to Google later today. Maybe
someone has something they'd like to add?

===========================================================
Git unidiff format extension to 'svn patch' and 'svn diff'
===========================================================

Contents
==========
Suggested workflow
What is the git unidiff format?
Parsing the git headers
Applying tree changes
Applying mode changes
Applying binary patches
Applying property changes

Suggested workflow
----------------------
Here's my project proposal for GSoC 2010. The purpose of the project is
pretty self-explanatory; make 'svn patch' and 'svn diff' able to deal
with git unidiff extensions. I've tried to point out some of the API
changes that are neccessary to show that I have an understanding of what
to do. If I'd get accepted I would do things in this order:

1) Rev funcs to allow a use_git_format flag to be passed down to
   libsvn_diff and create git diff format patches for adds and deletes.
   Write a copule of tests to verify that we get the intended
   format.

2) Add the ability to track renames and copies in libsvn_diff. Probably
   by using some wc funcs for getting the status. My first assumption
  was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
  for copied and moved scenarios once we have editor-v2. But Neels was
  talking about some bigger rewrite where the diff editor would be
  dropped. Anyway, as goes for the 'git unidiff format' work, I need
  some way to detect copies and moves. When I have detection, add the
  git headers for copies and renames and write tests to confirm the
  right behavior.

3) Determine how the base85 format works and write C-tests to confirm
   the behavior. Git does it like this: [4]

4) Pass down a flag for allowing or disallowing binary diffs to
   libsvn_diff. Detect binary files and write the patches. Write tests
   to confirm the behavior.

5) Allow 'svn patch' to apply git diff formats for adds and deletes.
   Write tests to confirm the behavior.

6) Allow 'svn patch' to apply git diff formats for moves and copies.

7) Allow 'svn patch' to apply git diff formats for binary patches. I
   propably need to do some thinking about what state the wc can be in
   as for obstructed, missing, replaced, unversioned, ignored nodes and
   so on.

8) Make libsvn_diff able to record modes. Probably we're only
   interrested in the executable bit and that one can we get from
   svn:executable. Write tests to confirm the behavior.

9) Allow 'svn patch' to apply mode changes (if we agree that we want
   that behavior):

10) Decide on a header for dealing with props? Do we need to stay
    compatible with git and diff? Probably, so we need a header that will be
    ignored by applications not interrested in svn:properties.

11) Decide on the header format for properties. Implement it in the diff
    code and write tests for it.

12) Extend the diff parser to deal with property diffs. Write tests.

13) Done.

What is the git unidiff format?
--------------------------------
The format is thoroughly described in [1] so I'll just recapitulate the
use cases for it:

1) Track copies and renames
2) File mode changes
3) Binary patches

Creating the git headers
-------------------------
A couple of funcs needs to be revved to pass down the neccessary
parameters telling libsvn_diff to create a git diff. And we need a way
to detect copies and renames.

subversion/libsvn_client/diff.c
 (svn_client_diff5): We need a parameter to tell the diff machinery we
    want a git diff.
  (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
    deleted nodes but none for copied or moved nodes. Since we don't
    have editor-v2 we can't get that info from the server so git diffs
    should only be possible for wc-wc diffs at the moment. At the moment
    I'll probably check the status of the path that we get in
    file_added() and record copied-from or moved-from.

subversion/libsvn_diff/diff_file.c
  (svn_diff_file_diff2): 

Parsing the git headers
------------------------
We have examples of how the parsing should be done from the mercurial
source code [2]. (This link was found in the notes document referred
above. A big thank you to Augie Fackler for taking the time to write
down all the information).

subversion/libsvn_diff/parse-diff.c 
  (parse_git_hunk_header): Create this func to be invoked before
    parse_hunk_header(). Captures oldname, newname, operation and mode.

Applying tree changes
-----------------------
We already have many different scenarios to handle with nodes beeing
obstructed, missing, ignored, unversioned and so on. If we'll track tree
changes the number of scenarios will increase. I probably should make
some kind of graph to map out the possible scenarios.

subversion/libsvn_client/patch.c
  (install_patched_target): Here we're currently handling deletes, adds
    and modifications. With the git diff format we can handle copies
    and moves here too.

Applying mode changes
-------------------------
Subversion does not allow file permissions to be recorded. I
assume it's since it's hard to make those portable between windows fs
and non-windows fs [3]. We'll have to make a decision as to whether 'svn
patch' and 'svn diff' should be able to deal with applying permissions.
As I see it, version control is about tracking file contents, not that
kind of userdata but if someone has a good usecase let me hear! From
what I understand it, mode changes are mostly used for setting the
executable bit but we have svn:executable for that. Hrm, that of course
can't be used yet since we can't use property diffs. :-)

Applying binary patches
-------------------------

subversion/libsvn_client/patch.c
  (init_patch_target): If the content is binary it will be encoded with
    base85. A really, really small possibility but the translated stream
    might translate something encoded as a keyword.

Applying property changes
--------------------------
Subversion has properties and it would be great if those could be
included in patches. We have a diff format for properties that patch(1)
(and hopefully the rest of the patch family) ignores, e.g. they can be
displayed without beeing interpreted by the parser.  We need a header
format that tells the parser on what lines in the patch we have the
properties. All the action is in:

subversion/libsvn_client/diff.c
  (display_prop_diffs)

cheers,
Daniel

[1] notes/svnpatch/svnpatch-git.txt
[2] http://mercurial.selenic.com/hg/hg/file/ac02b43bc08a/mercurial/patch.py#l195
[3] http://pagesperso-orange.fr/b.andre/permissions.html
[4] http://git.kernel.org/?p=git/git.git;a=blob;f=base85.c;hb=HEAD

Re: [RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Apr 05, 2010 at 01:41:03PM +0200, Stefan Sperling wrote:
> On Mon, Apr 05, 2010 at 10:02:56AM +0200, Daniel Näslund wrote:

> > Git unidiff format extension to 'svn patch' and 'svn diff'
> > ===========================================================

[..]

> > 2) Add the ability to track renames and copies in libsvn_diff. Probably
> >    by using some wc funcs for getting the status. My first assumption
> >   was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
> >   for copied and moved scenarios once we have editor-v2. But Neels was
> >   talking about some bigger rewrite where the diff editor would be
> >   dropped. Anyway, as goes for the 'git unidiff format' work, I need
> >   some way to detect copies and moves. When I have detection, add the
> >   git headers for copies and renames and write tests to confirm the
> >   right behavior.
> 
> It's a pity that diff is implemented with an editor because it makes
> your task much harder than necessary (you don't have all information
> you need available while the diff is produced piece-by-piece).
> 
> Can you think of a way to do this in a simpler fashion?
> I think this could be done simpler by checking all files which
> appear in the diff for 'copyfrom' info.

The question of how to get that information is not easily answered,
given that we allow wc-wc, repos-wc and repos-repos diffs.

neels said on IRC:
[[[
If you're in a wc-wc diff (i.e. "base" -> "working"), you
could get each node's wc-ng status. easy., but in a repos-repos diff or
repos-wc diff, you can't find that stuff unless you a) actively query 
the repos for more info or b) use editor v2 to tell that stuff right
away. a) sucks because it duplicates server code on the client side.
]]]

We could decide to only allow git diffs for wc-wc diffs. A decision that
will be hard to motivate for users. But maybe we can get it working for
wc-wc and by the time we're finished, editor-v2 will be too. Though The
'Vision and Roadmap' mail says that editor-v2 will not be scheduled for
release until 1.9. Too bad! 

> Let's assume your diff contains files A, B and C:
> 
>   If you find a single relation A->B,
>     if A was deleted: A was moved to B
>     if A was not deleted: A was copied to B
>   If you find multiple relations A->B, A->C:
>     A was copied to B and to C
>     maybe A was also deleted
> 
> But maybe because of the editorness the receiver (i.e. diff text
> producer) doesn't know about A when dealing with B and vice-versa,
> because it only gets to deal with each of them in isolated function
> calls? Bah. That is really ugly.
> Maybe you can fix this by tweaking the way the diff editor is driven?
> Note though that the same editor is used for merge!
> Or maybe you should rewrite diff this year and then do the
> git-diff-extensions project next year? :)
> 
> I like the proposal. The only thing that worries me is that we may
> need to tackle the diff-is-implemented-with-the-merge-editor hack.
> Note that this hack is already biting us elsewhere:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2333#desc12
> 
> Maybe we should just bite the bullet and try to re-implement (or "fix")
> diff together before or during gsoc? It certainly would not hurt to
> have this problem solved before adding the git extensions.

Cheers,
Daniel

Re: [RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Apr 05, 2010 at 01:41:03PM +0200, Stefan Sperling wrote:
> On Mon, Apr 05, 2010 at 10:02:56AM +0200, Daniel Näslund wrote:

> > Git unidiff format extension to 'svn patch' and 'svn diff'
> > ===========================================================

[..]

> > 2) Add the ability to track renames and copies in libsvn_diff. Probably
> >    by using some wc funcs for getting the status. My first assumption
> >   was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
> >   for copied and moved scenarios once we have editor-v2. But Neels was
> >   talking about some bigger rewrite where the diff editor would be
> >   dropped. Anyway, as goes for the 'git unidiff format' work, I need
> >   some way to detect copies and moves. When I have detection, add the
> >   git headers for copies and renames and write tests to confirm the
> >   right behavior.
> 
> It's a pity that diff is implemented with an editor because it makes
> your task much harder than necessary (you don't have all information
> you need available while the diff is produced piece-by-piece).
> 
> Can you think of a way to do this in a simpler fashion?
> I think this could be done simpler by checking all files which
> appear in the diff for 'copyfrom' info.

The question of how to get that information is not easily answered,
given that we allow wc-wc, repos-wc and repos-repos diffs.

neels said on IRC:
[[[
If you're in a wc-wc diff (i.e. "base" -> "working"), you
could get each node's wc-ng status. easy., but in a repos-repos diff or
repos-wc diff, you can't find that stuff unless you a) actively query 
the repos for more info or b) use editor v2 to tell that stuff right
away. a) sucks because it duplicates server code on the client side.
]]]

We could decide to only allow git diffs for wc-wc diffs. A decision that
will be hard to motivate for users. But maybe we can get it working for
wc-wc and by the time we're finished, editor-v2 will be too. Though The
'Vision and Roadmap' mail says that editor-v2 will not be scheduled for
release until 1.9. Too bad! 

> Let's assume your diff contains files A, B and C:
> 
>   If you find a single relation A->B,
>     if A was deleted: A was moved to B
>     if A was not deleted: A was copied to B
>   If you find multiple relations A->B, A->C:
>     A was copied to B and to C
>     maybe A was also deleted
> 
> But maybe because of the editorness the receiver (i.e. diff text
> producer) doesn't know about A when dealing with B and vice-versa,
> because it only gets to deal with each of them in isolated function
> calls? Bah. That is really ugly.
> Maybe you can fix this by tweaking the way the diff editor is driven?
> Note though that the same editor is used for merge!
> Or maybe you should rewrite diff this year and then do the
> git-diff-extensions project next year? :)
> 
> I like the proposal. The only thing that worries me is that we may
> need to tackle the diff-is-implemented-with-the-merge-editor hack.
> Note that this hack is already biting us elsewhere:
> http://subversion.tigris.org/issues/show_bug.cgi?id=2333#desc12
> 
> Maybe we should just bite the bullet and try to re-implement (or "fix")
> diff together before or during gsoc? It certainly would not hurt to
> have this problem solved before adding the git extensions.

Cheers,
Daniel

Re: [RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Apr 05, 2010 at 10:02:56AM +0200, Daniel Näslund wrote:
> Hi!
> 
> I'm supposed to send this proposal to the Google Summer of Code
> machinery and let it be forwarded to the interrested mentor of the
> Subversion community, in this case Stefan. In the interrest of openess
> I'm posting it here before sending it off to Google later today. Maybe
> someone has something they'd like to add?
> 
> ===========================================================
> Git unidiff format extension to 'svn patch' and 'svn diff'
> ===========================================================
> 
> Contents
> ==========
> Suggested workflow
> What is the git unidiff format?
> Parsing the git headers
> Applying tree changes
> Applying mode changes
> Applying binary patches
> Applying property changes
> 
> Suggested workflow
> ----------------------
> Here's my project proposal for GSoC 2010. The purpose of the project is
> pretty self-explanatory; make 'svn patch' and 'svn diff' able to deal
> with git unidiff extensions. I've tried to point out some of the API
> changes that are neccessary to show that I have an understanding of what
> to do. If I'd get accepted I would do things in this order:
> 
> 1) Rev funcs to allow a use_git_format flag to be passed down to
>    libsvn_diff and create git diff format patches for adds and deletes.
>    Write a copule of tests to verify that we get the intended
>    format.

It's acceptable to always produce the information in the diff because
patch tools which don't understand it will just ignore it as noise.
So we may not even need a flag.
 
> 2) Add the ability to track renames and copies in libsvn_diff. Probably
>    by using some wc funcs for getting the status. My first assumption
>   was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
>   for copied and moved scenarios once we have editor-v2. But Neels was
>   talking about some bigger rewrite where the diff editor would be
>   dropped. Anyway, as goes for the 'git unidiff format' work, I need
>   some way to detect copies and moves. When I have detection, add the
>   git headers for copies and renames and write tests to confirm the
>   right behavior.

It's a pity that diff is implemented with an editor because it makes
your task much harder than necessary (you don't have all information
you need available while the diff is produced piece-by-piece).

Can you think of a way to do this in a simpler fashion?
I think this could be done simpler by checking all files which
appear in the diff for 'copyfrom' info.
Let's assume your diff contains files A, B and C:

  If you find a single relation A->B,
    if A was deleted: A was moved to B
    if A was not deleted: A was copied to B
  If you find multiple relations A->B, A->C:
    A was copied to B and to C
    maybe A was also deleted

But maybe because of the editorness the receiver (i.e. diff text
producer) doesn't know about A when dealing with B and vice-versa,
because it only gets to deal with each of them in isolated function
calls? Bah. That is really ugly.
Maybe you can fix this by tweaking the way the diff editor is driven?
Note though that the same editor is used for merge!
Or maybe you should rewrite diff this year and then do the
git-diff-extensions project next year? :)

> 3) Determine how the base85 format works and write C-tests to confirm
>    the behavior. Git does it like this: [4]

There was code to do base85 stuff in the original svnpatch implementation.
It's been deleted from trunk but you can dig it out of the history if
that helps.

Note that we *cannot* use code from git because git is GPL'd.

> 4) Pass down a flag for allowing or disallowing binary diffs to
>    libsvn_diff. Detect binary files and write the patches. Write tests
>    to confirm the behavior.

You don't need to support binary files in the first pass if you
don't want to. I'd probably put handling of binary files as an optional
item just to keep some room for unexpected difficulties.

> 5) Allow 'svn patch' to apply git diff formats for adds and deletes.
>    Write tests to confirm the behavior.
> 
> 6) Allow 'svn patch' to apply git diff formats for moves and copies.
> 
> 7) Allow 'svn patch' to apply git diff formats for binary patches. I
>    propably need to do some thinking about what state the wc can be in
>    as for obstructed, missing, replaced, unversioned, ignored nodes and
>    so on.
> 
> 8) Make libsvn_diff able to record modes. Probably we're only
>    interrested in the executable bit and that one can we get from
>    svn:executable. Write tests to confirm the behavior.

Yeah we only care about the X bit.
The rest is done by the user's umask.
 
> 9) Allow 'svn patch' to apply mode changes (if we agree that we want
>    that behavior):

It would be nice to have svn path understand about the X bit.

> 
> 10) Decide on a header for dealing with props? Do we need to stay
>     compatible with git and diff? Probably, so we need a header that will be
>     ignored by applications not interrested in svn:properties.

We don't need to stay compatible, just do it in a way that will cause
git and hg to ignore our property data as noise.

> 11) Decide on the header format for properties. Implement it in the diff
>     code and write tests for it.
> 
> 12) Extend the diff parser to deal with property diffs. Write tests.
> 
> 13) Done.

Great.

> 
> What is the git unidiff format?
> --------------------------------
> The format is thoroughly described in [1] so I'll just recapitulate the
> use cases for it:
> 
> 1) Track copies and renames
> 2) File mode changes
> 3) Binary patches
> 
> Creating the git headers
> -------------------------
> A couple of funcs needs to be revved to pass down the neccessary
> parameters telling libsvn_diff to create a git diff. And we need a way
> to detect copies and renames.
> 
> subversion/libsvn_client/diff.c
>  (svn_client_diff5): We need a parameter to tell the diff machinery we
>     want a git diff.

Or just do it by default. No need to rev the function.

>   (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
>     deleted nodes but none for copied or moved nodes. Since we don't
>     have editor-v2 we can't get that info from the server so git diffs
>     should only be possible for wc-wc diffs at the moment. At the moment
>     I'll probably check the status of the path that we get in
>     file_added() and record copied-from or moved-from.
> 
> subversion/libsvn_diff/diff_file.c
>   (svn_diff_file_diff2): 
> 
> Parsing the git headers
> ------------------------
> We have examples of how the parsing should be done from the mercurial
> source code [2]. (This link was found in the notes document referred
> above. A big thank you to Augie Fackler for taking the time to write
> down all the information).
> 
> subversion/libsvn_diff/parse-diff.c 
>   (parse_git_hunk_header): Create this func to be invoked before
>     parse_hunk_header(). Captures oldname, newname, operation and mode.
> 
> Applying tree changes
> -----------------------
> We already have many different scenarios to handle with nodes beeing
> obstructed, missing, ignored, unversioned and so on. If we'll track tree
> changes the number of scenarios will increase. I probably should make
> some kind of graph to map out the possible scenarios.
> 
> subversion/libsvn_client/patch.c
>   (install_patched_target): Here we're currently handling deletes, adds
>     and modifications. With the git diff format we can handle copies
>     and moves here too.
> 
> Applying mode changes
> -------------------------
> Subversion does not allow file permissions to be recorded. I
> assume it's since it's hard to make those portable between windows fs
> and non-windows fs [3]. We'll have to make a decision as to whether 'svn
> patch' and 'svn diff' should be able to deal with applying permissions.
> As I see it, version control is about tracking file contents, not that
> kind of userdata but if someone has a good usecase let me hear! From
> what I understand it, mode changes are mostly used for setting the
> executable bit but we have svn:executable for that. Hrm, that of course
> can't be used yet since we can't use property diffs. :-)
> 
> Applying binary patches
> -------------------------
> 
> subversion/libsvn_client/patch.c
>   (init_patch_target): If the content is binary it will be encoded with
>     base85. A really, really small possibility but the translated stream
>     might translate something encoded as a keyword.

Don't translate binary files then :)

> 
> Applying property changes
> --------------------------
> Subversion has properties and it would be great if those could be
> included in patches. We have a diff format for properties that patch(1)
> (and hopefully the rest of the patch family) ignores, e.g. they can be
> displayed without beeing interpreted by the parser.  We need a header
> format that tells the parser on what lines in the patch we have the
> properties. All the action is in:
> 
> subversion/libsvn_client/diff.c
>   (display_prop_diffs)

I like the proposal. The only thing that worries me is that we may
need to tackle the diff-is-implemented-with-the-merge-editor hack.
Note that this hack is already biting us elsewhere:
http://subversion.tigris.org/issues/show_bug.cgi?id=2333#desc12

Maybe we should just bite the bullet and try to re-implement (or "fix")
diff together before or during gsoc? It certainly would not hurt to
have this problem solved before adding the git extensions.

Stefan

Re: [RFC] Proposal for GSoC project - extend 'svn {patch,diff}' with git unidiff format

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Apr 05, 2010 at 10:02:56AM +0200, Daniel Näslund wrote:
> Hi!
> 
> I'm supposed to send this proposal to the Google Summer of Code
> machinery and let it be forwarded to the interrested mentor of the
> Subversion community, in this case Stefan. In the interrest of openess
> I'm posting it here before sending it off to Google later today. Maybe
> someone has something they'd like to add?
> 
> ===========================================================
> Git unidiff format extension to 'svn patch' and 'svn diff'
> ===========================================================
> 
> Contents
> ==========
> Suggested workflow
> What is the git unidiff format?
> Parsing the git headers
> Applying tree changes
> Applying mode changes
> Applying binary patches
> Applying property changes
> 
> Suggested workflow
> ----------------------
> Here's my project proposal for GSoC 2010. The purpose of the project is
> pretty self-explanatory; make 'svn patch' and 'svn diff' able to deal
> with git unidiff extensions. I've tried to point out some of the API
> changes that are neccessary to show that I have an understanding of what
> to do. If I'd get accepted I would do things in this order:
> 
> 1) Rev funcs to allow a use_git_format flag to be passed down to
>    libsvn_diff and create git diff format patches for adds and deletes.
>    Write a copule of tests to verify that we get the intended
>    format.

It's acceptable to always produce the information in the diff because
patch tools which don't understand it will just ignore it as noise.
So we may not even need a flag.
 
> 2) Add the ability to track renames and copies in libsvn_diff. Probably
>    by using some wc funcs for getting the status. My first assumption
>   was that the svn_wc_diff_callbacks4_t vtable would be revved to allow
>   for copied and moved scenarios once we have editor-v2. But Neels was
>   talking about some bigger rewrite where the diff editor would be
>   dropped. Anyway, as goes for the 'git unidiff format' work, I need
>   some way to detect copies and moves. When I have detection, add the
>   git headers for copies and renames and write tests to confirm the
>   right behavior.

It's a pity that diff is implemented with an editor because it makes
your task much harder than necessary (you don't have all information
you need available while the diff is produced piece-by-piece).

Can you think of a way to do this in a simpler fashion?
I think this could be done simpler by checking all files which
appear in the diff for 'copyfrom' info.
Let's assume your diff contains files A, B and C:

  If you find a single relation A->B,
    if A was deleted: A was moved to B
    if A was not deleted: A was copied to B
  If you find multiple relations A->B, A->C:
    A was copied to B and to C
    maybe A was also deleted

But maybe because of the editorness the receiver (i.e. diff text
producer) doesn't know about A when dealing with B and vice-versa,
because it only gets to deal with each of them in isolated function
calls? Bah. That is really ugly.
Maybe you can fix this by tweaking the way the diff editor is driven?
Note though that the same editor is used for merge!
Or maybe you should rewrite diff this year and then do the
git-diff-extensions project next year? :)

> 3) Determine how the base85 format works and write C-tests to confirm
>    the behavior. Git does it like this: [4]

There was code to do base85 stuff in the original svnpatch implementation.
It's been deleted from trunk but you can dig it out of the history if
that helps.

Note that we *cannot* use code from git because git is GPL'd.

> 4) Pass down a flag for allowing or disallowing binary diffs to
>    libsvn_diff. Detect binary files and write the patches. Write tests
>    to confirm the behavior.

You don't need to support binary files in the first pass if you
don't want to. I'd probably put handling of binary files as an optional
item just to keep some room for unexpected difficulties.

> 5) Allow 'svn patch' to apply git diff formats for adds and deletes.
>    Write tests to confirm the behavior.
> 
> 6) Allow 'svn patch' to apply git diff formats for moves and copies.
> 
> 7) Allow 'svn patch' to apply git diff formats for binary patches. I
>    propably need to do some thinking about what state the wc can be in
>    as for obstructed, missing, replaced, unversioned, ignored nodes and
>    so on.
> 
> 8) Make libsvn_diff able to record modes. Probably we're only
>    interrested in the executable bit and that one can we get from
>    svn:executable. Write tests to confirm the behavior.

Yeah we only care about the X bit.
The rest is done by the user's umask.
 
> 9) Allow 'svn patch' to apply mode changes (if we agree that we want
>    that behavior):

It would be nice to have svn path understand about the X bit.

> 
> 10) Decide on a header for dealing with props? Do we need to stay
>     compatible with git and diff? Probably, so we need a header that will be
>     ignored by applications not interrested in svn:properties.

We don't need to stay compatible, just do it in a way that will cause
git and hg to ignore our property data as noise.

> 11) Decide on the header format for properties. Implement it in the diff
>     code and write tests for it.
> 
> 12) Extend the diff parser to deal with property diffs. Write tests.
> 
> 13) Done.

Great.

> 
> What is the git unidiff format?
> --------------------------------
> The format is thoroughly described in [1] so I'll just recapitulate the
> use cases for it:
> 
> 1) Track copies and renames
> 2) File mode changes
> 3) Binary patches
> 
> Creating the git headers
> -------------------------
> A couple of funcs needs to be revved to pass down the neccessary
> parameters telling libsvn_diff to create a git diff. And we need a way
> to detect copies and renames.
> 
> subversion/libsvn_client/diff.c
>  (svn_client_diff5): We need a parameter to tell the diff machinery we
>     want a git diff.

Or just do it by default. No need to rev the function.

>   (svn_wc_diff_callbacks4_t): We have callbacks for changed, added and
>     deleted nodes but none for copied or moved nodes. Since we don't
>     have editor-v2 we can't get that info from the server so git diffs
>     should only be possible for wc-wc diffs at the moment. At the moment
>     I'll probably check the status of the path that we get in
>     file_added() and record copied-from or moved-from.
> 
> subversion/libsvn_diff/diff_file.c
>   (svn_diff_file_diff2): 
> 
> Parsing the git headers
> ------------------------
> We have examples of how the parsing should be done from the mercurial
> source code [2]. (This link was found in the notes document referred
> above. A big thank you to Augie Fackler for taking the time to write
> down all the information).
> 
> subversion/libsvn_diff/parse-diff.c 
>   (parse_git_hunk_header): Create this func to be invoked before
>     parse_hunk_header(). Captures oldname, newname, operation and mode.
> 
> Applying tree changes
> -----------------------
> We already have many different scenarios to handle with nodes beeing
> obstructed, missing, ignored, unversioned and so on. If we'll track tree
> changes the number of scenarios will increase. I probably should make
> some kind of graph to map out the possible scenarios.
> 
> subversion/libsvn_client/patch.c
>   (install_patched_target): Here we're currently handling deletes, adds
>     and modifications. With the git diff format we can handle copies
>     and moves here too.
> 
> Applying mode changes
> -------------------------
> Subversion does not allow file permissions to be recorded. I
> assume it's since it's hard to make those portable between windows fs
> and non-windows fs [3]. We'll have to make a decision as to whether 'svn
> patch' and 'svn diff' should be able to deal with applying permissions.
> As I see it, version control is about tracking file contents, not that
> kind of userdata but if someone has a good usecase let me hear! From
> what I understand it, mode changes are mostly used for setting the
> executable bit but we have svn:executable for that. Hrm, that of course
> can't be used yet since we can't use property diffs. :-)
> 
> Applying binary patches
> -------------------------
> 
> subversion/libsvn_client/patch.c
>   (init_patch_target): If the content is binary it will be encoded with
>     base85. A really, really small possibility but the translated stream
>     might translate something encoded as a keyword.

Don't translate binary files then :)

> 
> Applying property changes
> --------------------------
> Subversion has properties and it would be great if those could be
> included in patches. We have a diff format for properties that patch(1)
> (and hopefully the rest of the patch family) ignores, e.g. they can be
> displayed without beeing interpreted by the parser.  We need a header
> format that tells the parser on what lines in the patch we have the
> properties. All the action is in:
> 
> subversion/libsvn_client/diff.c
>   (display_prop_diffs)

I like the proposal. The only thing that worries me is that we may
need to tackle the diff-is-implemented-with-the-merge-editor hack.
Note that this hack is already biting us elsewhere:
http://subversion.tigris.org/issues/show_bug.cgi?id=2333#desc12

Maybe we should just bite the bullet and try to re-implement (or "fix")
diff together before or during gsoc? It certainly would not hurt to
have this problem solved before adding the git extensions.

Stefan