Posted to dev@subversion.apache.org by Daniel Näslund <da...@longitudo.com> on 2010/06/18 06:56:25 UTC

Progress report for GSoC 'svn patch' project

Hi!

The goal is to be able to create, parse and apply diffs in the 'git
unidiff' format. The original plan was to add support for property
diffs once the basic 'git unidiff' support was in place, but Stefan
suggested that we could easily add property diffs even without the git
parts. At the moment I'm working in that direction.

What we have for 'property diffs'
==================================
* A fixed format using '##' as hunk header delimiters instead of '@@'.

  Added: prop
  ## -0,0 +1 ##
  - value
  + value

  Modified: prop
  ## -0,0 +1 ##
  - value
  + value

  Deleted: prop
  ## -0,0 +1 ##
  - value
  + value

* The ability to parse those property hunks since r955844.
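
The '##' headers above appear to use the same range syntax as '@@'
headers, so a single routine can recognise both. The following is only a
minimal sketch under that assumption. The names hunk_header_t,
parse_range and parse_hunk_header are hypothetical and are not the
parser that went in with r955844.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical struct for illustration; not the real svn_hunk_t. */
typedef struct hunk_header_t
{
  unsigned long original_start;
  unsigned long original_length;
  unsigned long modified_start;
  unsigned long modified_length;
  int is_property_hunk;  /* 1 for '##' headers, 0 for '@@' headers */
} hunk_header_t;

/* Parse "START[,LENGTH]" at P.  Return a pointer just past the range,
   or NULL on a syntax error.  An omitted length means 1. */
static const char *
parse_range(const char *p, unsigned long *start, unsigned long *length)
{
  if (!isdigit((unsigned char)*p))
    return NULL;
  for (*start = 0; isdigit((unsigned char)*p); p++)
    *start = *start * 10 + (unsigned long)(*p - '0');

  *length = 1;
  if (*p == ',')
    {
      p++;
      if (!isdigit((unsigned char)*p))
        return NULL;
      for (*length = 0; isdigit((unsigned char)*p); p++)
        *length = *length * 10 + (unsigned long)(*p - '0');
    }
  return p;
}

/* Parse LINE as "@@ -a,b +c,d @@" or "## -a,b +c,d ##".
   Return 1 and fill in HDR on success, 0 otherwise (HDR is then
   left in an unspecified state). */
int
parse_hunk_header(const char *line, hunk_header_t *hdr)
{
  char delim;
  const char *p;

  assert(line != NULL && hdr != NULL);
  delim = line[0];
  if ((delim != '@' && delim != '#') || line[1] != delim
      || line[2] != ' ' || line[3] != '-')
    return 0;

  p = parse_range(line + 4, &hdr->original_start, &hdr->original_length);
  if (p == NULL || p[0] != ' ' || p[1] != '+')
    return 0;

  /* The closing delimiter must match the opening one. */
  p = parse_range(p + 2, &hdr->modified_start, &hdr->modified_length);
  if (p == NULL || p[0] != ' ' || p[1] != delim || p[2] != delim)
    return 0;

  hdr->is_property_hunk = (delim == '#');
  return 1;
}
```

Requiring the closing delimiter to match the opening one means a header
such as "## -0,0 +1 @@" is rejected rather than guessed at.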

What needs to be done for 'property diffs'
==========================================
* A diff header is only added if we have text changes. We need one even
  if the patch has only property changes.
* We need to be able to distinguish between patches that delete a
  property and those that set the property to empty.
* The property changes need to be applied to the target.
* More C-tests involving different combinations of text and prop hunks.
* C-tests with context lines starting with 'Added: ', 'Modified: ' and
  'Deleted: '.
* C-tests with reverse diffs involving properties.
* Clear documentation of the property diff format.
* Proper handling of properties with binary content.
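
To illustrate the delete-versus-empty item in the list above: one
possible in-memory representation, sketched here as an assumption
rather than a description of the current code, is to use a NULL value
for a deleted property and an empty string for a property set to empty.
The types and the function name below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what a property hunk does. */
typedef enum prop_effect_t
{
  PROP_DELETED,    /* the property is removed from the target */
  PROP_SET_EMPTY,  /* the property exists but its value is "" */
  PROP_SET_VALUE   /* the property has a non-empty value */
} prop_effect_t;

/* Hypothetical parsed property change. */
typedef struct prop_change_t
{
  const char *name;
  const char *new_value;  /* NULL means "deleted", "" means "empty" */
} prop_change_t;

/* Tell a deletion apart from setting the property to empty. */
prop_effect_t
classify_prop_change(const prop_change_t *change)
{
  assert(change != NULL);
  if (change->new_value == NULL)
    return PROP_DELETED;
  if (change->new_value[0] == '\0')
    return PROP_SET_EMPTY;
  return PROP_SET_VALUE;
}
```

The diff format itself still has to carry this distinction. The sketch
only shows how the parser could hand it on once the format does.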

What we have for 'git unidiffs'
================================
* With SVN_EXPERIMENTAL_PATCH we can create git headers for added and
  deleted paths.
* We have fields in 'svn_patch_t' for recording what tree operation the
  patch performs.
* C-tests for parsing simple 'git diffs', i.e. diffs with either text
  modifications or tree changes but not both.
* Three XFailing unit tests (they pass with SVN_EXPERIMENTAL_PATCH
  defined) for wc-wc, url-wc, url-url.

What needs to be done for 'git unidiffs'
=========================================
* The parsing of the git headers. We still need to be able to handle
  plain unidiffs. There are a *lot* of if statements in the patch I
  have. I'm thinking about using something table-driven instead.
* C-tests for git diffs that combine tree changes and text mods.
* Applying the tree changes.
* Create git headers for copied paths. Needs some small rearrangement of
  diff callbacks but nothing big.
* Create git headers for moved paths. We need to keep the deleted paths
  in a baton and match them against copyfrom information. Can be a bit
  messy.
* Encode and decode binary files.
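
As a rough illustration of the table-driven idea from the first item in
the list above: each known header prefix maps to a small handler, so
supporting a new header line means adding one table row instead of
another if statement. This is a minimal sketch, not the patch I have.
patch_info_t, tree_op_t and all function names are hypothetical, and
only a few of the git extended header lines are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum tree_op_t
{
  OP_MODIFIED, OP_ADDED, OP_DELETED, OP_COPIED, OP_MOVED
} tree_op_t;

/* Hypothetical stand-in for the tree-operation fields of a patch. */
typedef struct patch_info_t
{
  tree_op_t operation;
  char old_path[256];
  char new_path[256];
} patch_info_t;

typedef void (*header_handler_t)(const char *rest, patch_info_t *patch);

/* Copy SRC up to (not including) the end of line, truncating to fit. */
static void
copy_path(char *dst, size_t dst_size, const char *src)
{
  size_t len = strcspn(src, "\r\n");
  if (len >= dst_size)
    len = dst_size - 1;
  memcpy(dst, src, len);
  dst[len] = '\0';
}

static void on_new_file(const char *rest, patch_info_t *p)
{ (void)rest; p->operation = OP_ADDED; }
static void on_deleted_file(const char *rest, patch_info_t *p)
{ (void)rest; p->operation = OP_DELETED; }
static void on_copy_from(const char *rest, patch_info_t *p)
{ p->operation = OP_COPIED; copy_path(p->old_path, sizeof(p->old_path), rest); }
static void on_rename_from(const char *rest, patch_info_t *p)
{ p->operation = OP_MOVED; copy_path(p->old_path, sizeof(p->old_path), rest); }
static void on_target_path(const char *rest, patch_info_t *p)
{ copy_path(p->new_path, sizeof(p->new_path), rest); }

/* One row per recognised header prefix. */
static const struct
{
  const char *prefix;
  header_handler_t handler;
} header_table[] =
{
  { "new file mode ",     on_new_file },
  { "deleted file mode ", on_deleted_file },
  { "copy from ",         on_copy_from },
  { "copy to ",           on_target_path },
  { "rename from ",       on_rename_from },
  { "rename to ",         on_target_path },
};

/* Dispatch LINE to the matching handler.  Return 1 if LINE was a
   recognised git header line, 0 otherwise (e.g. a plain unidiff line). */
int
parse_git_header_line(const char *line, patch_info_t *patch)
{
  size_t i;

  assert(line != NULL && patch != NULL);
  for (i = 0; i < sizeof(header_table) / sizeof(header_table[0]); i++)
    {
      size_t n = strlen(header_table[i].prefix);
      if (strncmp(line, header_table[i].prefix, n) == 0)
        {
          header_table[i].handler(line + n, patch);
          return 1;
        }
    }
  return 0;
}
```

Lines that match no row return 0, so the caller can fall back to the
existing plain unidiff handling.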

Needs to be done but can't be done at the moment
==================================================
* Be able to handle copies and renames that are in revisions prior to
  WORKING. We need editor-v2 for that.

With SVN_EXPERIMENTAL_PATCH defined, we create git diff headers for all
possible combinations of sources, e.g. wc-wc, url-wc and url-url. But we
can only track renames and copies for wc-wc and the changes in wc for
url-wc. My thinking is that since we won't have git diffs available for
1.7 anyway, we can just continue with the goal of releasing the git diff
feature when we have proper rename/copy tracking available.

Daniel

Re: Progress report for GSoC 'svn patch' project

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Jun 18, 2010 at 08:56:25AM +0200, Daniel Näslund wrote:
> What needs to be done for 'property diffs'
> ==========================================
> * A diff header is only added if we have text changes. We need one even
>   if the patch has only property changes.
> * We need to be able to distinguish between patches that delete a
>   property and those that set the property to empty.
> * The property changes need to be applied to the target.
> * More C-tests involving different combinations of text and prop hunks.
> * C-tests with context lines starting with 'Added: ', 'Modified: ' and
>   'Deleted'.
> * C-tests with reverse diffs involving properties.
> * A clear documentation of the property diff format.

Maybe move this last point about documentation to the top of the list? :)

> * Proper handling of properties with binary content.

Do you mean printing a line such as "(binary value differ)"?
Or intelligent handling of binary data?
I think the latter would be out of scope for GSoC.

> What we have for 'git unidiffs'
> ================================
> * With SVN_EXPERIMENTAL_PATCH we can create git headers for added and
>   deleted paths
> * We have fields in 'svn_patch_t' for recording what tree operation the
>   patch performs.
> * C-test for parsing simple 'git diffs', e.g. diffs with either text
>   modifications or tree changes but not both.
> * Three XFailing unittests (passes with SVN_EXPERIMENTAL_PATCH defined)
>   for wc-wc, url-wc, url-url.
> 
> What needs to be done for 'git unidiffs'
> =========================================
> * The parsing of the git headers. We still need to be able to handle
>   unidiffs. There's a *lot* of if statements in the patch I have. I'm
>   thinking about using something table-driven instead.

+1 to table-driven

> * C-tests for git diffs that combines tree changes and text mods.
> * Applying the tree changes.

We're already applying tree changes we can handle right now (add,
delete), but are inferring them from "standard" (or "plain") unidiff.
Note that I would like to keep the current behaviour we have for plain
unidiff, i.e. adds and deletes should still be performed in the working
copy. With the git diff format, those will be explicit, but only so that
we can later distinguish them from copies and moves.

> * Create git headers for copied paths. Needs some small rearrangement of
>   diff callbacks but nothing big.
> * Create git headers for moved paths. We need to keep the deleted paths
>   in a baton and match them against copyfrom information. Can be a bit
>   messy.

I'd say defer moves until everything else is working fine.

> * Encode and decode binary files.

Also defer until everything else works fine.

> Needs to be done but can't be done at the moment
> ==================================================
> * Be able to handle copies and renames that are in revisions prior to
>   WORKING. We need editor-v2 for that.
> 
> With SVN_EXPERIMENTAL_PATCH defined, we create git diff headers for all
> possible combinations of sources, e.g. wc-wc, url-wc and url-url. But we
> can only track renames and copies for wc-wc and the changes in wc for
> url-wc. My thinking is that since we won't have git diffs available for
> 1.7 anyway, we can just continue with the goal of releasing the git diff
> feature when we have proper rename/copy tracking available.

It is no problem to rip out anything we don't want to ship in 1.7 once
1.7.x has been branched. For convenience, please keep marking parts we
may not want to ship in 1.7.x using special #defines.

But feel free to enable code wrapped in those #defines unconditionally
on trunk (e.g. by defining the macro to 1 at the top of a C file or in
some header file). We want the code to be built and tested as much as
possible.
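
A minimal sketch of what I mean, assuming the guard is tested with #if.
The function diff_header_style is hypothetical and only stands in for
whatever code is being guarded.

```c
/* Enabled unconditionally on trunk so the guarded code is always built
   and tested.  On a release branch this definition could be dropped or
   set to 0 to leave the feature out. */
#ifndef SVN_EXPERIMENTAL_PATCH
#define SVN_EXPERIMENTAL_PATCH 1
#endif

/* Hypothetical example of guarded code: report which header style
   the diff code would emit. */
const char *
diff_header_style(void)
{
#if SVN_EXPERIMENTAL_PATCH
  return "git";
#else
  return "plain";
#endif
}
```

Keeping every experimental part behind the same macro makes it easy to
find what to remove later.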

Overall, very good progress!

Thanks,
Stefan
