You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@wandisco.com> on 2011/08/02 15:58:56 UTC

RFC: Improving the 'diff callbacks'

I'm looking at rationalizing the way the client handles diffs, which is
currently mostly through the "svn_wc_diff_callbacks4_t" interface, with
some use made of svn_delta_editor_t.  I'll loosely abbreviate symbol
names below, for example "diff_callbacks_t" to refer to the former.

In libsvn_client, currently:

  Diff what?    Producer            Interface           Consumer

Normal diff:
  repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks
             get_diff_editor   |                   |   in diff.c;
  repo-wc    svn_ra + svn_wc   |                   |   prints diff to
  wc-wc      svn_wc_diff6      |                   |   stdout

Summarizing diff:
  repo-repo  svn_ra_do_diff3   |> delta_editor_t   |> repos_diff_
                               |                   |   summarize.c;
                               |                   |   calls sum-cb
  repo-wc    not implemented
  wc-wc      not implemented

What's going on here?  Why is the summarizing diff output subsystem not
simply an alternative consumer to the normal diff?  If it used the same
interface, we should be able to plug it straight in.

First I wondered why the delta_editor_t isn't a suitable interface for
diffs, and why the WC defines its own 'callback' type for this.  One
reason is because we want a symmetric diff, one that provides the full
content of both what's added and what's deleted.  Although the
summarizing diff doesn't need to know about file content or property
deletions, it does want to know about file and directory deletions, so
that would seem to be a good thing.  So why aren't we using the
diff_callbacks_t here?

The reason seems to be because the summarize code wants to collect up
all the changes for a given node and then issue a single notification at
'close file' or 'close directory' time, and the diff_callbacks_t doesn't
provide that functionality.  It does provide a 'dir_closed' function,
but no 'close_file', and simply having a function isn't necessarily
enough: it doesn't provide a per-node baton through which the consumer
can link the 'open', 'change' and 'close' calls for a single node
together.

To get around the lack of a per-node baton, I think the consumer could
collect the information it wants in a hash, keyed on path, that's global
to the whole diff operation.  And I'm still checking whether the
interface guarantees are sufficient to not need a 'file_closed'
function; it may be so.

So, my initial aim is to make 'diff --summarize' work through the very
same diff callbacks structure, driven from the very same diff callbacks
providers, as the normal diff, and so make 'summarize' available from
all diff sources.  Whether that involves adding per-node batons to the
interface or other changes or no changes at all, I'm not sure yet.

Next, I'm interested in 'merge' which is the other current consumer of
this diff interface.  

A second reason why the delta_editor_t isn't suitable seems to be that
the diff callbacks are tasked with communicating conflicts back to the
caller, for use in a merge.  So let's look at merge as well.

Merge:
  repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> merge_callbacks
             get_diff_editor   |                   |   in merge.c;
                               |                   |   edits the wc
  repo-wc    not implemented
  wc-wc      not implemented

We might not care to implement 'merge' with a WC-WC or WC-repo source,
but at least we have roughly the right infrastructure to be able to do
so, which is a sign of a good design.

I'll take a look at all these parameters where we communicate the merge
status back to the caller:

  file_opened() => tree_conflicted, skip
  file_changed() => contentstate, propstate, tree_conflicted
  file_added() => contentstate, propstate, tree_conflicted
  file_deleted() => state, tree_conflicted
  etc.

At first sight it looks like there are more of these than I'd expect in
a general diff interface.  Maybe some of this info would be better
communicated a different way, perhaps through a per-node baton.

And after that I'm interested in whether 'patch' could theoretically be
implemented as a patchfile-to-diff-callbacks parser followed by the
standard 'merge' consumer.  If we can do things like that, it opens the
door to chaining blocks of functionality together in more interesting
ways.

- Julian




RE: Improving the 'diff callbacks'

Posted by Julian Foad <ju...@wandisco.com>.
Bert Huijben wrote:
> > -----Original Message-----
> > From: Julian Foad [mailto:julian.foad@wandisco.com]
> > On Tue, 2011-08-02 at 16:23 +0200, Bert Huijben wrote:
> > > > -----Original Message-----
> > > > From: Julian Foad [mailto:julian.foad@wandisco.com]
> > > > First I wondered why the delta_editor_t isn't a suitable interface for
> > > > diffs, and why the WC defines its own 'callback' type for this.  One
> > > > reason is because we want a symmetric diff, one that provides the full
> > > > content of both what's added and what's deleted.  Although the
> > > > summarizing diff doesn't need to know about file content or property
> > > > deletions, it does want to know about file and directory deletions, so
> > > > that would seem to be a good thing.  So why aren't we using the
> > > > diff_callbacks_t here?
> > >
> > > Maybe because the diff callbacks transfer the full texts of both
> > > before and after a change to provide them to the callback?
> > 
> > It appears that the diff_callbacks_t can already avoid that: the
> > 'file_opened' callback says:
> > 
> > /* This function is called before @a file_changed to allow callbacks to
> >  * skip the most expensive processing of retrieving the file data. */
> 
> If you do that (returning skip status) you don't get the information
> on text and property changes.
> 
> How would you generate the summary without those callbacks being
> invoked?

I guess you can't.  So you're right, this is one reason why we're not
using the current set of diff callbacks.

- Julian



RE: Improving the 'diff callbacks'

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julian.foad@wandisco.com]
> Sent: dinsdag 2 augustus 2011 16:57
> To: Bert Huijben
> Cc: 'Subversion Development'
> Subject: RE: Improving the 'diff callbacks'
> 
> On Tue, 2011-08-02 at 16:23 +0200, Bert Huijben wrote:
> >
> > > -----Original Message-----
> > > From: Julian Foad [mailto:julian.foad@wandisco.com]
> > > Sent: dinsdag 2 augustus 2011 15:59
> > > To: Subversion Development
> > > Subject: RFC: Improving the 'diff callbacks'
> > >
> > > I'm looking at rationalizing the way the client handles diffs, which is
> > > currently mostly through the "svn_wc_diff_callbacks4_t" interface, with
> > > some use made of svn_delta_editor_t.  I'll loosely abbreviate symbol
> > > names below, for example "diff_callbacks_t" to refer to the former.
> > >
> > > In libsvn_client, currently:
> > >
> > >   Diff what?    Producer            Interface           Consumer
> > >
> > > Normal diff:
> > >   repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks
> > >              get_diff_editor   |                   |   in diff.c;
> > >   repo-wc    svn_ra + svn_wc   |                   |   prints diff to
> > >   wc-wc      svn_wc_diff6      |                   |   stdout
> > >
> > > Summarizing diff:
> > >   repo-repo  svn_ra_do_diff3   |> delta_editor_t   |> repos_diff_
> > >                                |                   |   summarize.c;
> > >                                |                   |   calls sum-cb
> > >   repo-wc    not implemented
> > >   wc-wc      not implemented
> > >
> > > What's going on here?  Why is the summarizing diff output subsystem not
> > > simply an alternative consumer to the normal diff?  If it used the same
> > > interface, we should be able to plug it straight in.
> > >
> > > First I wondered why the delta_editor_t isn't a suitable interface for
> > > diffs, and why the WC defines its own 'callback' type for this.  One
> > > reason is because we want a symmetric diff, one that provides the full
> > > content of both what's added and what's deleted.  Although the
> > > summarizing diff doesn't need to know about file content or property
> > > deletions, it does want to know about file and directory deletions, so
> > > that would seem to be a good thing.  So why aren't we using the
> > > diff_callbacks_t here?
> >
> > Maybe because the diff callbacks transfer the full texts of both
> > before and after a change to provide them to the callback?
> 
> It appears that the diff_callbacks_t can already avoid that: the
> 'file_opened' callback says:
> 
> /* This function is called before @a file_changed to allow callbacks to
>  * skip the most expensive processing of retrieving the file data. */

If you do that (returning skip status) you don't get the information on text and property changes.

How would you generate the summary without those callbacks being invoked?

	Bert 



RE: Improving the 'diff callbacks'

Posted by Julian Foad <ju...@wandisco.com>.
On Tue, 2011-08-02 at 16:23 +0200, Bert Huijben wrote:
> 
> > -----Original Message-----
> > From: Julian Foad [mailto:julian.foad@wandisco.com]
> > Sent: dinsdag 2 augustus 2011 15:59
> > To: Subversion Development
> > Subject: RFC: Improving the 'diff callbacks'
> > 
> > I'm looking at rationalizing the way the client handles diffs, which is
> > currently mostly through the "svn_wc_diff_callbacks4_t" interface, with
> > some use made of svn_delta_editor_t.  I'll loosely abbreviate symbol
> > names below, for example "diff_callbacks_t" to refer to the former.
> > 
> > In libsvn_client, currently:
> > 
> >   Diff what?    Producer            Interface           Consumer
> > 
> > Normal diff:
> >   repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks
> >              get_diff_editor   |                   |   in diff.c;
> >   repo-wc    svn_ra + svn_wc   |                   |   prints diff to
> >   wc-wc      svn_wc_diff6      |                   |   stdout
> > 
> > Summarizing diff:
> >   repo-repo  svn_ra_do_diff3   |> delta_editor_t   |> repos_diff_
> >                                |                   |   summarize.c;
> >                                |                   |   calls sum-cb
> >   repo-wc    not implemented
> >   wc-wc      not implemented
> > 
> > What's going on here?  Why is the summarizing diff output subsystem not
> > simply an alternative consumer to the normal diff?  If it used the same
> > interface, we should be able to plug it straight in.
> > 
> > First I wondered why the delta_editor_t isn't a suitable interface for
> > diffs, and why the WC defines its own 'callback' type for this.  One
> > reason is because we want a symmetric diff, one that provides the full
> > content of both what's added and what's deleted.  Although the
> > summarizing diff doesn't need to know about file content or property
> > deletions, it does want to know about file and directory deletions, so
> > that would seem to be a good thing.  So why aren't we using the
> > diff_callbacks_t here?
> 
> Maybe because the diff callbacks transfer the full texts of both
> before and after a change to provide them to the callback?

It appears that the diff_callbacks_t can already avoid that: the
'file_opened' callback says:

/* This function is called before @a file_changed to allow callbacks to
 * skip the most expensive processing of retrieving the file data. */

- Julian

> The summarize function might have been designed to reduce that overhead?



RE: Improving the 'diff callbacks'

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Julian Foad [mailto:julian.foad@wandisco.com]
> Sent: dinsdag 2 augustus 2011 15:59
> To: Subversion Development
> Subject: RFC: Improving the 'diff callbacks'
> 
> I'm looking at rationalizing the way the client handles diffs, which is
> currently mostly through the "svn_wc_diff_callbacks4_t" interface, with
> some use made of svn_delta_editor_t.  I'll loosely abbreviate symbol
> names below, for example "diff_callbacks_t" to refer to the former.
> 
> In libsvn_client, currently:
> 
>   Diff what?    Producer            Interface           Consumer
> 
> Normal diff:
>   repo-repo  svn_ra_do_diff3 + |> diff_callbacks_t |> diff_callbacks
>              get_diff_editor   |                   |   in diff.c;
>   repo-wc    svn_ra + svn_wc   |                   |   prints diff to
>   wc-wc      svn_wc_diff6      |                   |   stdout
> 
> Summarizing diff:
>   repo-repo  svn_ra_do_diff3   |> delta_editor_t   |> repos_diff_
>                                |                   |   summarize.c;
>                                |                   |   calls sum-cb
>   repo-wc    not implemented
>   wc-wc      not implemented
> 
> What's going on here?  Why is the summarizing diff output subsystem not
> simply an alternative consumer to the normal diff?  If it used the same
> interface, we should be able to plug it straight in.
> 
> First I wondered why the delta_editor_t isn't a suitable interface for
> diffs, and why the WC defines its own 'callback' type for this.  One
> reason is because we want a symmetric diff, one that provides the full
> content of both what's added and what's deleted.  Although the
> summarizing diff doesn't need to know about file content or property
> deletions, it does want to know about file and directory deletions, so
> that would seem to be a good thing.  So why aren't we using the
> diff_callbacks_t here?

Maybe because the diff callbacks transfer the full texts of both before and after a change to provide them to the callback?

The summarize function might have been designed to reduce that overhead?

	Bert